-
1
-
-
0018709825
-
A dynamic allocation index for the discounted multiarmed bandit problem
-
DOI 10.2307/2335176
-
J. C. Gittins, D. M. Jones, A dynamic allocation index for the discounted multiarmed bandit problem, Biometrika 66 (3) (1979) 561-565. (Pubitemid 10218405)
-
(1979)
Biometrika
, vol.66
, Issue.3
, pp. 561-565
-
-
Gittins, J.C.1
Jones, D.M.2
-
2
-
-
0022409707
-
Optimal designs for clinical trials with dichotomous responses
-
D. Berry, L. Pearson, Optimal designs for clinical trials with dichotomous responses, Statistics in Medicine 4 (4) (1985) 497-508. (Pubitemid 16176681)
-
(1985)
Statistics in Medicine
, vol.4
, Issue.4
, pp. 497-508
-
-
Berry, D.A.1
Pearson, L.M.2
-
3
-
-
0002955623
-
A dynamic allocation index for the sequential design of experiments
-
J. Gani (Ed.)
-
J. C. Gittins, D. M. Jones, A dynamic allocation index for the sequential design of experiments, in: J. Gani (Ed.), Progress in Statistics, 1974, pp. 241-266.
-
(1974)
Progress in Statistics
, pp. 241-266
-
-
Gittins, J.C.1
Jones, D.M.2
-
5
-
-
0036334330
-
Optimal learning and experimentation in bandit problems
-
M. Brezzi, T. Lai, Optimal learning and experimentation in bandit problems, Journal of Economic Dynamics and Control 27 (1) (2002) 87-108.
-
(2002)
Journal of Economic Dynamics and Control
, vol.27
, Issue.1
, pp. 87-108
-
-
Brezzi, M.1
Lai, T.2
-
6
-
-
67650362301
-
Some results on the Gittins index for a normal reward process
-
H. Ho, C. Ing, T. Lai (Eds.), Institute of Mathematical Statistics, Beachwood, OH, USA
-
Y. Yao, Some results on the Gittins index for a normal reward process, in: H. Ho, C. Ing, T. Lai (Eds.), Time Series and Related Topics: In Memory of Ching-Zong Wei, Institute of Mathematical Statistics, Beachwood, OH, USA, 2006, pp. 284-294.
-
(2006)
Time Series and Related Topics: In Memory of Ching-Zong Wei
, pp. 284-294
-
-
Yao, Y.1
-
7
-
-
67649990621
-
Economic analysis of simulation selection options
-
S. Chick, N. Gans, Economic analysis of simulation selection options, Management Science 55 (3) (2009) 421-437.
-
(2009)
Management Science
, vol.55
, Issue.3
, pp. 421-437
-
-
Chick, S.1
Gans, N.2
-
8
-
-
0000854435
-
Adaptive treatment allocation and the multi-armed bandit problem
-
T. Lai, Adaptive treatment allocation and the multi-armed bandit problem, The Annals of Statistics 15 (3) (1987) 1091-1114.
-
(1987)
The Annals of Statistics
, vol.15
, Issue.3
, pp. 1091-1114
-
-
Lai, T.1
-
9
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
DOI 10.1023/A:1013689704352, Computational Learning Theory
-
P. Auer, N. Cesa-Bianchi, P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning 47 (2-3) (2002) 235-256. (Pubitemid 34126111)
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
10
-
-
21144463800
-
The learning component of dynamic allocation indices
-
J. Gittins, Y. Wang, The learning component of dynamic allocation indices, The Annals of Statistics 20 (3) (1992) 1625-1636.
-
(1992)
The Annals of Statistics
, vol.20
, Issue.3
, pp. 1625-1636
-
-
Gittins, J.1
Wang, Y.2
-
11
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
T. L. Lai, H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics 6 (1985) 4-22.
-
(1985)
Advances in Applied Mathematics
, vol.6
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
12
-
-
0000616723
-
Sample mean based index policies with O (log n) regret for the multi-armed bandit problem
-
R. Agrawal, Sample mean based index policies with O (log n) regret for the multi-armed bandit problem, Advances in Applied Probability 27 (4) (1995) 1054-1078.
-
(1995)
Advances in Applied Probability
, vol.27
, Issue.4
, pp. 1054-1078
-
-
Agrawal, R.1
-
13
-
-
79953827701
-
Distributed learning in multi-armed bandit with multiple players
-
K. Liu, Q. Zhao, Distributed learning in multi-armed bandit with multiple players, IEEE Transactions on Signal Processing 58 (11) (2010) 5667-5681.
-
(2010)
IEEE Transactions on Signal Processing
, vol.58
, Issue.11
, pp. 5667-5681
-
-
Liu, K.1
Zhao, Q.2
-
14
-
-
0004007508
-
-
The MIT Press, Cambridge, Massachusetts
-
R. Sutton, A. Barto, Reinforcement Learning, The MIT Press, Cambridge, Massachusetts, 1998.
-
(1998)
Reinforcement Learning
-
-
Sutton, R.1
Barto, A.2
-
16
-
-
0032640507
-
Stalking information: Bayesian inventory management with unobserved lost sales
-
M. A. Lariviere, E. Porteus, Stalking information: Bayesian inventory management with unobserved lost sales, Management Science 45 (3) (1999) 346-363.
-
(1999)
Management Science
, vol.45
, Issue.3
, pp. 346-363
-
-
Lariviere, M.A.1
Porteus, E.2
-
17
-
-
77249163740
-
Dynamic pricing with a prior on market response
-
V. Farias, B. Van Roy, Dynamic pricing with a prior on market response, Operations Research 58 (1) (2010) 16-29.
-
(2010)
Operations Research
, vol.58
, Issue.1
, pp. 16-29
-
-
Farias, V.1
Van Roy, B.2
-
18
-
-
0000511415
-
Bayesian look ahead one stage sampling allocations for selecting the largest normal mean
-
S. Gupta, K. Miescke, Bayesian look ahead one stage sampling allocations for selecting the largest normal mean, Statistical Papers 35 (1994) 169-177.
-
(1994)
Statistical Papers
, vol.35
, pp. 169-177
-
-
Gupta, S.1
Miescke, K.2
-
19
-
-
0030590294
-
Bayesian look ahead one-stage sampling allocations for selection of the best population
-
DOI 10.1016/0378-3758(95)00169-7
-
S. Gupta, K. Miescke, Bayesian look ahead one-stage sampling allocations for selection of the best population, Journal of Statistical Planning and Inference 54 (2) (1996) 229-244. (Pubitemid 126161097)
-
(1996)
Journal of Statistical Planning and Inference
, vol.54
, Issue.2
, pp. 229-244
-
-
Gupta, S.S.1
Miescke, K.J.2
-
20
-
-
55549135706
-
A knowledge gradient policy for sequential information collection
-
P. I. Frazier, W. B. Powell, S. Dayanik, A knowledge gradient policy for sequential information collection, SIAM Journal on Control and Optimization 47 (5) (2008) 2410-2439.
-
(2008)
SIAM Journal on Control and Optimization
, vol.47
, Issue.5
, pp. 2410-2439
-
-
Frazier, P.I.1
Powell, W.B.2
Dayanik, S.3
-
21
-
-
70449498873
-
The knowledge-gradient policy for correlated normal rewards
-
P. I. Frazier, W. B. Powell, S. Dayanik, The knowledge-gradient policy for correlated normal rewards, INFORMS J. on Computing 21 (4) (2009) 599-613.
-
(2009)
INFORMS J. on Computing
, vol.21
, Issue.4
, pp. 599-613
-
-
Frazier, P.I.1
Powell, W.B.2
Dayanik, S.3
-
22
-
-
77949359798
-
Sequential sampling to myopically maximize the expected value of information
-
S. Chick, J. Branke, C. Schmidt, Sequential sampling to myopically maximize the expected value of information, INFORMS J. on Computing 22 (1) (2010) 71-80.
-
(2010)
INFORMS J. on Computing
, vol.22
, Issue.1
, pp. 71-80
-
-
Chick, S.1
Branke, J.2
Schmidt, C.3
-
23
-
-
78651309095
-
Paradoxes in learning and the marginal value of information
-
P. Frazier, W. Powell, Paradoxes in learning and the marginal value of information, Decision Analysis 7 (4) (2011) 378-403.
-
(2011)
Decision Analysis
, vol.7
, Issue.4
, pp. 378-403
-
-
Frazier, P.1
Powell, W.2
-
24
-
-
67650505320
-
The knowledge gradient algorithm for online subset selection
-
Nashville, TN
-
I. O. Ryzhov, W. B. Powell, The knowledge gradient algorithm for online subset selection, in: Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, Nashville, TN, 2009, pp. 137-144.
-
(2009)
Proceedings of the 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning
, pp. 137-144
-
-
Ryzhov, I.O.1
Powell, W.B.2
-
25
-
-
77951568757
-
A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies
-
M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.)
-
I. O. Ryzhov, W. B. Powell, A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies, in: M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.), Proceedings of the 2009 Winter Simulation Conference, 2009, pp. 1492-1502.
-
(2009)
Proceedings of the 2009 Winter Simulation Conference
, pp. 1492-1502
-
-
Ryzhov, I.O.1
Powell, W.B.2
-
27
-
-
78650269856
-
On the robustness of a one-period look-ahead policy in multi-armed bandit problems
-
I. O. Ryzhov, P. I. Frazier, W. B. Powell, On the robustness of a one-period look-ahead policy in multi-armed bandit problems, in: Proceedings of the 2010 International Conference on Computational Science, 2010, pp. 1629-1638.
-
(2010)
Proceedings of the 2010 International Conference on Computational Science
, pp. 1629-1638
-
-
Ryzhov, I.O.1
Frazier, P.I.2
Powell, W.B.3
-
30
-
-
77951529657
-
The conjunction of the knowledge gradient and the economic approach to simulation selection
-
M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.)
-
S. E. Chick, P. I. Frazier, The conjunction of the knowledge gradient and the economic approach to simulation selection, in: M. Rosetti, R. Hill, B. Johansson, A. Dunkin, R. Ingalls (Eds.), Proceedings of the 2009 Winter Simulation Conference, 2009, pp. 528-539.
-
(2009)
Proceedings of the 2009 Winter Simulation Conference
, pp. 528-539
-
-
Chick, S.E.1
Frazier, P.I.2