-
1
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
DOI 10.1023/A:1013689704352, Computational Learning Theory
-
Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2-3) 235-256. (Pubitemid 34126111)
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
2
-
-
0022409707
-
Optimal designs for clinical trials with dichotomous responses
-
Berry, D. A., L. M. Pearson. 1985. Optimal designs for clinical trials with dichotomous responses. Statist. Medicine 4(4) 497-508. (Pubitemid 16176681)
-
(1985)
Statistics in Medicine
, vol.4
, Issue.4
, pp. 497-508
-
-
Berry, D.A.1
Pearson, L.M.2
-
3
-
-
0343441515
-
Restless bandits, linear programming relaxations, and a primal-dual index heuristic
-
Bertsimas, D., J. Niño-Mora. 2000. Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1) 80-90.
-
(2000)
Oper. Res
, vol.48
, Issue.1
, pp. 80-90
-
-
Bertsimas, D.1
Niño-Mora, J.2
-
4
-
-
0009943101
-
Incomplete learning from endogenous data in dynamic allocation
-
Brezzi, M., T. L. Lai. 2000. Incomplete learning from endogenous data in dynamic allocation. Econometrica 68(6) 1511-1516.
-
(2000)
Econometrica
, vol.68
, Issue.6
, pp. 1511-1516
-
-
Brezzi, M.1
Lai, T.L.2
-
5
-
-
0036334330
-
Optimal learning and experimentation in bandit problems
-
Brezzi, M., T. L. Lai. 2002. Optimal learning and experimentation in bandit problems. J. Econom. Dynam. Control 27(1) 87-108.
-
(2002)
J. Econom. Dynam. Control
, vol.27
, Issue.1
, pp. 87-108
-
-
Brezzi, M.1
Lai, T.L.2
-
6
-
-
67649990621
-
Economic analysis of simulation selection problems
-
Chick, S. E., N. Gans. 2009. Economic analysis of simulation selection problems. Management Sci. 55(3) 421-437.
-
(2009)
Management Sci
, vol.55
, Issue.3
, pp. 421-437
-
-
Chick, S.E.1
Gans, N.2
-
7
-
-
77949359798
-
Sequential sampling to myopically maximize the expected value of information
-
Chick, S. E., J. Branke, C. Schmidt. 2010. Sequential sampling to myopically maximize the expected value of information. INFORMS J. Comput. 22(1) 71-80.
-
(2010)
INFORMS J. Comput
, vol.22
, Issue.1
, pp. 71-80
-
-
Chick, S.E.1
Branke, J.2
Schmidt, C.3
-
9
-
-
85152636797
-
Q-learning for bandit problems
-
Morgan Kaufmann, Tahoe City, CA
-
Duff, M. 1995. Q-learning for bandit problems. Proc. 12th Internat. Conf. Machine Learn., Morgan Kaufmann, Tahoe City, CA, 209-217.
-
(1995)
Proc. 12th Internat. Conf. Machine Learn.
, pp. 209-217
-
-
Duff, M.1
-
10
-
-
0001492860
-
Contributions to the two-armed bandit problem
-
Feldman, D. 1962. Contributions to the two-armed bandit problem. Ann. Math. Statist. 33(3) 847-856.
-
(1962)
Ann. Math. Statist
, vol.33
, Issue.3
, pp. 847-856
-
-
Feldman, D.1
-
11
-
-
55549135706
-
A knowledge gradient policy for sequential information collection
-
Frazier, P. I., W. B. Powell, S. Dayanik. 2008. A knowledge gradient policy for sequential information collection. SIAM J. Control Optim. 47(5) 2410-2439.
-
(2008)
SIAM J. Control Optim
, vol.47
, Issue.5
, pp. 2410-2439
-
-
Frazier, P.I.1
Powell, W.B.2
Dayanik, S.3
-
12
-
-
70449498873
-
The knowledge-gradient policy for correlated normal rewards
-
Frazier, P. I., W. B. Powell, S. Dayanik. 2009. The knowledge-gradient policy for correlated normal rewards. INFORMS J. Comput. 21(4) 599-613.
-
(2009)
INFORMS J. Comput
, vol.21
, Issue.4
, pp. 599-613
-
-
Frazier, P.I.1
Powell, W.B.2
Dayanik, S.3
-
15
-
-
84859609199
-
A dynamic allocation index for the sequential design of experiments
-
North-Holland, Amsterdam
-
Gittins, J. C., D. M. Jones. 1974. A dynamic allocation index for the sequential design of experiments. J. Gani, ed. Progress in Statistics. North-Holland, Amsterdam, 244-266.
-
(1974)
J. Gani, Ed. Progress in Statistics
, pp. 244-266
-
-
Gittins, J.C.1
Jones, D.M.2
-
16
-
-
0018709825
-
A dynamic allocation index for the discounted multiarmed bandit problem
-
DOI 10.2307/2335176
-
Gittins, J. C., D. M. Jones. 1979. A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66(3) 561-565. (Pubitemid 10218405)
-
(1979)
Biometrika
, vol.66
, Issue.3
, pp. 561-565
-
-
Gittins, J.C.1
Jones, D.M.2
-
17
-
-
70349137085
-
The ratio index for budgeted learning, with applications
-
SIAM, Philadelphia
-
Goel, A., S. Khanna, B. Null. 2009. The ratio index for budgeted learning, with applications. Proc. 19th Annual ACM-SIAM Sympos. Discrete Algorithms, SIAM, Philadelphia, 18-27.
-
(2009)
Proc. 19th Annual ACM-SIAM Sympos. Discrete Algorithms
, pp. 18-27
-
-
Goel, A.1
Khanna, S.2
Null, B.3
-
18
-
-
0000511415
-
Bayesian look ahead one stage sampling allocations for selecting the largest normal mean
-
Gupta, S. S., K. J. Miescke. 1994. Bayesian look ahead one stage sampling allocations for selecting the largest normal mean. Statist. Papers 35(1) 169-177.
-
(1994)
Statist. Papers
, vol.35
, Issue.1
, pp. 169-177
-
-
Gupta, S.S.1
Miescke, K.J.2
-
19
-
-
0030590294
-
Bayesian look ahead one-stage sampling allocations for selection of the best population
-
DOI 10.1016/0378-3758(95)00169-7
-
Gupta, S. S., K. J. Miescke. 1996. Bayesian look ahead one-stage sampling allocations for selection of the best population. J. Statist. Planning Inference 54(2) 229-244. (Pubitemid 126161097)
-
(1996)
Journal of Statistical Planning and Inference
, vol.54
, Issue.2
, pp. 229-244
-
-
Gupta, S.S.1
Miescke, K.J.2
-
21
-
-
0023345261
-
The multi-armed bandit problem: Decomposition and computation
-
Katehakis, M. N., A. F. Veinott. 1987. The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2) 262-268.
-
(1987)
Math. Oper. Res
, vol.12
, Issue.2
, pp. 262-268
-
-
Katehakis, M.N.1
Veinott, A.F.2
-
22
-
-
0010948196
-
Further contributions to the "two-armed bandit" problem
-
Keener, R. 1985. Further contributions to the "two-armed bandit" problem. Ann. Statist. 13(1) 418-422.
-
(1985)
Ann. Statist
, vol.13
, Issue.1
, pp. 418-422
-
-
Keener, R.1
-
23
-
-
0000854435
-
Adaptive treatment allocation and the multi-armed bandit problem
-
Lai, T. L. 1987. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3) 1091-1114.
-
(1987)
Ann. Statist
, vol.15
, Issue.3
, pp. 1091-1114
-
-
Lai, T.L.1
-
24
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
Lai, T. L., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1) 4-22.
-
(1985)
Adv. Appl. Math
, vol.6
, Issue.1
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
25
-
-
61449109791
-
Multi-armed bandit problems
-
A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Springer, New York
-
Mahajan, A., D. Teneketzis. 2008. Multi-armed bandit problems. A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Foundations and Applications of Sensor Management. Springer, New York, 121-152.
-
(2008)
Foundations and Applications of Sensor Management
, pp. 121-152
-
-
Mahajan, A.1
Teneketzis, D.2
-
26
-
-
62949175102
-
A structured multiarmed bandit problem and the greedy policy
-
IEEE, Piscataway, NJ
-
Mersereau, A. J., P. Rusmevichientong, J. N. Tsitsiklis. 2008. A structured multiarmed bandit problem and the greedy policy. Proc. 47th IEEE Conf. Decision and Control, IEEE, Piscataway, NJ, 4945-4950.
-
(2008)
Proc. 47th IEEE Conf. Decision and Control
, pp. 4945-4950
-
-
Mersereau, A.J.1
Rusmevichientong, P.2
Tsitsiklis, J.N.3
-
28
-
-
34547966991
-
Multi-armed bandit problems with dependent arms
-
ACM International Proceedings Series, New York
-
Pandey, S., D. Chakrabarti, D. Agarwal. 2007. Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., ACM International Proceedings Series, New York, 721-728.
-
(2007)
Proc. 24th Internat. Conf. Machine Learn.
, pp. 721-728
-
-
Pandey, S.1
Chakrabarti, D.2
Agarwal, D.3
-
30
-
-
77951568757
-
A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies
-
M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, R. G. Ingalls, eds. IEEE, Piscataway, NJ
-
Ryzhov, I. O., W. B. Powell. 2009a. A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies. M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, R. G. Ingalls, eds. Proc. 2009 Winter Simulation Conf., IEEE, Piscataway, NJ, 1492-1502.
-
(2009)
Proc. 2009 Winter Simulation Conf.
, pp. 1492-1502
-
-
Ryzhov, I.O.1
Powell, W.B.2
-
31
-
-
67650505320
-
The knowledge gradient algorithm for online subset selection
-
IEEE, Piscataway, NJ
-
Ryzhov, I. O., W. B. Powell. 2009b. The knowledge gradient algorithm for online subset selection. Proc. 2009 IEEE Sympos. Adaptive Dynam. Programming and Reinforcement Learn., IEEE, Piscataway, NJ, 137-144.
-
(2009)
Proc. 2009 IEEE Sympos. Adaptive Dynam. Programming and Reinforcement Learn.
, pp. 137-144
-
-
Ryzhov, I.O.1
Powell, W.B.2
-
32
-
-
79952942276
-
Information collection on a graph
-
Ryzhov, I. O., W. B. Powell. 2011. Information collection on a graph. Oper. Res. 59(1) 188-201.
-
(2011)
Oper. Res
, vol.59
, Issue.1
, pp. 188-201
-
-
Ryzhov, I.O.1
Powell, W.B.2
-
35
-
-
84898992015
-
On-line policy improvement using Monte Carlo search
-
M. C. Mozer, M. I. Jordan, T. Petsche, eds. MIT Press, Cambridge, MA
-
Tesauro, G., G. R. Galperin. 1996. On-line policy improvement using Monte Carlo search. M. C. Mozer, M. I. Jordan, T. Petsche, eds. Advances in Neural Information Processing Systems, Vol. 9. MIT Press, Cambridge, MA, 1068-1074.
-
(1996)
Advances in Neural Information Processing Systems
, vol.9
, pp. 1068-1074
-
-
Tesauro, G.1
Galperin, G.R.2
-
36
-
-
85042938295
-
Optimistic linear programming gives logarithmic regret for irreducible MDPs
-
J. C. Platt, D. Koller, Y. Singer, S. Roweis, eds. MIT Press, Cambridge, MA
-
Tewari, A., P. L. Bartlett. 2007. Optimistic linear programming gives logarithmic regret for irreducible MDPs. J. C. Platt, D. Koller, Y. Singer, S. Roweis, eds. Advances in Neural Information Processing Systems, Vol. 20. MIT Press, Cambridge, MA, 1505-1512.
-
(2007)
Advances in Neural Information Processing Systems
, vol.20
, pp. 1505-1512
-
-
Tewari, A.1
Bartlett, P.L.2
-
37
-
-
33646406807
-
Multi-armed bandit algorithms and empirical evaluation
-
Springer-Verlag, Berlin
-
Vermorel, J., M. Mohri. 2005. Multi-armed bandit algorithms and empirical evaluation. Proc. 16th Eur. Conf. Machine Learn., Springer-Verlag, Berlin, 437-448.
-
(2005)
Proc. 16th Eur. Conf. Machine Learn.
, pp. 437-448
-
-
Vermorel, J.1
Mohri, M.2
-
38
-
-
61449109791
-
Applications of multi-armed bandits to sensor management
-
A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Springer, New York
-
Washburn, R. 2008. Applications of multi-armed bandits to sensor management. A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Foundations and Applications of Sensor Management. Springer, New York, 153-176.
-
(2008)
Foundations and Applications of Sensor Management
, pp. 153-176
-
-
Washburn, R.1
-
39
-
-
0000248624
-
Multi-armed bandits and the Gittins index
-
Whittle, P. 1980. Multi-armed bandits and the Gittins index. J. Royal Statist. Soc. B 42(2) 143-149.
-
(1980)
J. Royal Statist. Soc. B
, vol.42
, Issue.2
, pp. 143-149
-
-
Whittle, P.1
-
40
-
-
67650362301
-
Some results on the Gittins index for a normal reward process
-
H. Ho, C. Ing, T. Lai, eds. Institute of Mathematical Statistics, Beachwood, OH
-
Yao, Y. 2006. Some results on the Gittins index for a normal reward process. H. Ho, C. Ing, T. Lai, eds. Time Series and Related Topics: In Memory of Ching-Zong Wei. Institute of Mathematical Statistics, Beachwood, OH, 284-294.
-
(2006)
Time Series and Related Topics: In Memory of Ching-Zong Wei
, pp. 284-294
-
-
Yao, Y.1