SCOPUS Information Search Platform

Operations Research

Volume 60, Issue 1, 2012, Pages 180-195

The knowledge gradient algorithm for a general class of online learning problems

Authors (3): Ryzhov, Ilya O. (a); Powell, Warren B. (b); Frazier, Peter I. (c)

a University of Maryland (United States)

b Princeton University (United States)

c Cornell University (United States)

Author keywords

Gittins index; Index policy; Knowledge gradient; Multiarmed bandit; Online learning; Optimal learning

Indexed keywords

GITTINS INDEX; INDEX POLICIES; KNOWLEDGE GRADIENT; MULTI ARMED BANDIT; ONLINE LEARNING; OPTIMAL LEARNING; OPTIMIZATION; E-LEARNING

EID: 84859621831; Print ISSN: 0030-364X; Electronic ISSN: 1526-5463; Source type: Journal
DOI: 10.1287/opre.1110.0999; Document type: Article

Times cited: 136

References (40)

1. Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2-3) 235-256. DOI 10.1023/A:1013689704352.

2. Berry, D. A., L. M. Pearson. 1985. Optimal designs for clinical trials with dichotomous responses. Statist. Medicine 4(4) 497-508.

3. Bertsimas, D., J. Niño-Mora. 2000. Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1) 80-90.

4. Brezzi, M., T. L. Lai. 2000. Incomplete learning from endogenous data in dynamic allocation. Econometrica 68(6) 1511-1516.

5. Brezzi, M., T. L. Lai. 2002. Optimal learning and experimentation in bandit problems. J. Econom. Dynam. Control 27(1) 87-108.

6. Chick, S. E., N. Gans. 2009. Economic analysis of simulation selection problems. Management Sci. 55(3) 421-437.

7. Chick, S. E., J. Branke, C. Schmidt. 2010. Sequential sampling to myopically maximize the expected value of information. INFORMS J. Comput. 22(1) 71-80.

8. DeGroot, M. H. 1970. Optimal Statistical Decisions. John Wiley & Sons, Hoboken, NJ.

9. Duff, M. 1995. Q-learning for bandit problems. Proc. 12th Internat. Conf. Machine Learn., Morgan Kaufmann, Tahoe City, CA, 209-217.

10. Feldman, D. 1962. Contributions to the two-armed bandit problem. Ann. Math. Statist. 33(3) 847-856.

11. Frazier, P. I., W. B. Powell, S. Dayanik. 2008. A knowledge gradient policy for sequential information collection. SIAM J. Control Optim. 47(5) 2410-2439.

12. Frazier, P. I., W. B. Powell, S. Dayanik. 2009. The knowledge-gradient policy for correlated normal rewards. INFORMS J. Comput. 21(4) 599-613.

13. Ginebra, J., M. K. Clayton. 1995. Response surface bandits. J. Royal Statist. Soc. B 57(4) 771-784.

14. Gittins, J. C. 1989. Multi-Armed Bandit Allocation Indices. John Wiley & Sons, New York.

15. Gittins, J. C., D. M. Jones. 1974. A dynamic allocation index for the sequential design of experiments. J. Gani, ed. Progress in Statistics. North-Holland, Amsterdam, 244-266.

16. Gittins, J. C., D. M. Jones. 1979. A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66(3) 561-565. DOI 10.2307/2335176.

17. Goel, A., S. Khanna, B. Null. 2009. The ratio index for budgeted learning, with applications. Proc. 19th Annual ACM-SIAM Sympos. Discrete Algorithms, SIAM, Philadelphia, 18-27.

18. Gupta, S. S., K. J. Miescke. 1994. Bayesian look ahead one stage sampling allocations for selecting the largest normal mean. Statist. Papers 35(1) 169-177.

19. Gupta, S. S., K. J. Miescke. 1996. Bayesian look ahead one-stage sampling allocations for selection of the best population. J. Statist. Planning Inference 54(2) 229-244. DOI 10.1016/0378-3758(95)00169-7.

20. Kaelbling, L. P. 1993. Learning in Embedded Systems. MIT Press, Cambridge, MA.

21. Katehakis, M. N., A. F. Veinott. 1987. The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2) 262-268.

22. Keener, R. 1985. Further contributions to the "two-armed bandit" problem. Ann. Statist. 13(1) 418-422.

23. Lai, T. L. 1987. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3) 1091-1114.

24. Lai, T. L., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1) 4-22.

25. Mahajan, A., D. Teneketzis. 2008. Multi-armed bandit problems. A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Foundations and Applications of Sensor Management. Springer, New York, 121-152.

26. Mersereau, A. J., P. Rusmevichientong, J. N. Tsitsiklis. 2008. A structured multiarmed bandit problem and the greedy policy. Proc. 47th IEEE Conf. Decision and Control, IEEE, Piscataway, NJ, 4945-4950.

27. Mersereau, A. J., P. Rusmevichientong, J. N. Tsitsiklis. 2009. A structured multiarmed bandit problem and the greedy policy. IEEE Trans. Automatic Control 54(12) 2787-2802.

28. Pandey, S., D. Chakrabarti, D. Agarwal. 2007. Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., ACM International Proceedings Series, New York, 721-728.

29. Powell, W. B. 2007. Approximate Dynamic Programming: Solving the Curses of Dimensionality. John Wiley & Sons, New York.

30. Ryzhov, I. O., W. B. Powell. 2009a. A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies. M. D. Rossetti, R. R. Hill, B. Johansson, A. Dunkin, R. G. Ingalls, eds. Proc. 2009 Winter Simulation Conf., IEEE, Piscataway, NJ, 1492-1502.

31. Ryzhov, I. O., W. B. Powell. 2009b. The knowledge gradient algorithm for online subset selection. Proc. 2009 IEEE Sympos. Adaptive Dynam. Programming and Reinforcement Learn., IEEE, Piscataway, NJ, 137-144.

32. Ryzhov, I. O., W. B. Powell. 2011. Information collection on a graph. Oper. Res. 59(1) 188-201.

33. Steele, J. M. 2000. Stochastic Calculus and Financial Applications. Springer, New York.

34. Sutton, R. S., A. G. Barto. 1998. Reinforcement Learning. MIT Press, Cambridge, MA.

35. Tesauro, G., G. R. Galperin. 1996. On-line policy improvement using Monte Carlo search. M. C. Mozer, M. I. Jordan, T. Petsche, eds. Advances in Neural Information Processing Systems, Vol. 9. MIT Press, Cambridge, MA, 1068-1074.

36. Tewari, A., P. L. Bartlett. 2007. Optimistic linear programming gives logarithmic regret for irreducible MDPs. J. C. Platt, D. Koller, Y. Singer, S. Roweis, eds. Advances in Neural Information Processing Systems, Vol. 20. MIT Press, Cambridge, MA, 1505-1512.

37. Vermorel, J., M. Mohri. 2005. Multi-armed bandit algorithms and empirical evaluation. Proc. 16th Eur. Conf. Machine Learn., Springer-Verlag, Berlin, 437-448.

38. Washburn, R. 2008. Applications of multi-armed bandits to sensor management. A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Foundations and Applications of Sensor Management. Springer, New York, 153-176.

39. Whittle, P. 1980. Multi-armed bandits and the Gittins index. J. Royal Statist. Soc. B 42(2) 143-149.

40. Yao, Y. 2006. Some results on the Gittins index for a normal reward process. H. Ho, C. Ing, T. Lai, eds. Time Series and Related Topics: In Memory of Ching-Zong Wei. Institute of Mathematical Statistics, Beachwood, OH, 284-294.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.