



Operations Research, Volume 60, Issue 1, 2012, Pages 180-195

The knowledge gradient algorithm for a general class of online learning problems

Author keywords

Gittins index; Index policy; Knowledge gradient; Multiarmed bandit; Online learning; Optimal learning


EID: 84859621831     PISSN: 0030-364X     EISSN: 1526-5463     Source Type: Journal
DOI: 10.1287/opre.1110.0999     Document Type: Article
Times cited: 136

References (40)
  • 1
    • Auer, P., N. Cesa-Bianchi, P. Fischer. 2002. Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2-3) 235-256. DOI 10.1023/A:1013689704352.
  • 2
    • Berry, D. A., L. M. Pearson. 1985. Optimal designs for clinical trials with dichotomous responses. Statist. Medicine 4(4) 497-508.
  • 3
    • Bertsimas, D., J. Niño-Mora. 2000. Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 48(1) 80-90.
  • 4
    • Brezzi, M., T. L. Lai. 2000. Incomplete learning from endogenous data in dynamic allocation. Econometrica 68(6) 1511-1516.
  • 5
    • Brezzi, M., T. L. Lai. 2002. Optimal learning and experimentation in bandit problems. J. Econom. Dynam. Control 27(1) 87-108.
  • 6
    • Chick, S. E., N. Gans. 2009. Economic analysis of simulation selection problems. Management Sci. 55(3) 421-437.
  • 7
    • Chick, S. E., J. Branke, C. Schmidt. 2010. Sequential sampling to myopically maximize the expected value of information. INFORMS J. Comput. 22(1) 71-80.
  • 9
    • Duff, M. 1995. Q-learning for bandit problems. Proc. 12th Internat. Conf. Machine Learn., Morgan Kaufmann, Tahoe City, CA, 209-217.
  • 10
    • Feldman, D. 1962. Contributions to the two-armed bandit problem. Ann. Math. Statist. 33(3) 847-856.
  • 11
    • Frazier, P. I., W. B. Powell, S. Dayanik. 2008. A knowledge gradient policy for sequential information collection. SIAM J. Control Optim. 47(5) 2410-2439.
  • 12
    • Frazier, P. I., W. B. Powell, S. Dayanik. 2009. The knowledge-gradient policy for correlated normal rewards. INFORMS J. Comput. 21(4) 599-613.
  • 15
    • Gittins, J. C., D. M. Jones. 1974. A dynamic allocation index for the sequential design of experiments. J. Gani, ed. Progress in Statistics. North-Holland, Amsterdam, 244-266.
  • 16
    • Gittins, J. C., D. M. Jones. 1979. A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66(3) 561-565. DOI 10.2307/2335176.
  • 18
    • Gupta, S. S., K. J. Miescke. 1994. Bayesian look ahead one stage sampling allocations for selecting the largest normal mean. Statist. Papers 35(1) 169-177.
  • 19
    • Gupta, S. S., K. J. Miescke. 1996. Bayesian look ahead one-stage sampling allocations for selection of the best population. J. Statist. Planning Inference 54(2) 229-244. DOI 10.1016/0378-3758(95)00169-7.
  • 21
    • Katehakis, M. N., A. F. Veinott. 1987. The multi-armed bandit problem: Decomposition and computation. Math. Oper. Res. 12(2) 262-268.
  • 22
    • Keener, R. 1985. Further contributions to the "two-armed bandit" problem. Ann. Statist. 13(1) 418-422.
  • 23
    • Lai, T. L. 1987. Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3) 1091-1114.
  • 24
    • Lai, T. L., H. Robbins. 1985. Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1) 4-22.
  • 25
    • Mahajan, A., D. Teneketzis. 2008. Multi-armed bandit problems. A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Foundations and Applications of Sensor Management. Springer, New York, 121-152.
  • 28
    • Pandey, S., D. Chakrabarti, D. Agarwal. 2007. Multi-armed bandit problems with dependent arms. Proc. 24th Internat. Conf. Machine Learn., ACM International Proceedings Series, New York, 721-728.
  • 30
    • Ryzhov, I. O., W. B. Powell. 2009a. A Monte Carlo knowledge gradient method for learning abatement potential of emissions reduction technologies. M. D. Rosetti, R. R. Hill, B. Johansson, A. Dunkin, R. G. Ingalls, eds. Proc. 2009 Winter Simulation Conf., IEEE, Piscataway, NJ, 1492-1502.
  • 32
    • Ryzhov, I. O., W. B. Powell. 2011. Information collection on a graph. Oper. Res. 59(1) 188-201.
  • 35
    • Tesauro, G., G. R. Galperin. 1996. On-line policy improvement using Monte Carlo search. M. C. Mozer, M. I. Jordan, T. Petsche, eds. Advances in Neural Information Processing Systems, Vol. 9. MIT Press, Cambridge, MA, 1068-1074.
  • 36
    • Tewari, A., P. L. Bartlett. 2007. Optimistic linear programming gives logarithmic regret for irreducible MDPs. J. C. Platt, D. Koller, Y. Singer, S. Roweis, eds. Advances in Neural Information Processing Systems, Vol. 20. MIT Press, Cambridge, MA, 1505-1512.
  • 37
    • Vermorel, J., M. Mohri. 2005. Multi-armed bandit algorithms and empirical evaluation. Proc. 16th Eur. Conf. Machine Learn., Springer-Verlag, Berlin, 437-448.
  • 38
    • Washburn, R. 2008. Applications of multi-armed bandits to sensor management. A. O. Hero, D. A. Castanon, D. Cochran, K. Kastella, eds. Foundations and Applications of Sensor Management. Springer, New York, 153-176.
  • 39
    • Whittle, P. 1980. Multi-armed bandits and the Gittins index. J. Royal Statist. Soc. B 42(2) 143-149.
  • 40
    • Yao, Y. 2006. Some results on the Gittins index for a normal reward process. H. Ho, C. Ing, T. Lai, eds. Time Series and Related Topics: In Memory of Ching-Zong Wei. Institute of Mathematical Statistics, Beachwood, OH, 284-294.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.