



The Annals of Statistics, Volume 41, Issue 3, 2013, Pages 1516–1541

Kullback–Leibler upper confidence bounds for optimal sequential allocation

Author keywords

Kullback–Leibler divergence; Multi-armed bandit problems; Sequential testing; Upper confidence bound


EID: 84898949562     PISSN: 00905364     EISSN: None     Source Type: Journal    
DOI: 10.1214/13-AOS1119     Document Type: Article
Times cited: 356

References (36)
  • 1
    • AGRAWAL, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. in Appl. Probab. 27 1054–1078. MR1358906
  • 2
    • AUDIBERT, J.-Y. and BUBECK, S. (2010). Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11 2785–2836. MR2738783
  • 3
    • AUDIBERT, J.-Y., MUNOS, R. and SZEPESVÁRI, C. (2009). Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoret. Comput. Sci. 410 1876–1902. MR2514714
  • 4
    • AUER, P., CESA-BIANCHI, N. and FISCHER, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 235–256.
  • 5
    • BUBECK, S. and CESA-BIANCHI, N. (2012). Regret analysis of stochastic and nonstochastic multiarmed bandit problems. Foundations and Trends in Machine Learning 5 1–122.
  • 6
    • BURNETAS, A. N. and KATEHAKIS, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Adv. in Appl. Math. 17 122–142. MR1390571
  • 7
    • BURNETAS, A. N. and KATEHAKIS, M. N. (1997). Optimal adaptive policies for Markov decision processes. Math. Oper. Res. 22 222–255. MR1436581
  • 8
    • BURNETAS, A. N. and KATEHAKIS, M. N. (2003). Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem. Probab. Engrg. Inform. Sci. 17 53–82. MR1959385
  • 11
    • CHANG, F. and LAI, T. L. (1987). Optimal stopping and dynamic allocation. Adv. in Appl. Probab. 19 829–853. MR0914595
  • 13
    • DEMBO, A. and ZEITOUNI, O. (1998). Large Deviations Techniques and Applications, 2nd ed. Applications of Mathematics (New York) 38. Springer, New York. MR1619036
  • 16
    • GITTINS, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 41 148–177. MR0547241
  • 18
    • HOEFFDING, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30. MR0144363
  • 20
    • HONDA, J. and TAKEMURA, A. (2011). An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning 85 361–391.
  • 24
    • LAI, T. L. and ROBBINS, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22. MR0776826
  • 27
    • MASSART, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin. MR2319879
  • 28
    • OWEN, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC, Boca Raton, FL.
  • 29
    • ROBBINS, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58 527–535. MR0050246
  • 30
    • THOMPSON, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 285–294.
  • 31
    • THOMPSON, W. R. (1935). On the Theory of Apportionment. Amer. J. Math. 57 450–456. MR1507085
  • 34
    • WALD, A. (1945). Sequential tests of statistical hypotheses. Ann. Math. Statist. 16 117–186. MR0013275
  • 35
    • WEBER, R. (1992). On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2 1024–1033. MR1189430
  • 36
    • WHITTLE, P. (1980). Multi-armed bandits and the Gittins index. J. R. Stat. Soc. Ser. B Stat. Methodol. 42 143–149. MR0583348


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.