Volume 19, 2011, Pages 497-514

A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences

Author keywords

Finite time analysis; Kullback Leibler divergence; Multi armed bandit problem; Sanov's lemma

Indexed keywords

STATISTICS

EID: 84874038864     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Conference Paper
Times cited: 87

References (15)
  • 1
    • 62949181077
    • Exploration-exploitation trade-off using variance estimates in multi-armed bandits
    • J-Y. Audibert, R. Munos, and C. Szepesvari. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 410:1876-1902, 2009.
    • (2009) Theoretical Computer Science , vol.410 , pp. 1876-1902
    • Audibert, J.-Y.1    Munos, R.2    Szepesvari, C.3
  • 2
    • 78649420293
    • Regret bounds and minimax policies under partial monitoring
    • J.Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11:2635-2686, 2010.
    • (2010) Journal of Machine Learning Research , vol.11 , pp. 2635-2686
    • Audibert, J.Y.1    Bubeck, S.2
  • 3
    • 77957337199
    • UCB revisited: Improved regret bounds for the stochastic multiarmed bandit problem
    • P. Auer and R. Ortner. UCB revisited: Improved regret bounds for the stochastic multiarmed bandit problem. Periodica Mathematica Hungarica, 61(1-2):55-65, 2010.
    • (2010) Periodica Mathematica Hungarica , vol.61 , Issue.1-2 , pp. 55-65
    • Auer, P.1    Ortner, R.2
  • 4
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • DOI 10.1023/A:1013689704352, Computational Learning Theory
    • P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002. (Pubitemid 34126111)
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 0030159874
    • Optimal adaptive policies for sequential allocation problems
    • DOI 10.1006/aama.1996.0007
    • A.N. Burnetas and M.N. Katehakis. Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2):122-142, 1996. (Pubitemid 126160351)
    • (1996) Advances in Applied Mathematics , vol.17 , Issue.2 , pp. 122-142
    • Burnetas, A.N.1    Katehakis, M.N.2
  • 9
    • 84863920694
    • The KL-UCB algorithm for bounded stochastic bandits and beyond
    • A. Garivier and O. Cappé. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of COLT, 2011.
    • (2011) Proceedings of COLT
    • Garivier, A.1    Cappé, O.2
  • 11
    • 84898077171
    • An asymptotically optimal bandit algorithm for bounded support models
    • J. Honda and A. Takemura. An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of COLT, pages 67-79, 2010a.
    • (2010) Proceedings of COLT , pp. 67-79
    • Honda, J.1    Takemura, A.2
  • 13
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 15


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.