Volume 19, 2011, Pages 359-376

The KL-UCB algorithm for bounded stochastic bandits and beyond

Author keywords

[No Author keywords available]

Indexed keywords

STOCHASTIC SYSTEMS;

EID: 84898437076     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Conference Paper
Times cited : (455)

References (14)
  • 1
    • 0000616723
    • Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
    • R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27(4):1054-1078, 1995.
    • (1995) Advances in Applied Probability , vol.27 , Issue.4 , pp. 1054-1078
    • Agrawal, R.1
  • 2
    • 78649420293
    • Regret bounds and minimax policies under partial monitoring
    • J-Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11:2785-2836, 2010.
    • (2010) Journal of Machine Learning Research , vol.11 , pp. 2785-2836
    • Audibert, J.-Y.1    Bubeck, S.2
  • 3
    • 62949181077
    • Exploration-exploitation trade-off using variance estimates in multi-armed bandits
    • J-Y. Audibert, R. Munos, and Cs. Szepesvári. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19), 2009.
    • (2009) Theoretical Computer Science , vol.410 , Issue.19
    • Audibert, J.-Y.1    Munos, R.2    Szepesvári, Cs.3
  • 4
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • DOI 10.1023/A:1013689704352, Computational Learning Theory
    • P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235-256, 2002. (Pubitemid 34126111)
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 0031070051
    • Optimal adaptive policies for Markov decision processes
    • A.N. Burnetas and M.N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, pages 222-255, 1997. (Pubitemid 127621321)
    • (1997) Mathematics of Operations Research , vol.22 , Issue.1 , pp. 222-255
    • Burnetas, A.N.1    Katehakis, M.N.2
  • 6
    • 0001072895
    • The use of confidence or fiducial limits illustrated in the case of the binomial
    • C.J. Clopper and E.S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26:404-413, 1934.
    • (1934) Biometrika , vol.26 , pp. 404-413
    • Clopper, C.J.1    Pearson, E.S.2
  • 7
    • 84937398609
    • PAC bounds for multi-armed bandit and Markov decision processes
    • Lecture Notes in Comput. Sci., Springer, Berlin
    • E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In Conf. Comput. Learning Theory (Sydney, Australia, 2002), volume 2375 of Lecture Notes in Comput. Sci., pages 255-270. Springer, Berlin, 2002.
    • (2002) Conf. Comput. Learning Theory (Sydney, Australia, 2002) , vol.2375 , pp. 255-270
    • Even-Dar, E.1    Mannor, S.2    Mansour, Y.3
  • 11
    • 84898077171
    • An asymptotically optimal bandit algorithm for bounded support models
    • T. Kalai and M. Mohri, editors, Haifa, Israel
    • J. Honda and A. Takemura. An asymptotically optimal bandit algorithm for bounded support models. In T. Kalai and M. Mohri, editors, Conf. Comput. Learning Theory, Haifa, Israel, 2010.
    • (2010) Conf. Comput. Learning Theory
    • Honda, J.1    Takemura, A.2
  • 12
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4, 1985.
    • (1985) Advances in Applied Mathematics , vol.6 , Issue.1 , pp. 4
    • Lai, T.L.1    Robbins, H.2
  • 13
    • 84874038864
    • A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences
    • Budapest, Hungary
    • O-A. Maillard, R. Munos, and G. Stoltz. A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences. In Conf. Comput. Learning Theory, Budapest, Hungary, 2011.
    • (2011) Conf. Comput. Learning Theory
    • Maillard, O.-A.1    Munos, R.2    Stoltz, G.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.