



Volume 61, Issue 1, 2010, Pages 55-65

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

Author keywords

multi-armed bandit problem; regret

Indexed keywords


EID: 77957337199     PISSN: 0031-5303     EISSN: 1588-2829     Source Type: Journal
DOI: 10.1007/s10998-010-3055-6     Document Type: Article
Times cited: 287

References (11)
  • 1
    • Rajeev Agrawal, Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Adv. in Appl. Probab., 27 (1995), 1054-1078.
  • 2
    • Jean-Yves Audibert and Sébastien Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT 2009), 2009, 217-226.
  • 3
    • Jean-Yves Audibert, Rémi Munos and Csaba Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theor. Comput. Sci., 410 (2009), 1876-1902.
  • 4
    • Peter Auer, Nicolò Cesa-Bianchi and Paul Fischer, Finite-time analysis of the multi-armed bandit problem, Mach. Learn., 47 (2002), 235-256.
  • 6
    • Eyal Even-Dar, Shie Mannor and Yishay Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, J. Mach. Learn. Res., 7 (2006), 1079-1105.
  • 7
    • Wassily Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), 13-30.
  • 8
    • Robert D. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, Advances in Neural Information Processing Systems 17, MIT Press, 2005, 697-704.
  • 9
    • Tze Leung Lai and Herbert Robbins, Asymptotically efficient adaptive allocation rules, Adv. in Appl. Math., 6 (1985), 4-22.
  • 10
    • Shie Mannor and John N. Tsitsiklis, The sample complexity of exploration in the multi-armed bandit problem, J. Mach. Learn. Res., 5 (2004), 623-648.
  • 11
    • Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.