SCOPUS Information Search Platform

Volume 61, Issue 1, 2010, Pages 55-65

UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem

Author keywords

multi-armed bandit problem; regret

EID: 77957337199 PISSN: 00315303 EISSN: 15882829 Source Type: Journal
DOI: 10.1007/s10998-010-3055-6 Document Type: Article

Times cited: 292

References (11)

1
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- Rajeev Agrawal, Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Adv. in Appl. Probab., 27 (1995), 1054-1078.
- (1995) Adv. in Appl. Probab., vol. 27, pp. 1054-1078
- Agrawal, R.

2
- 84898079018
- Jean-Yves Audibert and Sébastien Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT2009), 2009, 217-226.

3
- 62949181077
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- Jean-Yves Audibert, Rémi Munos and Csaba Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theor. Comput. Sci., 410 (2009), 1876-1902.
- (2009) Theor. Comput. Sci., vol. 410, pp. 1876-1902
- Audibert, J.-Y.; Munos, R.; Szepesvári, C.

4
- 0036568025
- Finite-Time Analysis of the Multi-Armed Bandit Problem
- Peter Auer, Nicolò Cesa-Bianchi and Paul Fischer, Finite-Time Analysis of the Multi-Armed Bandit Problem, Mach. Learn., 47 (2002), 235-256.
- (2002) Mach. Learn., vol. 47, pp. 235-256
- Auer, P.; Cesa-Bianchi, N.; Fischer, P.

5
- 0037709910
- The Nonstochastic Multiarmed Bandit Problem
- Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund and Robert E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM J. Comput., 32 (2002), 48-77.
- (2002) SIAM J. Comput., vol. 32, pp. 48-77
- Auer, P.; Cesa-Bianchi, N.; Freund, Y.; Schapire, R.E.

6
- 33745295134
- Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems
- Eyal Even-Dar, Shie Mannor and Yishay Mansour, Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, J. Mach. Learn. Res., 7 (2006), 1079-1105.
- (2006) J. Mach. Learn. Res., vol. 7, pp. 1079-1105
- Even-Dar, E.; Mannor, S.; Mansour, Y.

7
- 84947403595
- Probability inequalities for sums of bounded random variables
- Wassily Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), 13-30.
- (1963) J. Amer. Statist. Assoc., vol. 58, pp. 13-30
- Hoeffding, W.

8
- 84898981061
- Robert D. Kleinberg, Nearly Tight Bounds for the Continuum-Armed Bandit Problem, Advances in Neural Information Processing Systems 17, MIT Press, 2005, 697-704.

9
- 0002899547
- Asymptotically Efficient Adaptive Allocation Rules
- Tze Leung Lai and Herbert Robbins, Asymptotically Efficient Adaptive Allocation Rules, Adv. in Appl. Math., 6 (1985), 4-22.
- (1985) Adv. in Appl. Math., vol. 6, pp. 4-22
- Lai, T.L.; Robbins, H.

10
- 30044441333
- The Sample Complexity of Exploration in the Multi-Armed Bandit Problem
- Shie Mannor and John N. Tsitsiklis, The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, J. Mach. Learn. Res., 5 (2004), 623-648.
- (2004) J. Mach. Learn. Res., vol. 5, pp. 623-648
- Mannor, S.; Tsitsiklis, J.N.

11
- 77957327017
- Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.