SCOPUS Information Search Platform

Volume 19, 2011, Pages 359-376

The KL-UCB algorithm for bounded stochastic bandits and beyond

Author keywords

[No Author keywords available]

Indexed keywords

STOCHASTIC SYSTEMS;

BANDIT PROBLEMS; EXPONENTIAL FAMILY; FINITE-TIME ANALYSIS; INDEX POLICIES; LOWER BOUNDS; REGRET BOUNDS; SPECIFIC CLASS; TIME HORIZONS;

ALGORITHMS;

EID: 84898437076 PISSN: 15324435 EISSN: 15337928 Source Type: Journal
DOI: None Document Type: Conference Paper

Times cited : (455)

References (14)

1
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27(4):1054-1078, 1995.
- (1995) Advances in Applied Probability , vol.27 , Issue.4 , pp. 1054-1078
- Agrawal, R.¹

2
- 78649420293
- Regret bounds and minimax policies under partial monitoring
- J-Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11:2785-2836, 2010.
- (2010) Journal of Machine Learning Research , vol.11 , pp. 2785-2836
- Audibert, J.-Y.¹ Bubeck, S.²

3
- 62949181077
- Exploration-exploitation trade-off using variance estimates in multi-armed bandits
- J-Y. Audibert, R. Munos, and Cs. Szepesvári. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19), 2009.
- (2009) Theoretical Computer Science , vol.410 , Issue.19
- Audibert, J.-Y.¹ Munos, R.² Szepesvári, Cs.³

5
- 0031070051
- Optimal adaptive policies for Markov decision processes
- A.N. Burnetas and M.N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
- (1997) Mathematics of Operations Research , vol.22 , Issue.1 , pp. 222-255
- Burnetas, A.N.¹ Katehakis, M.N.²

6
- 0001072895
- The use of confidence or fiducial limits illustrated in the case of the binomial
- C.J. Clopper and E.S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26:404-413, 1934.
- (1934) Biometrika , vol.26 , pp. 404-413
- Clopper, C.J.¹ Pearson, E.S.²

8
- 84898451125
- (in French). PhD thesis, Telecom ParisTech
- S. Filippi. Optimistic strategies in Reinforcement Learning (in French). PhD thesis, Telecom ParisTech, 2010. URL http://tel.archives-ouvertes.fr/tel-00551401/.
- (2010) Optimistic Strategies in Reinforcement Learning
- Filippi, S.¹

10
- 0000169010
- Bandit processes and dynamic allocation indices
- J.C. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, Series B, 41(2):148-177, 1979.
- (1979) Journal of the Royal Statistical Society, Series B , vol.41 , Issue.2 , pp. 148-177
- Gittins, J.C.¹

12
- 0002899547
- Asymptotically efficient adaptive allocation rules
- T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4, 1985.
- (1985) Advances in Applied Mathematics , vol.6 , Issue.1 , pp. 4
- Lai, T.L.¹ Robbins, H.²

14
- 84898473682
- Springer, Berlin, Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003
- P. Massart. Concentration inequalities and model selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003.
- (2007) Concentration Inequalities and Model Selection, Volume 1896 of Lecture Notes in Mathematics
- Massart, P.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.