SCOPUS Information Search Platform

Theoretical Computer Science

Volume 410, Issue 19, 2009, Pages 1876-1902

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

(3) Audibert, Jean-Yves a,b; Munos, Rémi c; Szepesvári, Csaba d

a UNIVERSITÉ PARIS EST (France)

b ECOLE NORMALE SUPÉRIEURE (France)

c INRIA (France)

d UNIVERSITY OF ALBERTA (Canada)

Author keywords

Bernstein's inequality; Exploration exploitation tradeoff; High probability bound; Multi armed bandits; Risk analysis

Indexed keywords

ALGORITHMS; AMBER; COMMUNICATION CHANNELS (INFORMATION THEORY); RISK ASSESSMENT; SAFETY FACTOR;

BANDIT PROBLEMS; BERNSTEIN'S INEQUALITY; COMPETING ALGORITHMS; DECISION MAKERS; EXPLORATION AND EXPLOITATIONS; EXPLORATION-EXPLOITATION TRADEOFF; HIGH-PROBABILITY BOUND; MULTI-ARMED BANDIT PROBLEMS; MULTI-ARMED BANDITS; UPPER CONFIDENCE BOUNDS; VARIANCE ESTIMATES;

RISK ANALYSIS;

EID: 62949181077 PISSN: 03043975 EISSN: None Source Type: Journal
DOI: 10.1016/j.tcs.2009.01.016 Document Type: Article

Times cited: 548

References (13)

1
- 0000616723
- Sample mean based index policies with O (log n) regret for the multi-armed bandit problem
- Agrawal R. Sample mean based index policies with O (log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27 (1995) 1054-1078
- (1995) Advances in Applied Probability , vol.27 , pp. 1054-1078
- Agrawal, R.¹

2
- 33645704704
- Ph.D. Thesis, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7
- J.-Y. Audibert, PAC-Bayesian statistical learning theory, Ph.D. Thesis, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7, 2004. http://certis.enpc.fr/~audibert/ThesePack.zip
- (2004) PAC-Bayesian statistical learning theory
- Audibert, J.-Y.¹

3
- 0036568025
- Finite time analysis of the multiarmed bandit problem
- Auer P., Cesa-Bianchi N., and Fischer P. Finite time analysis of the multiarmed bandit problem. Machine Learning 47 2-3 (2002) 235-256
- (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

4
- 38149083517
- Exploration versus exploitation challenge
- P. Auer, N. Cesa-Bianchi, J. Shawe-Taylor, Exploration versus exploitation challenge, in: 2nd PASCAL Challenges Workshop, Pascal Network, 2006
- (2006) 2nd PASCAL Challenges Workshop, Pascal Network
- Auer, P.¹ Cesa-Bianchi, N.² Shawe-Taylor, J.³

5
- 0002384441
- On tail probabilities for martingales
- Freedman D.A. On tail probabilities for martingales. The Annals of Probability 3 1 (1975) 100-118
- (1975) The Annals of Probability , vol.3 , Issue.1 , pp. 100-118
- Freedman, D.A.¹

6
- 34250659969
- Modification of UCT with patterns in Monte-Carlo go
- Technical Report, INRIA RR-6062
- S. Gelly, Y. Wang, R. Munos, O. Teytaud, Modification of UCT with patterns in Monte-Carlo go, Technical Report, INRIA RR-6062, 2006
- (2006)
- Gelly, S.¹ Wang, Y.² Munos, R.³ Teytaud, O.⁴

7
- 84891584370
- Wiley, Chichester, NY
- Gittins J.C. Multi-Armed Bandit Allocation Indices. Wiley-Interscience Series in Systems and Optimization (1989), Wiley, Chichester, NY
- (1989) Wiley-Interscience Series in Systems and Optimization
- Gittins, J.C.¹

8
- 84947403595
- Probability inequalities for sums of bounded random variables
- Hoeffding W. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58 (1963) 13-30
- (1963) Journal of the American Statistical Association , vol.58 , pp. 13-30
- Hoeffding, W.¹

9
- 33750293964
- Bandit based Monte-Carlo planning
- L. Kocsis, Cs. Szepesvári, Bandit based Monte-Carlo planning, in: Proceedings of the 17th European Conference on Machine Learning, ECML-2006, 2006, pp.282-293
- (2006) Proceedings of the 17th European Conference on Machine Learning, ECML-2006 , pp. 282-293
- Kocsis, L.¹ Szepesvári, C.²

10
- 0002899547
- Asymptotically efficient adaptive allocation rules
- Lai T.L., and Robbins H. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6 (1985) 4-22
- (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
- Lai, T.L.¹ Robbins, H.²

11
- 0029344133
- Machine learning and nonparametric bandit theory
- Lai T.L., and Yakowitz S. Machine learning and nonparametric bandit theory. IEEE Transactions on Automatic Control 40 (1995) 1199-1209
- (1995) IEEE Transactions on Automatic Control , vol.40 , pp. 1199-1209
- Lai, T.L.¹ Yakowitz, S.²

12
- 84966203785
- Some aspects of the sequential design of experiments
- Robbins H. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society 58 (1952) 527-535
- (1952) Bulletin of the American Mathematical Society , vol.58 , pp. 527-535
- Robbins, H.¹

13
- 0001395850
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
- Thompson W.R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 (1933) 285-294
- (1933) Biometrika , vol.25 , pp. 285-294
- Thompson, W.R.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.