Volume 410, Issue 19, 2009, Pages 1876-1902

Exploration-exploitation tradeoff using variance estimates in multi-armed bandits

Author keywords

Bernstein's inequality; Exploration exploitation tradeoff; High probability bound; Multi armed bandits; Risk analysis

Indexed keywords

ALGORITHMS; AMBER; COMMUNICATION CHANNELS (INFORMATION THEORY); RISK ASSESSMENT; SAFETY FACTOR

EID: 62949181077     PISSN: 03043975     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.tcs.2009.01.016     Document Type: Article
Times cited: 548

References (13)
  • 1
    • 0000616723
    • Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
    • Agrawal R. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability 27 (1995) 1054-1078
    • (1995) Advances in Applied Probability , vol.27 , pp. 1054-1078
    • Agrawal, R.1
  • 2
    • 33645704704
    • Ph.D. Thesis, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7
    • J.-Y. Audibert, PAC-Bayesian statistical learning theory, Ph.D. Thesis, Laboratoire de Probabilités et Modèles Aléatoires, Universités Paris 6 and Paris 7, 2004. http://certis.enpc.fr/~audibert/ThesePack.zip
    • (2004) PAC-Bayesian statistical learning theory
    • Audibert, J.-Y.1
  • 3
    • 0036568025
    • Finite time analysis of the multiarmed bandit problem
    • Auer P., Cesa-Bianchi N., and Fischer P. Finite time analysis of the multiarmed bandit problem. Machine Learning 47 2-3 (2002) 235-256
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 0002384441
    • On tail probabilities for martingales
    • Freedman D.A. On tail probabilities for martingales. The Annals of Probability 3 1 (1975) 100-118
    • (1975) The Annals of Probability , vol.3 , Issue.1 , pp. 100-118
    • Freedman, D.A.1
  • 6
    • 34250659969
    • Modification of UCT with patterns in Monte-Carlo go
    • Technical Report, INRIA RR-6062
    • S. Gelly, Y. Wang, R. Munos, O. Teytaud, Modification of UCT with patterns in Monte-Carlo go, Technical Report, INRIA RR-6062, 2006
    • (2006)
    • Gelly, S.1    Wang, Y.2    Munos, R.3    Teytaud, O.4
  • 10
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • Lai T.L., and Robbins H. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6 (1985) 4-22
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 13
    • 0001395850
    • On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
    • Thompson W.R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 (1933) 285-294
    • (1933) Biometrika , vol.25 , pp. 285-294
    • Thompson, W.R.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.