Volume 7568 LNAI, 2012, Pages 199-213

Thompson sampling: An asymptotically optimal finite-time analysis

Author keywords

[No Author keywords available]

Indexed keywords

ASYMPTOTIC RATE; ASYMPTOTICALLY OPTIMAL; BERNOULLI; FINITE-TIME ANALYSIS; LOWER BOUNDS; MULTI-ARMED BANDIT PROBLEM; NUMERICAL COMPARISON; OPTIMAL POLICIES; OPTIMALITY; THOMPSON

EID: 84867888479     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-34106-9_18     Document Type: Conference Paper
Times cited: 486

References (14)
  • 2
    • 78649420293
    • Regret bounds and minimax policies under partial monitoring
    • Audibert, J.-Y., Bubeck, S.: Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research 11, 2785-2836 (2010)
    • (2010) Journal of Machine Learning Research , vol.11 , pp. 2785-2836
    • Audibert, J.-Y.1    Bubeck, S.2
  • 3
    • 62949181077
    • Exploration-exploitation trade-off using variance estimates in multi-armed bandits
    • Audibert, J.-Y., Munos, R., Szepesvári, C.: Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science 410(19), 1876-1902 (2009)
    • (2009) Theoretical Computer Science , vol.410 , Issue.19 , pp. 1876-1902
    • Audibert, J.-Y.1    Munos, R.2    Szepesvári, C.3
  • 4
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2), 235-256 (2002)
    • (2002) Machine Learning , vol.47 , Issue.2 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 5
    • 85162416700
    • An empirical evaluation of Thompson sampling
    • Chapelle, O., Li, L.: An empirical evaluation of Thompson sampling. In: NIPS (2011)
    • (2011) NIPS
    • Chapelle, O.1    Li, L.2
  • 7
    • 78549244167
    • Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton
    • Granmo, O.C.: Solving two-armed Bernoulli bandit problems using a Bayesian learning automaton. International Journal of Intelligent Computing and Cybernetics 3(2), 207-234 (2010)
    • (2010) International Journal of Intelligent Computing and Cybernetics , vol.3 , Issue.2 , pp. 207-234
    • Granmo, O.C.1
  • 9
    • 84867888879
    • On Bayesian upper-confidence bounds for bandit problems
    • Kaufmann, E., Garivier, A., Cappé, O.: On Bayesian upper-confidence bounds for bandit problems. In: AISTATS (2012)
    • (2012) AISTATS
    • Kaufmann, E.1    Garivier, A.2    Cappé, O.3
  • 10
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • Lai, T.L., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1), 4-22 (1985)
    • (1985) Advances in Applied Mathematics , vol.6 , Issue.1 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 13
    • 80054114465
    • Deviations of Stochastic Bandit Regret
    • Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) ALT 2011. Springer, Heidelberg
    • Salomon, A., Audibert, J.-Y.: Deviations of Stochastic Bandit Regret. In: Kivinen, J., Szepesvári, C., Ukkonen, E., Zeugmann, T. (eds.) ALT 2011. LNCS, vol. 6925, pp. 159-173. Springer, Heidelberg (2011)
    • (2011) LNCS , vol.6925 , pp. 159-173
    • Salomon, A.1    Audibert, J.-Y.2
  • 14
    • 0001395850
    • On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
    • Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25, 285-294 (1933)
    • (1933) Biometrika , vol.25 , pp. 285-294
    • Thompson, W.R.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.