Volume 39, Issue 4, 2014, Pages 1221-1243

Learning to optimize via posterior sampling

Author keywords

Multiarmed bandits; Online optimization; Thompson sampling

Indexed keywords

RESEARCH;

EID: 84908172119     PISSN: 0364765X     EISSN: 15265471     Source Type: Journal    
DOI: 10.1287/moor.2014.0650     Document Type: Article
Times cited : (693)

References (36)
  • 3
    • 84908661477
    • Online-to-confidence-set conversions and application to sparse stochastic bandits
    • Lawrence ND, Girolami MA, eds. Proc. 15th Internat. Conf. Artificial Intelligence Statist. (AISTATS)
    • Abbasi-Yadkori Y, Pal D, Szepesvári C (2012) Online-to-confidence-set conversions and application to sparse stochastic bandits. Lawrence ND, Girolami MA, eds. Proc. 15th Internat. Conf. Artificial Intelligence Statist. (AISTATS), JMLR Workshop and Conference Proceedings, Vol. 22, 1-9.
    • (2012) JMLR Workshop and Conference Proceedings , vol.22 , pp. 1-9
    • Abbasi-Yadkori, Y.1    Pal, D.2    Szepesvári, C.3
  • 4
    • 84874084136
    • Analysis of Thompson sampling for the multi-armed bandit problem
    • Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Ann. Conf. Learn. Theory
    • Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Ann. Conf. Learn. Theory, JMLR Workshop and Conference Proceedings, Vol. 23, 39.1-39.26
    • (2012) JMLR Workshop and Conference Proceedings , vol.23 , pp. 39.1-39.26
    • Agrawal, S.1    Goyal, N.2
  • 7
    • 84874098326
    • Bandits, query learning, and the haystack dimension
    • Kakade SM, von Luxburg U, eds. Proc. 24th Annual Conf. Learn. Theory (COLT)
    • Amin K, Kearns M, Syed U (2011) Bandits, query learning, and the haystack dimension. Kakade SM, von Luxburg U, eds. Proc. 24th Annual Conf. Learn. Theory (COLT), JMLR Workshop and Conference Proceedings, Vol. 19, 87-106.
    • (2011) JMLR Workshop and Conference Proceedings , vol.19 , pp. 87-106
    • Amin, K.1    Kearns, M.2    Syed, U.3
  • 9
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2):235-256.
    • (2002) Machine Learn. , vol.47 , Issue.2 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 10
    • 84862295780
    • Contextual bandit algorithms with supervised learning guarantees
    • Proc. 14th Internat. Conf. Artificial Intelligence Statist. (AISTATS)
    • Beygelzimer A, Langford J, Li L, Reyzin L, Schapire RE (2011) Contextual bandit algorithms with supervised learning guarantees. Proc. 14th Internat. Conf. Artificial Intelligence Statist. (AISTATS), JMLR Workshop and Conference Proceedings, Vol. 15, 19-26.
    • (2011) JMLR Workshop and Conference Proceedings , vol.15 , pp. 19-26
    • Beygelzimer, A.1    Langford, J.2    Li, L.3    Reyzin, L.4    Schapire, R.E.5
  • 14
    • 84898949562
    • Kullback-Leibler upper confidence bounds for optimal sequential allocation
    • Cappé O, Garivier A, Maillard O-A, Munos R, Stoltz G (2013) Kullback-Leibler upper confidence bounds for optimal sequential allocation. Ann. Statist. 41(3):1516-1541.
    • (2013) Ann. Statist. , vol.41 , Issue.3 , pp. 1516-1541
    • Cappé, O.1    Garivier, A.2    Maillard, O.-A.3    Munos, R.4    Stoltz, G.5
  • 19
    • 0018709825
    • A dynamic allocation index for the discounted multiarmed bandit problem
    • Gittins JC, Jones DM (1979) A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66(3):561-565.
    • (1979) Biometrika , vol.66 , Issue.3 , pp. 561-565
    • Gittins, J.C.1    Jones, D.M.2
  • 22
    • 84867888479
    • Thompson sampling: An asymptotically optimal finite time analysis
    • Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds., Algorithmic Learn. Theory (Springer-Verlag, Berlin, Heidelberg)
    • Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds., Algorithmic Learn. Theory, Lecture Notes in Computer Science, Vol. 7568 (Springer-Verlag, Berlin, Heidelberg), 586-594.
    • (2012) Algorithmic Learn. Theory, Lecture Notes in Computer Science , vol.7568 , pp. 586-594
    • Kaufmann, E.1    Korda, N.2    Munos, R.3
  • 24
    • 84898959192
    • Thompson sampling for one-dimensional exponential family bandits
    • Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds.
    • Korda N, Kaufmann E, Munos R (2013) Thompson sampling for one-dimensional exponential family bandits. Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, 1448-1456.
    • (2013) Adv. Neural Inform. Processing Systems , pp. 1448-1456
    • Korda, N.1    Kaufmann, E.2    Munos, R.3
  • 25
    • 0000854435
    • Adaptive treatment allocation and the multi-armed bandit problem
    • Lai TL (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091-1114.
    • (1987) Ann. Statist. , vol.15 , Issue.3 , pp. 1091-1114
    • Lai, T.L.1
  • 26
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4-22.
    • (1985) Adv. Appl. Math. , vol.6 , Issue.1 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 29
    • 84900037074
    • Open problem: Regret bounds for Thompson sampling
    • Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory (COLT)
    • Li L, Chapelle O (2012) Open problem: Regret bounds for Thompson sampling. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory (COLT), JMLR Workshop and Conference Proceedings, Vol. 23, 43.1-43.3.
    • (2012) JMLR Workshop and Conference Proceedings , vol.23 , pp. 43.1-43.3
    • Li, L.1    Chapelle, O.2
  • 30
    • 84864939787
    • Optimistic Bayesian sampling in contextual-bandit problems
    • May BC, Korda N, Lee A, Leslie DS (2012) Optimistic Bayesian sampling in contextual-bandit problems. J. Machine Learn. Res. 13(1):2069-2106.
    • (2012) J. Machine Learn. Res. , vol.13 , Issue.1 , pp. 2069-2106
    • May, B.C.1    Korda, N.2    Lee, A.3    Leslie, D.S.4
  • 32
    • 84859621831
    • The knowledge gradient algorithm for a general class of online learning problems
    • Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180-195.
    • (2012) Oper. Res. , vol.60 , Issue.1 , pp. 180-195
    • Ryzhov, I.O.1    Powell, W.B.2    Frazier, P.I.3
  • 33
    • 0001223006
    • Computationally related problems
    • Sahni S (1974) Computationally related problems. SIAM J. Comput. 3(4):262-279.
    • (1974) SIAM J. Comput. , vol.3 , Issue.4 , pp. 262-279
    • Sahni, S.1
  • 34
    • 78650505735
    • A modern Bayesian look at the multi-armed bandit
    • Scott SL (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26(6):639-658.
    • (2010) Appl. Stochastic Models Bus. Indust. , vol.26 , Issue.6 , pp. 639-658
    • Scott, S.L.1
  • 35
    • 84860236413
    • Information-theoretic regret bounds for Gaussian process optimization in the bandit setting
    • Srinivas N, Krause A, Kakade SM, Seeger M (2012) Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. Inform. Theory, IEEE Trans. 58(5):3250-3265.
    • (2012) Inform. Theory, IEEE Trans. , vol.58 , Issue.5 , pp. 3250-3265
    • Srinivas, N.1    Krause, A.2    Kakade, S.M.3    Seeger, M.4
  • 36
    • 84897517236
    • Stochastic simultaneous optimistic optimization
    • Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (ICML-13)
    • Valko M, Carpentier A, Munos R (2013) Stochastic simultaneous optimistic optimization. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (ICML-13), JMLR Workshop and Conference Proceedings, Vol. 28, 19-27.
    • (2013) JMLR Workshop and Conference Proceedings , vol.28 , pp. 19-27
    • Valko, M.1    Carpentier, A.2    Munos, R.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.