Volume 12, 2011, Pages 1655-1695

X-armed bandits

Author keywords

Bandits with infinitely many arms; Minimax rates; Optimistic online optimization; Regret bounds

Indexed keywords

BANDITS WITH INFINITELY MANY ARMS; DECISION MAKERS; DISSIMILARITY FUNCTION; EUCLIDEAN SPACES; FINITE NUMBER; GLOBAL MAXIMUM; HYPERCUBE; LARGE CLASS; LIPSCHITZ; MEASURABLE SPACE; MINIMAX; ONLINE OPTIMIZATION; OPTIMALITY; REGRET BOUNDS; TIME STEP

EID: 79960128338 | PISSN: 1532-4435 | EISSN: 1533-7928 | Source Type: Journal
DOI: None | Document Type: Article
Times cited: 414

References (28)
  • 2
    • R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27:1054-1078, 1995.
  • 4
    • P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002. DOI 10.1023/A:1013689704352.
  • 8
    • S. Bubeck, R. Munos, G. Stoltz, and Cs. Szepesvári. Online optimization in X-armed bandits. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 201-208, 2009.
  • 9
    • S. Bubeck, R. Munos, and G. Stoltz. Pure exploration in multi-armed bandits problems. Theoretical Computer Science, 412:1832-1852, 2011.
  • 12
    • E. Cope. Regret and convergence bounds for immediate-reward reinforcement learning with continuous action spaces. IEEE Transactions on Automatic Control, 54(6):1243-1253, 2009.
  • 19
    • J. C. Gittins. Multi-armed Bandit Allocation Indices. Wiley-Interscience Series in Systems and Optimization. Wiley, Chichester, NY, 1989.
  • 20
  • 25
    • T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
  • 26


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.