Machine Learning, Volume 85, Issue 3, 2011, Pages 361-391

An asymptotically optimal policy for finite support models in the multiarmed bandit problem

Author keywords

Bandit problems; Convex optimization; Finite time regret; MED policy

Indexed keywords

ASYMPTOTIC BOUNDS; ASYMPTOTICALLY OPTIMAL; BANDIT PROBLEMS; CONVEX OPTIMIZATION TECHNIQUES; EXPLORATION AND EXPLOITATION; FINITE SUPPORTS; FINITE TIME; FINITE-TIME REGRET; MULTI-ARMED BANDIT PROBLEM; MULTIPLE ARMS; UPPER AND LOWER BOUNDS; UPPER BOUND

EID: 83155180573     PISSN: 0885-6125     EISSN: 1573-0565     Source Type: Journal
DOI: 10.1007/s10994-011-5257-4     Document Type: Article
Times cited: 48

References (23)
  • 2. Agrawal, R. (1995b). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27, 1054-1078.
  • 3. Audibert, J.-Y., & Bubeck, S. (2009). Minimax policies for adversarial and stochastic bandits. In Proceedings of COLT 2009. Montreal: Omnipress.
  • 4. Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002a). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3), 235-256. DOI 10.1023/A:1013689704352.
  • 7. Burnetas, A. N., & Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122-142. DOI 10.1006/aama.1996.0007.
  • 9. Even-Dar, E., Mannor, S., & Mansour, Y. (2002). PAC bounds for multi-armed bandit and Markov decision processes. In Proceedings of COLT 2002 (pp. 255-270). London: Springer.
  • 12. Honda, J., & Takemura, A. (2010). An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of COLT 2010, Haifa, Israel (pp. 67-79).
  • 15. Kleinberg, R. (2005). Nearly tight bounds for the continuum-armed bandit problem. In Proceedings of NIPS 2005 (pp. 697-704). New York: MIT Press.
  • 16. Lai, T. L., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6, 4-22.
  • 17. Meuleau, N., & Bourgine, P. (1999). Exploration of multi-state environments: Local measures and backpropagation of uncertainty. Machine Learning, 35, 117-154.
  • 20. Strens, M. (2000). A Bayesian framework for reinforcement learning. In Proceedings of ICML 2000 (pp. 943-950). San Francisco: Kaufmann.
  • 21. Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In Proceedings of ECML 2005, Porto, Portugal (pp. 437-448). Berlin: Springer.
  • 22. Wyatt, J. (1997). Exploration and inference in learning from reinforcement. Doctoral dissertation, Department of Artificial Intelligence, University of Edinburgh.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.