Volume , Issue , 2011, Pages

Improved algorithms for linear stochastic bandits

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; PROBABILITY

EID: 85162561761     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (1966)

References (27)
  • 2
    • 0344118814
    • Reinforcement learning with immediate rewards and linear hypotheses
    • N. Abe, A. W. Biermann, and P. M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37:263-293, 2003.
    • (2003) Algorithmica , vol.37 , pp. 263-293
    • Abe, N.1    Biermann, A.W.2    Long, P.M.3
  • 4
    • 62949181077
    • Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
    • J.-Y. Audibert, R. Munos, and Csaba Szepesvári. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876-1902, 2009.
    • (2009) Theoretical Computer Science , vol.410 , Issue.19 , pp. 1876-1902
    • Audibert, J.-Y.1    Munos, R.2    Szepesvári, C.3
  • 5
    • 0034497786
    • Using upper confidence bounds for online learning
    • P. Auer. Using upper confidence bounds for online learning. In FOCS, pages 270-279, 2000.
    • (2000) FOCS , pp. 270-279
    • Auer, P.1
  • 6
    • 84860601617
    • Using confidence bounds for exploitation-exploration trade-offs
    • P. Auer. Using confidence bounds for exploitation-exploration trade-offs. JMLR, 2002.
    • (2002) JMLR
    • Auer, P.1
  • 7
    • 0036568025
    • Finite time analysis of the multiarmed bandit problem
    • P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 8
    • 77952027689
    • Online optimization in X-armed bandits
    • S. Bubeck, R. Munos, G. Stoltz, and Cs. Szepesvári. Online optimization in X-armed bandits. In NIPS, pages 201-208, 2008.
    • (2008) NIPS , pp. 201-208
    • Bubeck, S.1    Munos, R.2    Stoltz, G.3    Szepesvári, C.4
  • 10
    • 84888139304
    • Contextual bandits with linear payoff functions
    • W. Chu, L. Li, L. Reyzin, and R. E. Schapire. Contextual bandits with linear payoff functions. In AISTATS, 2011.
    • (2011) AISTATS
    • Chu, W.1    Li, L.2    Reyzin, L.3    Schapire, R.E.4
  • 11
    • 70349275222
    • Bandit algorithms for tree search
    • P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In UAI, 2007.
    • (2007) UAI
    • Coquelin, P.-A.1    Munos, R.2
  • 12
    • 84898072179
    • Stochastic linear optimization under bandit feedback
    • Rocco Servedio and Tong Zhang, editors
    • V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. In Rocco Servedio and Tong Zhang, editors, COLT, pages 355-366, 2008.
    • (2008) COLT , pp. 355-366
    • Dani, V.1    Hayes, T.P.2    Kakade, S.M.3
  • 13
    • 4544277579
    • Self-normalized processes: Exponential inequalities, moment bounds and iterated logarithm laws
    • V. H. de la Peña, M. J. Klass, and T. L. Lai. Self-normalized processes: exponential inequalities, moment bounds and iterated logarithm laws. Annals of Probability, 32(3):1902-1933, 2004.
    • (2004) Annals of Probability , vol.32 , Issue.3 , pp. 1902-1933
    • De La Peña, V.H.1    Klass, M.J.2    Lai, T.L.3
  • 15
    • 84875634609
    • Robust selective sampling from single and multiple teachers
    • O. Dekel, C. Gentile, and K. Sridharan. Robust selective sampling from single and multiple teachers. In COLT, 2010.
    • (2010) COLT
    • Dekel, O.1    Gentile, C.2    Sridharan, K.3
  • 16
    • 0002384441
    • On tail probabilities for martingales
    • D. A. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100-118, 1975.
    • (1975) The Annals of Probability , vol.3 , Issue.1 , pp. 100-118
    • Freedman, D.A.1
  • 17
    • 70049104217
    • On upper-confidence bound policies for non-stationary bandit problems
    • A. Garivier and E. Moulines. On upper-confidence bound policies for non-stationary bandit problems. Technical report, LTCI, 2008.
    • (2008) Technical Report LTCI
    • Garivier, A.1    Moulines, E.2
  • 19
    • 0002899547
    • Asymptotically efficient adaptive allocation rules
    • T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
    • (1985) Advances in Applied Mathematics , vol.6 , pp. 4-22
    • Lai, T.L.1    Robbins, H.2
  • 20
    • 0000258837
    • Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems
    • T. L. Lai and C. Z. Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154-166, 1982.
    • (1982) The Annals of Statistics , vol.10 , Issue.1 , pp. 154-166
    • Lai, T.L.1    Wei, C.Z.2
  • 27
    • 79958846996
    • Exploring compact reinforcement-learning representations with linear regression
    • AUAI Press
    • T. J. Walsh, I. Szita, C. Diuk, and M. L. Littman. Exploring compact reinforcement-learning representations with linear regression. In UAI, pages 591-598. AUAI Press, 2009.
    • (2009) UAI , pp. 591-598
    • Walsh, T.J.1    Szita, I.2    Diuk, C.3    Littman, M.L.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.