



Volume 6, 2017, Pages 4133-4148

Why is posterior sampling better than optimism for reinforcement learning?

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; MARKOV PROCESSES; REINFORCEMENT LEARNING;

EID: 85048551505     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 68

References (34)
  • 5. Brafman, Ronen I. and Tennenholtz, Moshe. R-max - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
  • 7. Burnetas, Apostolos N. and Katehakis, Michael N. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
  • 8. Dani, Varsha, Hayes, Thomas P., and Kakade, Sham M. Stochastic linear optimization under bandit feedback. In COLT, pp. 355-366, 2008.
  • 14. Jaksch, Thomas, Ortner, Ronald, and Auer, Peter. Near-optimal regret bounds for reinforcement learning. Journal of Machine Learning Research, 11:1563-1600, 2010.
  • 16. Kearns, Michael J. and Singh, Satinder P. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
  • 25. Osband, Ian, Russo, Daniel, and Van Roy, Benjamin. (More) Efficient reinforcement learning via posterior sampling. In NIPS, pp. 3003-3011. Curran Associates, Inc., 2013.
  • 27. Russo, Daniel and Van Roy, Benjamin. Learning to optimize via posterior sampling. Mathematics of Operations Research, 39(4):1221-1243, 2014.
  • 30. Strens, Malcolm J. A. A Bayesian framework for reinforcement learning. In ICML, pp. 943-950, 2000.
  • 33. Thompson, W.R. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4):285-294, 1933.


* This information was extracted by KISTI from analysis of Elsevier's SCOPUS database.