Volume , Issue , 2011, Pages 19-26

Learning is planning: Near Bayes-optimal reinforcement learning via Monte-Carlo tree search

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES; MONTE CARLO METHODS;

EID: 80053158617     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 33

References (27)
  • 3
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • DOI 10.1023/A:1013689704352, Computational Learning Theory
    • Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47, 235-256. (Pubitemid 34126111)
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 4
    • 0041965975
    • R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
    • Brafman, R. I., & Tennenholtz, M. (2002). R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
    • (2002) Journal of Machine Learning Research , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 6
  • 8
    • 23244466805
    • Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London
    • Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London.
    • (2003) On the Sample Complexity of Reinforcement Learning
    • Kakade, S.M.1
  • 15
    • 77950032550
    • Markov chain sampling methods for Dirichlet process mixture models
    • Neal, R. M. (2000). Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics, 9, 249-265.
    • (2000) Journal of Computational and Graphical Statistics , vol.9 , pp. 249-265
    • Neal, R.M.1
  • 21
    • 55549110436
    • An analysis of model-based interval estimation for Markov decision processes
    • Special Issue on Learning Theory
    • Strehl, A. L., & Littman, M. L. (2008). An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences, 74, 1309-1331. Special Issue on Learning Theory.
    • (2008) Journal of Computer and System Sciences , vol.74 , pp. 1309-1331
    • Strehl, A.L.1    Littman, M.L.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.