Volume 48, 2013, Pages 841-883

Scalable and efficient Bayes-adaptive reinforcement learning based on Monte-Carlo tree search

Author keywords

[No Author keywords available]

Indexed keywords

BAYESIAN NETWORKS; LEARNING ALGORITHMS; MACHINE LEARNING; MONTE CARLO METHODS; UNCERTAINTY ANALYSIS;

EID: 84893049023     PISSN: None     EISSN: 10769757     Source Type: Journal    
DOI: 10.1613/jair.4117     Document Type: Article
Times cited: 71

References (56)
  • 5
    • 0036568025
    • Finite-time analysis of the multiarmed bandit problem
    • DOI 10.1023/A:1013689704352, Computational Learning Theory
    • Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2), 235-256. (Pubitemid 34126111)
    • (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 6
    • 84966211467
    • The theory of dynamic programming
    • Bellman, R. (1954). The theory of dynamic programming. Bull. Amer. Math. Soc, 60(6), 503-515.
    • (1954) Bull. Amer. Math. Soc , vol.60 , Issue.6 , pp. 503-515
    • Bellman, R.1
  • 7
    • 0041965975
    • R-max-a general polynomial time algorithm for near-optimal reinforcement learning
    • Brafman, R., & Tennenholtz, M. (2003). R-max-a general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3, 213-231.
    • (2003) The Journal of Machine Learning Research , vol.3 , pp. 213-231
    • Brafman, R.1    Tennenholtz, M.2
  • 15
    • 0030260201
    • Exploration bonuses and dual control
    • Dayan, P., & Sejnowski, T. (1996). Exploration bonuses and dual control. Machine Learning, 25(1), 5-22. (Pubitemid 126724387)
    • (1996) Machine Learning , vol.25 , Issue.1 , pp. 5-22
    • Dayan, P.1    Sejnowski, T.J.2
  • 31
    • 0037840849
    • On the undecidability of probabilistic planning and related stochastic optimization problems
    • Madani, O., Hanks, S., & Condon, A. (2003). On the undecidability of probabilistic planning and related stochastic optimization problems. Artificial Intelligence, 147(1), 5-34.
    • (2003) Artificial Intelligence , vol.147 , Issue.1 , pp. 5-34
    • Madani, O.1    Hanks, S.2    Condon, A.3
  • 33
    • 0032679082
    • Exploration of multi-state environments: Local measures and back-propagation of uncertainty
    • Meuleau, N., & Bourgine, P. (1999). Exploration of multi-state environments: Local measures and back-propagation of uncertainty. Machine Learning, 35(2), 117-154.
    • (1999) Machine Learning , vol.35 , Issue.2 , pp. 117-154
    • Meuleau, N.1    Bourgine, P.2
  • 37
    • 79960110381
    • A Bayesian approach for learning and planning in Partially Observable Markov Decision Processes
    • Ross, S., Pineau, J., Chaib-draa, B., & Kreitmann, P. (2011). A Bayesian approach for learning and planning in Partially Observable Markov Decision Processes. Journal of Machine Learning Research, 12, 1729-1770.
    • (2011) Journal of Machine Learning Research , vol.12 , pp. 1729-1770
    • Ross, S.1    Pineau, J.2    Chaib-draa, B.3    Kreitmann, P.4
  • 46
    • 85132026293
    • Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
    • Sutton, R. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pp. 216-224.
    • (1990) Proceedings of the Seventh International Conference on Machine Learning , pp. 216-224
    • Sutton, R.1
  • 48
    • 77955824148
    • Algorithms for reinforcement learning
    • Szepesvári, C. (2010). Algorithms for reinforcement learning. Synthesis Lectures on Artificial Intelligence and Machine Learning. Morgan & Claypool Publishers.
    • (2010) Algorithms for Reinforcement Learning
    • Szepesvári, C.1
  • 49
    • 0001395850
    • On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
    • Thompson, W. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3/4), 285-294.
    • (1933) Biometrika , vol.25 , Issue.3-4 , pp. 285-294
    • Thompson, W.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.