Volume 2, 2012, Pages 1135-1142

Monte Carlo Bayesian reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

CROSS PRODUCT; FINITE SET; GENERAL APPROACH; GUARANTEED PERFORMANCE; MODEL PARAMETERS; MONTE CARLO; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; POINT-BASED APPROXIMATION; PRIOR KNOWLEDGE; STATE SPACE

EID: 84867122397     PISSN: None     EISSN: None     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited: 23

References (25)
  • 1. Asmuth, J., Li, L., Littman, M. L., Nouri, A., and Wingate, D. A Bayesian sampling approach to exploration in reinforcement learning. In UAI, 2009.
  • 3. Castro, P. S. and Precup, D. Using linear programming for Bayesian exploration in Markov Decision Processes. In IJCAI, 2007.
  • 5
  • 7. Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99-134, 1998.
  • 8. Kraines, D. and Kraines, V. Evolution of learning among Pavlov strategies in a competitive environment with noise. Journal of Conflict Resolution, 39:439-466, 1995.
  • 9. Kurniawati, H., Hsu, D., and Lee, W. S. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In RSS, 2008.
  • 10. Leonard, J., How, J., and Teller, S. A perception-driven autonomous urban vehicle. Journal of Field Robotics, 25(10):727-774, 2008.
  • 13. Liu, Y. and Ozguner, U. Human driver model and driver decision making for intersection driving. IEEE Intelligent Vehicles Symposium, pp. 642-647, 2007.
  • 14. Ng, A. and Jordan, M. PEGASUS: A policy search method for large MDPs and POMDPs. In UAI, pp. 406-415, 2000.
  • 15. Nowak, M. and Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 364, 1993.
  • 16. Ong, S. C. W., Png, S. W., Hsu, D., and Lee, W. S. Planning under uncertainty for robotic tasks with mixed observability. IJRR, 29(8):1053-1068, 2010.
  • 17. Pineau, J., Gordon, G., and Thrun, S. Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI, pp. 1025-1032, 2003.
  • 19. Poupart, P. and Vlassis, N. Model-based Bayesian reinforcement learning in partially observable domains. In ISAIM, 2008.
  • 20. Poupart, P., Vlassis, N., Hoey, J., and Regan, K. An analytic solution to discrete Bayesian reinforcement learning. In ICML, pp. 697-704, 2006.
  • 21. Ross, S. and Pineau, J. Model-based Bayesian reinforcement learning in large structured domains. In UAI, 2008.
  • 23. Slany, W. and Kienreich, W. On some winning strategies for the Iterated Prisoner's Dilemma or Mr. Nice Guy and the Cosa Nostra. In The Iterated Prisoners' Dilemma: 20 Years On, 2007.
  • 24. Smith, T. and Simmons, R. G. Point-based POMDP algorithms: Improved analysis and implementation. In UAI, pp. 542-547, 2005.
  • 25. Wang, T., Lizotte, D., Bowling, M., and Schuurmans, D. Bayesian sparse sampling for on-line reward optimization. In ICML, 2005.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.