Volume 2, 2012, Pages 1135-1142

Monte Carlo Bayesian reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

CROSS PRODUCT; FINITE SET; GENERAL APPROACH; GUARANTEED PERFORMANCE; MODEL PARAMETERS; MONTE CARLO; PARTIALLY OBSERVABLE MARKOV DECISION PROCESS; POINT-BASED APPROXIMATION; PRIOR KNOWLEDGE; STATE SPACE

EID: 84867122397     PISSN: None     EISSN: None     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited: 23

References (25)
  • 1. Asmuth, J., Li, L., Littman, M. L., Nouri, A., and Wingate, D. A Bayesian sampling approach to exploration in reinforcement learning. In UAI, 2009.
  • 3. Castro, P. S. and Precup, D. Using linear programming for Bayesian exploration in Markov Decision Processes. In IJCAI, 2007.
  • 5
  • 7. Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101:99-134, 1998.
  • 8. Kraines, D. and Kraines, V. Evolution of learning among Pavlov strategies in a competitive environment with noise. Journal of Conflict Resolution, 39:439-466, 1995.
  • 9. Kurniawati, H., Hsu, D., and Lee, W. S. SARSOP: Efficient point-based POMDP planning by approximating optimally reachable belief spaces. In RSS, 2008.
  • 10. Leonard, J., How, J., and Teller, S. A perception-driven autonomous urban vehicle. Journal of Field Robotics, 25(10):727-774, 2008.
  • 13. Liu, Y. and Ozguner, U. Human driver model and driver decision making for intersection driving. IEEE Intelligent Vehicles Symposium, pp. 642-647, 2007.
  • 14. Ng, A. and Jordan, M. PEGASUS: A policy search method for large MDPs and POMDPs. In UAI, pp. 406-415, 2000.
  • 15. Nowak, M. and Sigmund, K. A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 364, 1993.
  • 16. Ong, S. C. W., Png, S. W., Hsu, D., and Lee, W. S. Planning under uncertainty for robotic tasks with mixed observability. IJRR, 29(8):1053-1068, 2010.
  • 17. Pineau, J., Gordon, G., and Thrun, S. Point-based value iteration: An anytime algorithm for POMDPs. In IJCAI, pp. 1025-1032, 2003.
  • 19. Poupart, P. and Vlassis, N. Model-based Bayesian reinforcement learning in partially observable domains. In ISAIM, 2008.
  • 20. Poupart, P., Vlassis, N., Hoey, J., and Regan, K. An analytic solution to discrete Bayesian reinforcement learning. In ICML, pp. 697-704, 2006.
  • 21. Ross, S. and Pineau, J. Model-based Bayesian reinforcement learning in large structured domains. In UAI, 2008.
  • 23. Slany, W. and Kienreich, W. On some winning strategies for the Iterated Prisoner's Dilemma or Mr. Nice Guy and the Cosa Nostra. In The Iterated Prisoners' Dilemma: 20 Years On, 2007.
  • 24. Smith, T. and Simmons, R. G. Point-based POMDP algorithms: Improved analysis and implementation. In UAI, pp. 542-547, 2005.
  • 25. Wang, T., Lizotte, D., Bowling, M., and Schuurmans, D. Bayesian sparse sampling for on-line reward optimization. In ICML, 2005.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.