SCOPUS Information Retrieval Platform

Machine Learning

Volume 76, Issue 2-3, 2009, Pages 243-256

Hybrid least-squares algorithms for approximate policy evaluation

Authors (3): Johns, Jeff (a); Petrik, Marek (a); Mahadevan, Sridhar (a)

(a) University of Massachusetts Amherst (United States)

Author keywords

Markov decision processes; Reinforcement learning

Indexed keywords

FIXED POINT METHODS; FIXED-POINT ALGORITHMS; GEOMETRIC INTERPRETATION; HYBRID ALGORITHMS; LARGE DOMAIN; LEAST-SQUARES ALGORITHMS; MARKOV DECISION PROCESSES; OPTIMIZATION CRITERIA; POLICY EVALUATION; POLICY ITERATION; RESIDUAL METHOD; TARGET VALUES;

CIRCUIT THEORY; MARKOV PROCESSES; OPTIMIZATION; REINFORCEMENT; REINFORCEMENT LEARNING;

ALGORITHMS;

EID: 68949099445; PISSN: 08856125; EISSN: 15730565; Source Type: Journal
DOI: 10.1007/s10994-009-5128-4; Document Type: Conference Paper

Times cited: 12

References (15)

1
- 40849145988
- Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path
- Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89-129.
- (2008) Machine Learning , vol.71 , Issue.1 , pp. 89-129
- Antos, A.¹ Szepesvári, C.² Munos, R.³

2
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th international conference on machine learning (pp. 30-37).
- (1995) Proceedings of the 12th International Conference on Machine Learning , pp. 30-37
- Baird, L.¹

3
- 0038595396
- Least-squares temporal difference learning
- Boyan, J. (1999). Least-squares temporal difference learning. In Proceedings of the 16th international conference on machine learning (pp. 49-56).
- (1999) Proceedings of the 16th International Conference on Machine Learning , pp. 49-56
- Boyan, J.¹

4
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- Bradtke, S., & Barto, A. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3), 33-57.
- (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 33-57
- Bradtke, S.¹ Barto, A.²

5
- 0010359703
- Policy iteration for factored MDPs
- Koller, D., & Parr, R. (2000). Policy iteration for factored MDPs. In Proceedings of the 16th conference on uncertainty in artificial intelligence (pp. 326-334). San Mateo: Morgan Kaufmann.
- (2000) Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence , pp. 326-334
- Koller, D.¹ Parr, R.²

6
- 4644323293
- Least-squares policy iteration
- Lagoudakis, M., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107-1149.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.¹ Parr, R.²

7
- 35048819671
- Least-squares methods in reinforcement learning for control
- Lagoudakis, M., Parr, R., & Littman, M. (2002). Least-squares methods in reinforcement learning for control. In Proceedings of the 2nd Hellenic conference on artificial intelligence (pp. 249-260).
- (2002) Proceedings of the 2nd Hellenic Conference on Artificial Intelligence , pp. 249-260
- Lagoudakis, M.¹ Parr, R.² Littman, M.³

8
- 56449125197
- A worst-case comparison between temporal difference and residual gradient with linear function approximation
- Li, L. (2008). A worst-case comparison between temporal difference and residual gradient with linear function approximation. In Proceedings of the 25th international conference on machine learning (pp. 560-567).
- (2008) Proceedings of the 25th International Conference on Machine Learning , pp. 560-567
- Li, L.¹

9
- 34547966269
- Representation policy iteration
- Mahadevan, S. (2005). Representation policy iteration. In Proceedings of the 21st conference on uncertainty in artificial intelligence (pp. 372-379).
- (2005) Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence , pp. 372-379
- Mahadevan, S.¹

10
- 1942516880
- Error bounds for approximate policy iteration
- Munos, R. (2003). Error bounds for approximate policy iteration. In Proceedings of the 20th international conference on machine learning (pp. 560-567).
- (2003) Proceedings of the 20th International Conference on Machine Learning , pp. 560-567
- Munos, R.¹

11
- 85102627959
- Markov decision processes: discrete stochastic dynamic programming
- Puterman, M. (1994). Markov decision processes: discrete stochastic dynamic programming. New York: Wiley.
- (1994) Wiley, New York
- Puterman, M.¹

12
- 84899025152
- Optimality of reinforcement learning algorithms with linear function approximation
- Schoknecht, R. (2003). Optimality of reinforcement learning algorithms with linear function approximation. In Advances in neural information processing systems (Vol. 15, pp. 1555-1562).
- (2003) Advances in Neural Information Processing Systems , vol.15 , pp. 1555-1562
- Schoknecht, R.¹

13
- 0000273218
- Generalized polynomial approximations in Markovian decision processes
- Schweitzer, P., & Seidmann, A. (1985). Generalized polynomial approximations in Markovian decision processes. Journal of Mathematical Analysis and Applications, 110, 568-582.
- (1985) Journal of Mathematical Analysis and Applications , vol.110 , pp. 568-582
- Schweitzer, P.¹ Seidmann, A.²

14
- 33847202724
- Learning to predict by the methods of temporal differences
- Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.¹

15
- 68949102319
- Reinforcement learning
- Sutton, R., & Barto, A. (1998). Reinforcement learning. Cambridge: MIT Press.
- (1998) MIT Press, Cambridge
- Sutton, R.¹ Barto, A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.