1. Azar, M.G., Gómez, V., and Kappen, H.J. Dynamic policy programming. Journal of Machine Learning Research, 13(Nov):3207-3245, 2012.
2. Bertsekas, D.P. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3):310-335, 2011.
3. Dutech, A., Edmunds, T., Kok, J., Lagoudakis, M., Littman, M., Riedmiller, M., Russell, B., Scherrer, B., Sutton, R., Timmer, S., et al. Reinforcement learning benchmarks and bake-offs II. In Workshop at the Advances in Neural Information Processing Systems Conference. Citeseer, 2005.
4. Gabillon, V., Lazaric, A., Ghavamzadeh, M., and Scherrer, B. Classification-based policy iteration with a critic. In Proceedings of ICML, pp. 1049-1056, 2011.
5. Haviv, M. and Van Der Heyden, L. Perturbation bounds for the stationary probabilities of a finite Markov chain. Advances in Applied Probability, 16(4):804-818, 1984. ISSN 0001-8678. URL http://www.jstor.org/stable/1427341.
7. Kakade, S.M. A natural policy gradient. In NIPS, volume 14, pp. 1531-1538, 2001.
9. Kakade, S.M. and Langford, J. Approximately optimal approximate reinforcement learning. In Proceedings of ICML, pp. 267-274, 2002.
10. Koller, D. and Parr, R. Policy iteration for factored MDPs. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 326-334, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1-55860-709-9.
12. Lazaric, A., Ghavamzadeh, M., and Munos, R. Analysis of a classification-based policy iteration algorithm. In Proceedings of ICML, pp. 607-614, 2010.
13. Munos, R. Error bounds for approximate value iteration. In Proceedings of AAAI, volume 20, p. 1006, 2005.
14. Perkins, T.J. and Precup, D. A convergent form of approximate policy iteration. In NIPS, volume 15, pp. 1595-1602, 2002.
15. Peters, J., Vijayakumar, S., and Schaal, S. Natural actor-critic. In Proceedings of ECML, volume 3720, pp. 280-291. Springer, 2005.
16. Sutton, R.S., McAllester, D., Singh, S., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 12, pp. 1057-1063. MIT Press, 2000.
17. Wagner, P. A reinterpretation of the policy oscillation phenomenon in approximate policy iteration. In NIPS, 2011.
18. Ye, Y. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593-603, 2011.