Volume, Issue PART 2, 2013, Pages 1344-1352

Safe policy iteration

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; LEARNING SYSTEMS

EID: 84897496610     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 41

References (18)
  • 2. Bertsekas, D.P. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications, 9(3):310-335, 2011.
  • 5. Haviv, M. and Van Der Heyden, L. Perturbation bounds for the stationary probabilities of a finite Markov chain. Advances in Applied Probability, 16(4):804-818, 1984. ISSN 0001-8678. URL http://www.jstor.org/stable/1427341.
  • 7. Kakade, S.M. A natural policy gradient. NIPS, 14:1531-1538, 2001.
  • 9. Kakade, S.M. and Langford, J. Approximately optimal approximate reinforcement learning. In Proceedings of ICML, pp. 267-274, 2002.
  • 10. Koller, D. and Parr, R. Policy iteration for factored MDPs. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pp. 326-334, San Francisco, CA, USA, 2000. Morgan Kaufmann Publishers Inc. ISBN 1-55860-709-9.
  • 12. Lazaric, A., Ghavamzadeh, M., and Munos, R. Analysis of a classification-based policy iteration algorithm. In Proceedings of ICML, pp. 607-614, 2010.
  • 13. Munos, R. Error bounds for approximate value iteration. In Proceedings of AAAI, volume 20, p. 1006, 2005.
  • 14. Perkins, T.J. and Precup, D. A convergent form of approximate policy iteration. NIPS, 15:1595-1602, 2002.
  • 16. Sutton, R.S., McAllester, D., Singh, S., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 12, pp. 1057-1063. MIT Press, 2000.
  • 17. Wagner, P. A reinterpretation of the policy oscillation phenomenon in approximate policy iteration. In NIPS, 2011.
  • 18. Ye, Y. The simplex and policy-iteration methods are strongly polynomial for the Markov decision problem with a fixed discount rate. Mathematics of Operations Research, 36(4):593-603, 2011.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.