Machine Learning, Volume 76, Issue 2-3, 2009, Pages 243-256

Hybrid least-squares algorithms for approximate policy evaluation

Author keywords

Markov decision processes; Reinforcement learning

Indexed keywords

FIXED POINT METHODS; FIXED-POINT ALGORITHMS; GEOMETRIC INTERPRETATION; HYBRID ALGORITHMS; LARGE DOMAIN; LEAST-SQUARES ALGORITHMS; MARKOV DECISION PROCESSES; OPTIMIZATION CRITERIA; POLICY EVALUATION; POLICY ITERATION; RESIDUAL METHOD; TARGET VALUES

EID: 68949099445     PISSN: 0885-6125     EISSN: 1573-0565     Source Type: Journal
DOI: 10.1007/s10994-009-5128-4     Document Type: Conference Paper
Times cited: 12

References (15)
  • 1. Antos, A., Szepesvári, C., & Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning, 71(1), 89-129.
  • 4. Bradtke, S., & Barto, A. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3), 33-57.
  • 5. Koller, D., & Parr, R. (2000). Policy iteration for factored MDPs. In Proceedings of the 16th conference on uncertainty in artificial intelligence (pp. 326-334). San Mateo: Morgan Kaufmann.
  • 8. Li, L. (2008). A worst-case comparison between temporal difference and residual gradient with linear function approximation. In Proceedings of the 25th international conference on machine learning (pp. 560-567).
  • 11. Puterman, M. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley.
  • 12. Schoknecht, R. (2003). Optimality of reinforcement learning algorithms with linear function approximation. In Advances in neural information processing systems (Vol. 15, pp. 1555-1562).
  • 14. Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
  • 15. Sutton, R., & Barto, A. (1998). Reinforcement learning. Cambridge: MIT Press.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.