



Volume 13, Issue 1-2, 2003, Pages 79-110

Least squares policy evaluation algorithms with linear function approximation

Author keywords

[No Author keywords available]

Indexed keywords

CONVERGENCE OF NUMERICAL METHODS; DYNAMIC PROGRAMMING; FUNCTION EVALUATION; GRADIENT METHODS; LEAST SQUARES APPROXIMATIONS; LINEAR PROGRAMMING; PROBABILITY DISTRIBUTIONS; PROBLEM SOLVING;

EID: 0037288398     PISSN: 09246703     EISSN: None     Source Type: Journal    
DOI: 10.1023/A:1022192903948     Document Type: Article
Times cited: 159

References (21)
  • 2
    • 0000268954
    • A counterexample to temporal differences learning
    • Bertsekas, D. P. 1995. A counterexample to temporal differences learning. Neural Computation 7: 270-279.
    • (1995) Neural Computation , vol.7 , pp. 270-279
    • Bertsekas, D.P.
  • 3
    • 4243567726
    • Temporal differences-based policy iteration and application in neuro-dynamic programming
    • Lab. for Info. and Decision Systems Report LIDS-P-2349. Cambridge, MA: MIT
    • Bertsekas, D. P., and Ioffe, S. 1996. Temporal differences-based policy iteration and application in neuro-dynamic programming. Lab. for Info. and Decision Systems Report LIDS-P-2349. Cambridge, MA: MIT.
    • (1996)
    • Bertsekas, D.P.; Ioffe, S.
  • 7
    • 0034389611
    • Gradient convergence in gradient methods with errors
    • Bertsekas, D. P., and Tsitsiklis, J. N. 2000. Gradient convergence in gradient methods with errors. SIAM J. Optim. 10: 627-642.
    • (2000) SIAM J. Optim. , vol.10 , pp. 627-642
    • Bertsekas, D.P.; Tsitsiklis, J.N.
  • 8
    • 0036832950
    • Technical update: Least-squares temporal difference learning
    • Boyan, J. A. 2002. Technical update: least-squares temporal difference learning. To appear in Machine Learning, 49.
    • (2002) Machine Learning , vol.49
    • Boyan, J.A.
  • 9
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • Bradtke, S. J., and Barto, A. G. 1996. Linear least-squares algorithms for temporal difference learning. Machine Learning 22: 33-57.
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.; Barto, A.G.
  • 10
    • 0028388685
    • TD(λ) converges with probability 1
    • Dayan, P., and Sejnowski, T. J. 1994. TD(λ) converges with probability 1. Machine Learning 14: 295-301.
    • (1994) Machine Learning , vol.14 , pp. 295-301
    • Dayan, P.; Sejnowski, T.J.
  • 14
    • 0000439891
    • On the convergence of stochastic iterative dynamic programming algorithms
    • Jaakkola, T., Jordan, M. I., and Singh, S. P. 1994. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6: 1185-1201.
    • (1994) Neural Computation , vol.6 , pp. 1185-1201
    • Jaakkola, T.; Jordan, M.I.; Singh, S.P.
  • 19
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.
  • 20
    • 0035283402
    • On the convergence of temporal-difference learning with linear function approximation
    • Tadić, V. 2001. On the convergence of temporal-difference learning with linear function approximation. Machine Learning 42: 241-267.
    • (2001) Machine Learning , vol.42 , pp. 241-267
    • Tadić, V.
  • 21
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • Tsitsiklis, J. N., and Van Roy, B. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42: 674-690.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.; Van Roy, B.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.