



2004, Pages 235-259

Improved temporal difference methods with linear function approximation

Author keywords

Argon; Convergence; Eigenvalues and eigenfunctions; Function approximation; Markov processes; Trajectory; Vectors

Indexed keywords

APPROXIMATION ALGORITHMS; ARGON; COST FUNCTIONS; DYNAMIC PROGRAMMING; EIGENVALUES AND EIGENFUNCTIONS; MARKOV PROCESSES; TRAJECTORIES; VECTORS;

EID: 85036496976     PISSN: None     EISSN: None     Source Type: Book    
DOI: 10.1109/9780470544785.ch9     Document Type: Chapter
Times cited: 46

References (19)
  • 2
    • 4243567726
    • Temporal differences-based policy iteration and applications in neuro-dynamic programming
    • MIT, Cambridge, MA
    • D. P. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, Lab. for Info. and Decision Systems Report LIDS-P-2349, MIT, Cambridge, MA, 1996.
    • (1996) Lab. for Info. and Decision Systems Report LIDS-P-2349
    • Bertsekas, D.P.1    Ioffe, S.2
  • 5
    • 0034389611
    • Gradient convergence in gradient methods with errors
    • D. P. Bertsekas and J. N. Tsitsiklis, Gradient convergence in gradient methods with errors, SIAM Journal on Optimization, vol. 10, pp. 627-642, 2000.
    • (2000) SIAM Journal on Optimization, vol. 10, pp. 627-642
    • Bertsekas, D.P.1    Tsitsiklis, J.N.2
  • 6
    • 0036832950
    • Technical update: Least-squares temporal difference learning
    • J. A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, vol. 49, pp. 1-15, 2002.
    • (2002) Machine Learning , vol.49 , pp. 1-15
    • Boyan, J.A.1
  • 7
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol. 22, pp. 33-57, 1996.
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 8
    • 0000430514
    • The convergence of TD(λ) for general λ
    • P. D. Dayan, The convergence of TD(λ) for general λ, Machine Learning, vol. 8, pp. 341-362, 1992.
    • (1992) Machine Learning , vol.8 , pp. 341-362
    • Dayan, P.D.1
  • 9
    • 0034342516
    • On the existence of fixed points for approximate value iteration and temporal-difference learning
    • D. P. de Farias and B. Van Roy, On the existence of fixed points for approximate value iteration and temporal-difference learning, Journal of Optimization Theory and Applications, vol. 105, 2000.
    • (2000) Journal of Optimization Theory and Applications , vol.105
    • De Farias, D.P.1    Van Roy, B.2
  • 11
    • 85036579695
    • The asymptotic mean squared error of temporal difference learning, Unpublished Report
    • MIT, Cambridge, MA
    • V. R. Konda and J. N. Tsitsiklis, The asymptotic mean squared error of temporal difference learning, Unpublished Report, Lab. for Information and Decision Systems, MIT, Cambridge, MA, 2003.
    • (2003) Lab. for Information and Decision Systems
    • Konda, V.R.1    Tsitsiklis, J.N.2
  • 12
    • 0042758707
    • Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, MA
    • V. R. Konda, Actor-Critic Algorithms, Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 2002.
    • (2002) Actor-Critic Algorithms
    • Konda, V.R.1
  • 14
    • 0003276733
    • Mean-field analysis for batched TD(λ)
    • F. Pineda, Mean-field analysis for batched TD(λ), Neural Computation, pp. 1403-1419, 1997.
    • (1997) Neural Computation , pp. 1403-1419
    • Pineda, F.1
  • 17
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 19
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. on Automatic Control, vol. 42, pp. 674-690, 1997.
    • (1997) IEEE Trans. on Automatic Control, vol. 42, pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.