Volume 105, Issue 3, 2000, Pages 589-608

On the existence of fixed points for approximate value iteration and temporal-difference learning

Author keywords

Dynamic programming; Neurodynamic programming; Reinforcement learning; Temporal difference learning; Value iteration

Indexed keywords

DYNAMIC PROGRAMMING; ITERATIVE METHODS; ORDINARY DIFFERENTIAL EQUATIONS;

EID: 0034342516     PISSN: 00223239     EISSN: None     Source Type: Journal    
DOI: 10.1023/A:1004641123405     Document Type: Article
Times cited: 67

References (11)
  • 2
    • 33847202724
    • Learning to predict by the method of temporal differences
    • SUTTON, R. S., Learning to Predict by the Method of Temporal Differences, Machine Learning, Vol. 3, pp. 9-44, 1988.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 4
    • 0003276733
    • Mean-field analysis for batched TD(λ)
    • PINEDA, F., Mean-Field Analysis for Batched TD(λ), Neural Computation, Vol. 9, pp. 1403-1419, 1997.
    • (1997) Neural Computation , vol.9 , pp. 1403-1419
    • Pineda, F.1
  • 5
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • TSITSIKLIS, J. N., and VAN ROY, B., An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control, Vol. 42, pp. 674-690, 1997.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 6
    • 0000430514
    • The Convergence of TD(λ) for General λ
    • DAYAN, P. D., The Convergence of TD(λ) for General λ, Machine Learning, Vol. 8, pp. 341-362, 1992.
    • (1992) Machine Learning , vol.8 , pp. 341-362
    • Dayan, P.D.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.