Volume 42, Issue 3, 2001, Pages 241-267

On the convergence of temporal-difference learning with linear function approximation

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; ASYMPTOTIC STABILITY; CONVERGENCE OF NUMERICAL METHODS; DYNAMIC PROGRAMMING; ERROR ANALYSIS; FUNCTION EVALUATION; MARKOV PROCESSES; STATE SPACE METHODS

EID: 0035283402; PISSN: 08856125; EISSN: None; Source Type: Journal
DOI: 10.1023/A:1007609817671; Document Type: Article
Times cited: 52

References (21)
  • [5] Clark, D. S. (1984). Necessary and sufficient conditions for the Robbins-Monro method. Stochastic Processes and Their Applications, 17, 359-367. [Scopus: 0011780422]
  • [7] Dai, J. G. (1995). On positive Harris recurrence of multiclass queueing networks: A unified approach via fluid limit models. Annals of Applied Probability, 5, 49-77. [Scopus: 0003077340]
  • [8] Dayan, P. D. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8, 341-362. [Scopus: 0000430514]
  • [9] Dayan, P. D. & Sejnowski, T. J. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301. [Scopus: 0028388685]
  • [10] Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6, 1185-1201. [Scopus: 0000439891]
  • [11] Kulkarni, S. R. & Horn, C. S. (1996). An alternative proof for convergence of stochastic approximation algorithms. IEEE Transactions on Automatic Control, 41, 419-424. [Scopus: 0030109229]
  • [17] Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44. [Scopus: 33847202724]
  • [20] Tsitsiklis, J. N. & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690. [Scopus: 0031143730]
  • [21] Wang, I.-J., Chong, E. K. P., & Kulkarni, S. R. (1996). Equivalent necessary and sufficient conditions on noise sequences for stochastic approximation algorithms. Advances in Applied Probability, 28, 784-801. [Scopus: 0001055484]


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.