SCOPUS Information Retrieval Platform

Machine Learning

Volume 42, Issue 3, 2001, Pages 241-267

On the convergence of temporal-difference learning with linear function approximation

(1) Tadić, Vladislav a

a UNIVERSITY OF MELBOURNE (Australia)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; ASYMPTOTIC STABILITY; CONVERGENCE OF NUMERICAL METHODS; DYNAMIC PROGRAMMING; ERROR ANALYSIS; FUNCTION EVALUATION; MARKOV PROCESSES; STATE SPACE METHODS;

LINEAR FUNCTION APPROXIMATION; NEURODYNAMIC PROGRAMMING; POSITIVE HARRIS RECURRENCE; REINFORCEMENT LEARNING; TEMPORAL DIFFERENCE LEARNING;

LEARNING ALGORITHMS;

EID: 0035283402 PISSN: 08856125 EISSN: None Source Type: Journal
DOI: 10.1023/A:1007609817671 Document Type: Article

Times cited: (52)

References (21)

1
- 0004241943
- New York: Wiley
- Asmussen, S. (1987). Applied Probability and Queues. New York: Wiley.
- (1987) Applied Probability and Queues
- Asmussen, S.¹

2
- 0003778897
- Berlin: Springer Verlag
- Benveniste, A., Metivier, M., Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximation. Berlin: Springer Verlag.
- (1990) Adaptive Algorithms and Stochastic Approximation
- Benveniste, A.¹ Metivier, M.² Priouret, P.³

3
- 55949131592
- New York: Academic Press
- Bertsekas, D. P. (1976). Dynamic Programming and Optimal Control. New York: Academic Press.
- (1976) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

4
- 0003487482
- Belmont, MA: Athena Scientific
- Bertsekas, D. P. & Tsitsiklis, J. N. (1996). Neuro-Dynamic Programming. Belmont, MA: Athena Scientific.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

5
- 0011780422
- Necessary and sufficient conditions for the Robbins-Monro method
- Clark, D. S. (1984). Necessary and sufficient conditions for the Robbins-Monro method. Stochastic Processes and their Applications, 17, 359-367.
- (1984) Stochastic Processes and Their Applications , vol.17 , pp. 359-367
- Clark, D.S.¹

6
- 0003892991
- Basel: Birkhäuser Verlag
- Chen, H. F. & Guo, L. (1991). Identification and Stochastic Adaptive Control. Basel: Birkhäuser Verlag.
- (1991) Identification and Stochastic Adaptive Control
- Chen, H.F.¹ Guo, L.²

7
- 0003077340
- On positive Harris recurrence of multiclass queueing networks: A unified approach via fluid limit models
- Dai, J. G. (1995). On positive Harris recurrence of multiclass queueing networks: A unified approach via fluid limit models. Annals of Applied Probability, 5, 49-77.
- (1995) Annals of Applied Probability , vol.5 , pp. 49-77
- Dai, J.G.¹

8
- 0000430514
- The convergence of TD(λ) for general λ
- Dayan, P. D. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8, 341-362.
- (1992) Machine Learning , vol.8 , pp. 341-362
- Dayan, P.D.¹

9
- 0028388685
- TD(λ) converges with probability 1
- Dayan, P. D. & Sejnowski, T. J. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.
- (1994) Machine Learning , vol.14 , pp. 295-301
- Dayan, P.D.¹ Sejnowski, T.J.²

10
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6, 1185-1201.
- (1994) Neural Computation , vol.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

11
- 0030109229
- An alternative proof for convergence of stochastic approximation algorithms
- Kulkarni, S. R. & Horn, C. S. (1996). An alternative proof for convergence of stochastic approximation algorithms. IEEE Transactions on Automatic Control, 41, 419-424.
- (1996) IEEE Transactions on Automatic Control , vol.41 , pp. 419-424
- Kulkarni, S.R.¹ Horn, C.S.²

12
- 0003691637
- Englewood Cliffs, NJ: Prentice Hall
- Kumar, P. R. & Varaiya, P. (1986). Stochastic Systems: Estimation, Identification and Adaptive Control. Englewood Cliffs, NJ: Prentice Hall.
- (1986) Stochastic Systems: Estimation, Identification and Adaptive Control
- Kumar, P.R.¹ Varaiya, P.²

13
- 0003452601
- Berlin: Springer Verlag
- Kushner, H. J. & Clark, D. S. (1978). Stochastic Approximation Methods for Constrained and Unconstrained Systems. Berlin: Springer Verlag.
- (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems
- Kushner, H.J.¹ Clark, D.S.²

14
- 0003746249
- Basel: Birkhäuser Verlag
- Ljung, L., Pflug, G., & Walk, H. (1992). Stochastic Approximation and Optimization of Random Systems. Basel: Birkhäuser Verlag.
- (1992) Stochastic Approximation and Optimization of Random Systems
- Ljung, L.¹ Pflug, G.² Walk, H.³

15
- 0003637131
- Berlin: Springer-Verlag
- Meyn, S. P. & Tweedie, R. L. (1993). Markov Chains and Stochastic Stability. Berlin: Springer-Verlag.
- (1993) Markov Chains and Stochastic Stability
- Meyn, S.P.¹ Tweedie, R.L.²

16
- 0004130648
- Englewood Cliffs, NJ: Prentice Hall
- Solo, V. & Kong, X. (1995). Adaptive Signal Processing Algorithms: Stability and Performance. Englewood Cliffs, NJ: Prentice Hall.
- (1995) Adaptive Signal Processing Algorithms: Stability and Performance
- Solo, V.¹ Kong, X.²

17
- 33847202724
- Learning to predict by the methods of temporal-differences
- Sutton, R. S. (1988). Learning to predict by the methods of temporal-differences. Machine Learning, 3, 9-44.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

18
- 0004102479
- Cambridge, MA: MIT Press
- Sutton, R. S. & Barto, A. G. (1998). Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

19
- 0031384060
- Convergence of stochastic approximation under general noise and stability conditions
- Tadić, V. (1997). Convergence of stochastic approximation under general noise and stability conditions. In Proceedings of the 36th IEEE Conference on Decision and Control.
- (1997) Proceedings of the 36th IEEE Conference on Decision and Control
- Tadić, V.¹

20
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, J. N. & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

21
- 0001055484
- Equivalent and sufficient conditions on noise sequences for stochastic approximation algorithms
- Wang, I.-J., Chong, E. K. P., & Kulkarni, S. R. (1996). Equivalent and sufficient conditions on noise sequences for stochastic approximation algorithms. Advances in Applied Probability, 28, 784-801.
- (1996) Advances in Applied Probability , vol.28 , pp. 784-801
- Wang, I.-J.¹ Chong, E.K.P.² Kulkarni, S.R.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.