SCOPUS Information Retrieval Platform

Machine Learning

Volume 22, Issue 1-3, 1996, Pages 33-57

Linear least-squares algorithms for temporal difference learning

(1) Bradtke, Steven J a

a GTE Data Services (United States)

Author keywords

Least squares; Markov decision problems; Reinforcement learning; Temporal difference methods

EID: 0001771345 PISSN: 08856125 EISSN: None Source Type: Journal
DOI: 10.1007/BF00114723 Document Type: Article

Times cited : (672)

References (24)

1
- 0003997198
- Strategy learning with multilayer connectionist representations
- GTE Laboratories Incorporated, Computer and Intelligent Systems Laboratory, 40 Sylvan Road, Waltham, MA 02254
- Anderson, C. W. (1988). Strategy learning with multilayer connectionist representations. Technical Report 87-509.3, GTE Laboratories Incorporated, Computer and Intelligent Systems Laboratory, 40 Sylvan Road, Waltham, MA 02254.
- (1988) Technical Report 87-509.3
- Anderson, C.W.¹

2
- 0020970738
- Neuronlike elements that can solve difficult learning control problems
- Barto, A. G., Sutton, R. S. & Anderson, C. W. (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13:835-846.
- (1983) IEEE Transactions on Systems, Man, and Cybernetics , vol.13 , pp. 835-846
- Barto, A.G.¹ Sutton, R.S.² Anderson, C.W.³

3
- 6344250104
- Incremental Dynamic Programming for On-Line Adaptive Optimal Control
- PhD thesis, University of Massachusetts, Computer Science Dept.
- Bradtke, S. J., (1994). Incremental Dynamic Programming for On-Line Adaptive Optimal Control. PhD thesis, University of Massachusetts, Computer Science Dept. Technical Report 94-62.
- (1994) Technical Report 94-62
- Bradtke, S.J.¹

4
- 84996565038
- Learning rate schedules for faster stochastic gradient search
- Proceedings of the 1992 IEEE Workshop. IEEE Press
- Darken, C., Chang, I. & Moody, J. (1992). Learning rate schedules for faster stochastic gradient search. In Neural Networks for Signal Processing 2 - Proceedings of the 1992 IEEE Workshop. IEEE Press.
- (1992) Neural Networks for Signal Processing , vol.2
- Darken, C.¹ Chang, I.² Moody, J.³

5
- 0000430514
- The convergence of TD(λ) for general λ
- Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8:341-362.
- (1992) Machine Learning , vol.8 , pp. 341-362
- Dayan, P.¹

6
- 0028388685
- TD(λ): Convergence with probability 1
- Dayan, P. & Sejnowski, T. J. (1994). TD(λ): Convergence with probability 1. Machine Learning.
- (1994) Machine Learning
- Dayan, P.¹ Sejnowski, T.J.²

7
- 0003473120
- Prentice-Hall, Englewood Cliffs, N.J
- Goodwin, G.C. & Sin, K.S., (1984). Adaptive Filtering Prediction and Control. Prentice-Hall, Englewood Cliffs, N.J.
- (1984) Adaptive Filtering Prediction and Control
- Goodwin, G.C.¹ Sin, K.S.²

8
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- Jaakkola, T., Jordan, M. I. & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6).
- (1994) Neural Computation , vol.6 , Issue.6
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

9
- 0003979966
- Springer-Verlag, New York
- Kemeny, J.G. & Snell, J.L., (1976). Finite Markov Chains. Springer-Verlag, New York.
- (1976) Finite Markov Chains
- Kemeny, J.G.¹ Snell, J.L.²

10
- 0003723679
- MIT Press, Cambridge, MA
- Ljung, L. & Söderström, T. (1983). Theory and Practice of Recursive Identification. MIT Press, Cambridge, MA.
- (1983) Theory and Practice of Recursive Identification
- Ljung, L.¹ Söderström, T.²

11
- 27144479240
- Expectation driven learning with an associative memory
- Lukes, G., Thompson, B. & Werbos, P., (1990) Expectation driven learning with an associative memory. In Proceedings of the International Joint Conference on Neural Networks, pages 1:521-524.
- (1990) Proceedings of the International Joint Conference on Neural Networks , vol.1 , pp. 521-524
- Lukes, G.¹ Thompson, B.² Werbos, P.³

12
- 0000016172
- A stochastic approximation method
- Robbins, H & Monro, S., (1951). A stochastic approximation method. Annals of Mathematical Statistics, 22:400-407.
- (1951) Annals of Mathematical Statistics , vol.22 , pp. 400-407
- Robbins, H.¹ Monro, S.²

13
- 0003540196
- Springer-Verlag, Berlin
- Söderström, T. & Stoica, P. G. (1983). Instrumental Variable Methods for System Identification. Springer-Verlag, Berlin.
- (1983) Instrumental Variable Methods for System Identification
- Söderström, T.¹ Stoica, P.G.²

14
- 0003617454
- PhD thesis, Department of Computer and Information Science, University of Massachusetts at Amherst, Amherst, MA 01003
- Sutton, R. S. (1984). Temporal Credit Assignment in Reinforcement Learning. PhD thesis, Department of Computer and Information Science, University of Massachusetts at Amherst, Amherst, MA 01003.
- (1984) Temporal Credit Assignment in Reinforcement Learning
- Sutton, R.S.¹

15
- 33847202724
- Learning to predict by the method of temporal differences
- Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

16
- 0001046225
- Practical issues in temporal difference learning
- Tesauro, G. J. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3/4):257-277.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 257-277
- Tesauro, G.J.¹

17
- 27144547178
- Asynchronous stochastic approximation and Q-learning
- Laboratory for Information and Decision Systems, MIT, Cambridge, MA
- Tsitsiklis, J. N. (1993). Asynchronous stochastic approximation and Q-learning. Technical Report LIDS-P-2172, Laboratory for Information and Decision Systems, MIT, Cambridge, MA.
- (1993) Technical Report LIDS-P-2172
- Tsitsiklis, J.N.¹

18
- 0004049893
- PhD thesis, Cambridge University, Cambridge, England
- Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University, Cambridge, England.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

19
- 34249833101
- Q-learning
- May 1992
- Watkins, C. J. C. H. & Dayan, P. (1992). Q-learning. Machine Learning, 8(3/4):279-292, May 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

20
- 0023169119
- Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research
- Werbos, P. J. (1987). Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, 17(1):7-20.
- (1987) IEEE Transactions on Systems, Man, and Cybernetics , vol.17 , Issue.1 , pp. 7-20
- Werbos, P.J.¹

21
- 0000903748
- Generalization of backpropagation with application to a recurrent gas market model
- 1988
- Werbos, P.J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4):339-356, 1988.
- (1988) Neural Networks , vol.1 , Issue.4 , pp. 339-356
- Werbos, P.J.¹

22
- 0025229247
- Consistency of HDP applied to a simple reinforcement learning problem
- Werbos, P.J. (1990) Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3(2):179-190.
- (1990) Neural Networks , vol.3 , Issue.2 , pp. 179-190
- Werbos, P.J.¹

23
- 0002031779
- Approximate dynamic programming for real time control and neural modeling
- D. A. White and D. A. Sofge, editors, Van Nostrand Reinhold, New York
- Werbos, P. J. (1992). Approximate dynamic programming for real time control and neural modeling. In D. A. White and D. A. Sofge, editors, Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 493-525. Van Nostrand Reinhold, New York.
- (1992) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , pp. 493-525
- Werbos, P.J.¹

24
- 0003792865
- Springer-Verlag
- Young, P., (1984). Recursive Estimation and Time-series Analysis. Springer-Verlag.
- (1984) Recursive Estimation and Time-series Analysis
- Young, P.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.