Volume 22, Issue 1-3, 1996, Pages 33-57

Linear least-squares algorithms for temporal difference learning

Author keywords

Least squares; Markov decision problems; Reinforcement learning; Temporal difference methods

EID: 0001771345
PISSN: 0885-6125
EISSN: None
Source Type: Journal
DOI: 10.1007/BF00114723
Document Type: Article
Times cited: 672

References (24)
  • 1
    • Anderson, C. W. (1988). Strategy learning with multilayer connectionist representations. Technical Report 87-509.3, GTE Laboratories Incorporated, Computer and Intelligent Systems Laboratory, 40 Sylvan Road, Waltham, MA 02254. (Scopus EID: 0003997198)
  • 3
    • Bradtke, S. J. (1994). Incremental Dynamic Programming for On-Line Adaptive Optimal Control. PhD thesis, Computer Science Department, University of Massachusetts. Technical Report 94-62. (Scopus EID: 6344250104)
  • 4
    • Darken, C., Chang, I., & Moody, J. (1992). Learning rate schedules for faster stochastic gradient search. In Neural Networks for Signal Processing 2: Proceedings of the 1992 IEEE Workshop. IEEE Press. (Scopus EID: 84996565038)
  • 5
    • Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8:341-362. (Scopus EID: 0000430514)
  • 8
    • Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6). (Scopus EID: 0000439891)
  • 14
    • Sutton, R. S. (1984). Temporal Credit Assignment in Reinforcement Learning. PhD thesis, Department of Computer and Information Science, University of Massachusetts at Amherst, Amherst, MA 01003. (Scopus EID: 0003617454)
  • 15
    • Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44. (Scopus EID: 33847202724)
  • 16
    • Tesauro, G. J. (1992). Practical issues in temporal difference learning. Machine Learning, 8(3/4):257-277. (Scopus EID: 0001046225)
  • 17
    • Tsitsiklis, J. N. (1993). Asynchronous stochastic approximation and Q-learning. Technical Report LIDS-P-2172, Laboratory for Information and Decision Systems, MIT, Cambridge, MA. (Scopus EID: 27144547178)
  • 19
  • 20
    • Werbos, P. J. (1987). Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research. IEEE Transactions on Systems, Man, and Cybernetics, 17(1):7-20. (Scopus EID: 0023169119)
  • 21
    • Werbos, P. J. (1988). Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4):339-356. (Scopus EID: 0000903748)
  • 22
    • Werbos, P. J. (1990). Consistency of HDP applied to a simple reinforcement learning problem. Neural Networks, 3(2):179-190. (Scopus EID: 0025229247)
  • 23
    • Werbos, P. J. (1992). Approximate dynamic programming for real time control and neural modeling. In D. A. White and D. A. Sofge (Eds.), Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, pages 493-525. Van Nostrand Reinhold, New York. (Scopus EID: 0002031779)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.