Volume 72, Issue 1-2, 1995, Pages 81-138

Learning to act using real-time dynamic programming

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; ARTIFICIAL INTELLIGENCE; CONTROL THEORY; DYNAMIC PROGRAMMING; REAL TIME SYSTEMS

EID: 0029210635     PISSN: 00043702     EISSN: None     Source Type: Journal    
DOI: 10.1016/0004-3702(94)00011-O     Document Type: Article
Times cited: 744

References (98)
  • 5
    • 0020970738
    • Neuronlike elements that can solve difficult learning control problems
    • reprinted in: Anderson J.A., Rosenfeld E., Neurocomputing: Foundations of Research, MIT Press, Cambridge, MA, 1988
    • (1983) IEEE Trans. Syst. Man Cybern. , vol.13 , pp. 835-846
    • Barto, Sutton, Anderson
  • 9
    • 84968468700
    • Polynomial approximation: a new computational technique in dynamic programming: allocation processes
    • (1963) Math. Comp. , vol.17 , pp. 155-161
    • Bellman, Kalaba, Kotkin
  • 15
    • 0002227762
    • Penguins can make cake
    • (1989) AI Mag. , vol.10 , pp. 45-50
    • Chapman
  • 17
  • 18
    • 0041541978
    • A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations
    • H.A. Meyer, Wiley, New York
    • (1954) Symposium on Monte Carlo Methods , pp. 191-233
    • Curtiss
  • 22
    • 84916483603
    • Reinforcing connectionism: learning the statistical way
    • University of Edinburgh, Edinburgh, Scotland
    • (1991) Ph.D. Thesis
    • Dayan
  • 23
    • 0000430514
    • The convergence of TD(λ) for general λ
    • (1992) Mach. Learn. , vol.8 , pp. 341-362
    • Dayan
  • 25
    • 0000104548
    • Contraction mappings in the theory underlying dynamic programming
    • (1967) SIAM Review , vol.9 , pp. 165-177
    • Denardo
  • 28
    • 0024885107
    • Universal planning: an (almost) universally bad idea
    • (1989) AI Mag. , vol.10 , pp. 40-44
    • Ginsberg
  • 38
    • 0003900353
    • Brain function and adaptive systems: a heterostatic theory
    • Air Force Cambridge Research Laboratories, Bedford, MA
    • (1972) Tech. Report AFCRL-72-0164
    • Klopf
  • 48
    • 85151437138
    • Programming robots using reinforcement learning and teaching
    • Anaheim, CA
    • (1991) Proceedings AAAI-91 , pp. 781-786
    • Lin
  • 51
    • 0000123778
    • Self-improving reactive agents based on reinforcement learning, planning and teaching
    • (1992) Mach. Learn. , vol.8 , pp. 293-321
    • Lin
  • 56
    • 0013500961
    • Theory of neural-analog reinforcement systems and its application to the brain-model problem
    • Princeton University, Princeton, NJ
    • (1954) Ph.D. Thesis
    • Minsky
  • 58
    • 0003442587
    • Efficient memory-based learning for robot control
    • University of Cambridge, Cambridge, England
    • (1990) Ph.D. Thesis
    • Moore
  • 66
    • 0344252216
    • Adaptive confidence and adaptive curiosity
    • Institut für Informatik, Technische Universität München, 800 München 2, Germany
    • (1991) Tech. Report FKI-149-91
    • Schmidhuber
  • 68
    • 0008487586
    • In defense of reaction plans as caches
    • (1989) AI Mag. , vol.10 , pp. 51-60
    • Schoppers
  • 69
    • 0028497385
    • An upper bound on the loss from approximate optimal value functions (technical note)
    • (1994) Mach. Learn. , vol.16 , pp. 227-233
    • Singh, Yee
  • 70
    • 0003617454
    • Temporal credit assignment in reinforcement learning
    • University of Massachusetts, Amherst, MA
    • (1984) Ph.D. Thesis
    • Sutton
  • 71
    • 33847202724
    • Learning to predict by the method of temporal differences
    • (1988) Mach. Learn. , vol.3 , pp. 9-44
    • Sutton
  • 74
    • 0010714713
    • A Special Issue of Machine Learning on Reinforcement Learning
    • (1992) Mach. Learn. , vol.8
    • Sutton
  • 76
    • 0019537951
    • Toward a modern theory of adaptive networks: expectation and prediction
    • (1981) Psychol. Rev. , vol.88 , pp. 135-170
    • Sutton, Barto
  • 81
    • 0001046225
    • Practical issues in temporal difference learning
    • (1992) Mach. Learn. , vol.8 , pp. 257-277
    • Tesauro
  • 85
    • 0004049893
    • Learning from delayed rewards
    • Cambridge University, Cambridge, England
    • (1989) Ph.D. Thesis
    • Watkins
  • 88
    • 0003529238
    • Beyond regression: new tools for prediction and analysis in the behavioral sciences
    • Harvard University, Cambridge, MA
    • (1974) Ph.D. Thesis
    • Werbos
  • 92
    • 0000903748
    • Generalization of back propagation with applications to a recurrent gas market model
    • (1988) Neural Networks , vol.1 , pp. 339-356
    • Werbos
  • 96
    • 0017524329
    • An adaptive optimal controller for discrete-time Markov environments
    • (1977) Inform. Control , vol.34 , pp. 286-295
    • Witten
  • 98
    • 5844332810
    • Abstraction in control learning
    • Department of Computer Science, University of Massachusetts, Amherst, MA
    • (1992) Tech. Report 92-16
    • Yee


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.