SCOPUS Information Retrieval Platform - Article View


Artificial Intelligence

Volume 72, Issue 1-2, 1995, Pages 81-138

Learning to act using real-time dynamic programming

Authors (3): Barto, Andrew G.ᵃ; Bradtke, Steven J.ᵃ; Singh, Satinder P.ᵃ

ᵃ University of Massachusetts (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; ARTIFICIAL INTELLIGENCE; CONTROL THEORY; DYNAMIC PROGRAMMING; REAL TIME SYSTEMS;

Q-LEARNING ALGORITHM;

LEARNING SYSTEMS;

EID: 0029210635 PISSN: 00043702 EISSN: None Source Type: Journal
DOI: 10.1016/0004-3702(94)00011-O Document Type: Article

Times cited: 744

References (98)

1
- 0003997198
- Strategy learning with multilayer connectionist representations
- GTE Laboratories, Incorporated, Waltham, MA
- (1987) Tech. Report TR87-509.3
- Anderson¹

2
- 0344154963
- Strategy Learning with Multilayer Connectionist Representations
- Irvine, CA (entry 1 is a corrected version of this paper)
- (1987) Proceedings Fourth International Conference on Machine Learning , pp. 103-114
- Anderson¹

3
- 0002283578
- Reinforcement learning and adaptive critic methods
- D.A. White, D.A. Sofge, Van Nostrand Reinhold, New York
- (1992) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , pp. 469-491
- Barto¹

4
- 1642382927
- On the computational economics of reinforcement learning
- D.S. Touretzky, J.L. Elman, T.J. Sejnowski, G.E. Hinton, Morgan Kaufmann, San Mateo, CA
- (1991) Connectionist Models: Proceedings of the 1990 Summer School , pp. 35-44
- Barto¹ Singh²

5
- 0020970738
- Neuronlike elements that can solve difficult learning control problems
- Reprinted in: J.A. Anderson, E. Rosenfeld, Neurocomputing: Foundations of Research, MIT Press, Cambridge, MA (1988)
- (1983) IEEE Trans. Syst. Man Cybern. , vol.13 , pp. 835-846
- Barto¹ Sutton² Anderson³

6
- 0008840282
- Sequential decision problems and neural networks
- D.S. Touretzky, Morgan Kaufmann, San Mateo, CA
- (1990) Advances in Neural Information Processing Systems , vol.2 , pp. 686-693
- Barto¹ Sutton² Watkins³

7
- 0002201501
- Learning and sequential decision making
- M. Gabriel, J. Moore, MIT Press, Cambridge, MA
- (1990) Learning and Computational Neuroscience: Foundations of Adaptive Networks , pp. 539-602
- Barto¹ Sutton² Watkins³

8
- 84968519017
- Functional approximations and dynamic programming
- (1959) Mathematical Tables and Other Aids to Computation , vol.13 , pp. 247-251
- Bellman¹ Dreyfus²

9
- 84968468700
- Polynomial approximation—a new computational technique in dynamic programming: allocation processes
- (1963) Math. Comp. , vol.17 , pp. 155-161
- Bellman¹ Kalaba² Kotkin³

10
- 0003787146
- Princeton University Press, Princeton, NJ
- (1957) Dynamic Programming
- Bellman¹

11
- 0020138998
- Distributed dynamic programming
- (1982) IEEE Transactions on Automatic Control , vol.27 , pp. 610-616
- Bertsekas¹

12
- 0003565779
- Prentice-Hall, Englewood Cliffs, NJ
- (1987) Dynamic Programming: Deterministic and Stochastic Models
- Bertsekas¹

13
- 0003636164
- Prentice-Hall, Englewood Cliffs, NJ
- (1989) Parallel and Distributed Computation: Numerical Methods
- Bertsekas¹ Tsitsiklis²

14
- 0000859970
- Reinforcement learning applied to linear quadratic regulation
- C.L. Giles, S.J. Hanson, J.D. Cowan, Morgan Kaufmann, San Mateo, CA
- (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 295-302
- Bradtke¹

15
- 0002227762
- Penguins can make cake
- (1989) AI Mag. , vol.10 , pp. 45-50
- Chapman¹

16
- 0002192119
- Input generalization in delayed reinforcement learning: an algorithm and performance comparisons
- Sydney, NSW
- (1991) Proceedings IJCAI-91
- Chapman¹ Kaelbling²

17
- 85168770830
- A unified theory of heuristic evaluation functions and its application to learning
- Philadelphia, PA
- (1986) Proceedings AAAI-86 , pp. 148-152
- Christensen¹ Korf²

18
- 0041541978
- A theoretical comparison of the efficiencies of two classical methods and a Monte Carlo method for computing one component of the solution of a set of linear algebraic equations
- H.A. Meyer, Wiley, New York
- (1954) Symposium on Monte Carlo Methods , pp. 191-233
- Curtiss¹

19
- 0016951139
- Splines and efficiency in dynamic programming
- (1976) J. Math. Anal. Appl. , vol.54 , pp. 402-407
- Daniel¹

20
- 0000595242
- Note on learning rate schedule for stochastic optimization
- R.P. Lippmann, J.E. Moody, D.S. Touretzky, Morgan Kaufmann, San Mateo, CA
- (1991) Advances in Neural Information Processing Systems , vol.3 , pp. 832-838
- Darken¹ Moody²

21
- 0010211208
- Navigating through temporal difference
- R.P. Lippmann, J.E. Moody, D.S. Touretzky, Morgan Kaufmann, San Mateo, CA
- (1991) Advances in Neural Information Processing Systems , vol.3 , pp. 464-470
- Dayan¹

22
- 84916483603
- Reinforcing connectionism: learning the statistical way
- University of Edinburgh, Edinburgh, Scotland
- (1991) Ph.D. Thesis
- Dayan¹

23
- 0000430514
- The convergence of TD(λ) for general λ
- (1992) Mach. Learn. , vol.8 , pp. 341-362
- Dayan¹

24
- 0004240515
- Morgan Kaufmann, San Mateo, CA
- (1991) Planning and Control
- Dean¹ Wellman²

25
- 0000104548
- Contraction mappings in the theory underlying dynamic programming
- (1967) SIAM Review , vol.9 , pp. 165-177
- Denardo¹

26
- 0347653210
- Mathematical games
- (1973) Scientific American , vol.228 , pp. 108
- Gardner¹

27
- 0005047432
- On the optimality of A*
- (1977) Artif. Intell. , vol.8 , pp. 69-76
- Gelperin¹

28
- 0024885107
- Universal planning: an (almost) universally bad idea
- (1989) AI Mag. , vol.10 , pp. 40-44
- Ginsberg¹

29
- 0002884379
- Birkhäuser, Boston, MA
- (1989) Connectionist Problem Solving: Computational Aspects of Biological Learning
- Hampson¹

30
- 84899829959
- A formal basis for the heuristic determination of minimum cost paths
- (1968) IEEE Transactions on Systems Science and Cybernetics , vol.4 , pp. 100-107
- Hart¹ Nilsson² Raphael³

31
- 0000746883
- Escaping brittleness: the possibility of general-purpose learning algorithms applied to rule-based systems
- R.S. Michalski, J.G. Carbonell, T.M. Mitchell, Morgan Kaufmann, San Mateo, CA
- (1986) Machine Learning: An Artificial Intelligence Approach , vol.2 , pp. 593-623
- Holland¹

32
- 0004291983
- Elsevier, New York
- (1970) Differential Dynamic Programming
- Jacobson¹ Mayne²

33
- 0024936372
- Computationally efficient adaptive control algorithms for Markov chains
- Tampa, FL
- (1989) Proceedings 28th Conference on Decision and Control , pp. 1283-1288
- Jalali¹ Ferguson²

34
- 0000676676
- Learning to control an unstable system with forward modeling
- D.S. Touretzky, Morgan Kaufmann, San Mateo, CA
- (1990) Advances in Neural Information Processing Systems , vol.2
- Jordan¹ Jacobs²

35
- 0004280606
- MIT Press, Cambridge, MA
- (1991) Learning in Embedded Systems
- Kaelbling¹

36
- 84916497226
- Entry 35 is a revised version of this report
- (1990) Teleos Research TR-90-04
- Kaelbling¹

37
- 26444479778
- Optimization by simulated annealing
- (1983) Science , vol.220 , pp. 671-680
- Kirkpatrick¹ Gelatt² Vecchi³

38
- 0003900353
- Brain function and adaptive systems—a heterostatic theory
- Air Force Cambridge Research Laboratories, Bedford, MA
- (1972) Tech. Report AFCRL-72-0164
- Klopf¹

39
- 84916478650
- Summary of the report in entry 38
- (1974) Proceedings International Conference on Systems, Man, and Cybernetics
- Klopf¹

40
- 0003607885
- Hemisphere, Washington, DC
- (1982) The Hedonistic Neuron: A Theory of Memory, Learning, and Intelligence
- Klopf¹

41
- 0025400088
- Real-time heuristic search
- (1990) Artif. Intell. , vol.42 , pp. 189-211
- Korf¹

42
- 0022062142
- A survey of some results in stochastic adaptive control
- (1985) SIAM J. Control Optimization , vol.23 , pp. 329-380
- Kumar¹

43
- 2042447527
- The CDP: a unifying formulation for heuristic search, dynamic programming, and branch-and-bound
- L.N. Kanal, V. Kumar, Springer-Verlag, Berlin
- (1988) Search in Artificial Intelligence , pp. 1-37
- Kumar¹ Kanal²

44
- 0003924011
- Springer-Verlag, New York
- (1992) Numerical Methods for Stochastic Control Problems in Continuous Time
- Kushner¹ Dupuis²

45
- 0017549069
- A modified quadratic cost problem and feedback stabilization of a linear system
- (1977) IEEE Transactions on Automatic Control , vol.22 , pp. 838-842
- Kwon¹ Pearson²

46
- 23944436740
- A theoretical framework for back-propagation
- D. Touretzky, G. Hinton, T. Sejnowski, Morgan Kaufmann, San Mateo, CA
- (1988) Proceedings 1988 Connectionist Models Summer School , pp. 21-28
- le Cun¹

47
- 0026388814
- Real-time optimal path planning using a distributed computing paradigm
- Boston, MA
- (1991) Proceedings American Control Conference
- Lemmon¹

48
- 85151437138
- Programming robots using reinforcement learning and teaching
- Anaheim, CA
- (1991) Proceedings AAAI-91 , pp. 781-786
- Lin¹

49
- 85074045754
- Self-improvement based on reinforcement learning, planning and teaching
- L.A. Birnbaum, G.C. Collins, Morgan Kaufmann, San Mateo, CA
- (1991) Machine Learning: Proceedings Eighth International Workshop , pp. 323-327
- Lin¹

50
- 0344050557
- Self-improving reactive agents: case studies of reinforcement learning frameworks
- Cambridge, MA
- (1991) From Animals to Animats: Proceedings First International Conference on Simulation of Adaptive Behavior , pp. 297-305
- Lin¹

51
- 0000123778
- Self-improving reactive agents based on reinforcement learning, planning and teaching
- (1992) Mach. Learn. , vol.8 , pp. 293-321
- Lin¹

52
- 0026880130
- Automatic programming of behavior-based robots using reinforcement learning
- (1992) Artif. Intell. , vol.55 , pp. 311-365
- Mahadevan¹ Connell²

53
- 0025462720
- Receding horizon control of nonlinear systems
- (1990) IEEE Trans. Autom. Control , vol.35 , pp. 814-824
- Mayne¹ Michalska²

54
- 0004899145
- A heuristic search algorithm with modifiable estimate
- (1984) Artificial Intelligence , vol.23 , pp. 13-27
- Mérő¹

55
- 0000827179
- BOXES: an experiment in adaptive control
- E. Dale, D. Michie, Oliver and Boyd, Edinburgh
- (1968) Machine Intelligence , vol.2 , pp. 137-152
- Michie¹ Chambers²

56
- 0013500961
- Theory of neural-analog reinforcement systems and its application to the brain-model problem
- Princeton University, Princeton, NJ
- (1954) Ph.D. Thesis
- Minsky¹

57
- 84937350040
- Steps toward artificial intelligence
- Reprinted in: E.A. Feigenbaum, J. Feldman, Computers and Thought, McGraw-Hill, New York (1963), pp. 406-450
- (1961) Proceedings Institute of Radio Engineers , vol.49 , pp. 8-30
- Minsky¹

58
- 0003442587
- Efficient memory-based learning for robot control
- University of Cambridge, Cambridge, England
- (1990) Ph.D. Thesis
- Moore¹

59
- 33747997674
- Variable resolution dynamic programming: efficiently learning action maps in multivariate real-valued state-spaces
- L.A. Birnbaum, G.C. Collins, Morgan Kaufmann, San Mateo, CA
- (1991) Machine Learning: Proceedings Eighth International Workshop , pp. 333-337
- Moore¹

60
- 84916521733
- Memory-based reinforcement learning: efficient computation with prioritized sweeping
- S.J. Hanson, J.D. Cowan, C.L. Giles, Morgan Kaufmann, San Mateo, CA
- (1993) Advances in Neural Information Processing Systems , vol.5
- Moore¹ Atkeson²

61
- 84977063352
- Efficient learning and planning within the dyna framework
- (1993) Adaptive Behavior , vol.2 , pp. 437-454
- Peng¹ Williams²

62
- 0037581251
- Modified policy iteration algorithms for discounted Markov decision problems
- (1978) Management Science , vol.24 , pp. 1127-1137
- Puterman¹ Shin²

63
- 0004038871
- Academic Press, New York
- (1983) Introduction to Stochastic Dynamic Programming
- Ross¹

64
- 0001201756
- Some studies in machine learning using the game of checkers
- Reprinted in: E.A. Feigenbaum, J. Feldman, Computers and Thought, McGraw-Hill, New York (1963)
- (1959) IBM Journal of Research and Development , pp. 210-229
- Samuel¹

65
- 0001201757
- Some studies in machine learning using the game of checkers II—Recent progress
- (1967) IBM Journal of Research and Development , pp. 601-617
- Samuel¹

66
- 0344252216
- Adaptive confidence and adaptive curiosity
- Institut für Informatik, Technische Universität München, 8000 München 2, Germany
- (1991) Tech. Report FKI-149-91
- Schmidhuber¹

67
- 0001871991
- Universal plans for reactive robots in unpredictable environments
- Milan, Italy
- (1987) Proceedings IJCAI-87 , pp. 1039-1046
- Schoppers¹

68
- 0008487586
- In defense of reaction plans as caches
- (1989) AI Mag. , vol.10 , pp. 51-60
- Schoppers¹

69
- 0028497385
- An upper bound on the loss from approximate optimal value functions (technical note)
- (1994) Mach. Learn. , vol.16 , pp. 227-233
- Singh¹ Yee²

70
- 0003617454
- Temporal credit assignment in reinforcement learning
- University of Massachusetts, Amherst, MA
- (1984) Ph.D. Thesis
- Sutton¹

71
- 33847202724
- Learning to predict by the method of temporal differences
- (1988) Mach. Learn. , vol.3 , pp. 9-44
- Sutton¹

72
- 85132026293
- Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
- Morgan Kaufmann, San Mateo, CA
- (1990) Proceedings Seventh International Conference on Machine Learning , pp. 216-224
- Sutton¹

73
- 85152618928
- Planning by incremental dynamic programming
- L.A. Birnbaum, G.C. Collins, Morgan Kaufmann, San Mateo, CA
- (1991) Machine Learning: Proceedings Eighth International Workshop , pp. 353-357
- Sutton¹

74
- 0010714713
- A Special Issue of Machine Learning on Reinforcement Learning
- (1992) Mach. Learn. , vol.8
- Sutton¹

75
- 0004007508
- Kluwer Academic Press, Boston, MA (book edition of entry 74)
- (1992) Reinforcement Learning
- Sutton¹

76
- 0019537951
- Toward a modern theory of adaptive networks: expectation and prediction
- (1981) Psychol. Rev. , vol.88 , pp. 135-170
- Sutton¹ Barto²

77
- 0000580224
- A temporal-difference model of classical conditioning
- Seattle, WA
- (1987) Proceedings Ninth Annual Conference of the Cognitive Science Society
- Sutton¹ Barto²

78
- 0003066891
- Time-derivative models of pavlovian reinforcement
- M. Gabriel, J. Moore, MIT Press, Cambridge, MA
- (1990) Learning and Computational Neuroscience: Foundations of Adaptive Networks , pp. 497-537
- Sutton¹ Barto²

79
- 0026385066
- Reinforcement learning is direct adaptive optimal control
- Boston, MA
- (1991) Proceedings American Control Conference , pp. 2143-2146
- Sutton¹ Barto² Williams³

80
- 84916497406
- Learning a cost-sensitive internal representation for reinforcement learning
- L.A. Birnbaum, G.C. Collins, Morgan Kaufmann, San Mateo, CA
- (1991) Machine Learning: Proceedings Eighth International Workshop , pp. 358-362
- Tan¹

81
- 0001046225
- Practical issues in temporal difference learning
- (1992) Mach. Learn. , vol.8 , pp. 257-277
- Tesauro¹

82
- 0002210775
- The role of exploration in learning control
- D.A. White, D.A. Sofge, Van Nostrand Reinhold, New York
- (1992) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , pp. 527-559
- Thrun¹

83
- 0001546350
- Active exploration in dynamic environments
- J.E. Moody, S.J. Hanson, R.P. Lippmann, Morgan Kaufmann, San Mateo, CA
- (1992) Advances in Neural Information Processing Systems , vol.4
- Thrun¹ Möller²

84
- 0008861422
- Two kinds of training information for evaluation function learning
- Anaheim, CA
- (1991) Proceedings AAAI-91 , pp. 596-600
- Utgoff¹ Clouse²

85
- 0004049893
- Learning from delayed rewards
- Cambridge University, Cambridge, England
- (1989) Ph.D. Thesis
- Watkins¹

86
- 34249833101
- Q-learning
- (1992) Mach. Learn. , vol.8 , pp. 279-292
- Watkins¹ Dayan²

87
- 0002031779
- Approximate dynamic programming for real-time control and neural modeling
- D.A. White, D.A. Sofge, Van Nostrand Reinhold, New York
- (1992) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , pp. 493-525
- Werbos¹

88
- 0003529238
- Beyond regression: new tools for prediction and analysis in the behavioral sciences
- Harvard University, Cambridge, MA
- (1974) Ph.D. Thesis
- Werbos¹

89
- 0002557583
- Advanced forecasting methods for global crisis warning and models of intelligence
- (1977) General Systems Yearbook , vol.22 , pp. 25-38
- Werbos¹

90
- 0001773535
- Applications of advances in nonlinear sensitivity analysis
- R.F. Drenick, F. Kozin, Springer-Verlag, Berlin
- (1982) System Modeling and Optimization
- Werbos¹

91
- 0023169119
- Building and understanding adaptive systems: a statistical/numerical approach to factory automation and brain research
- (1987) IEEE Trans. Syst. Man Cybern.
- Werbos¹

92
- 0000903748
- Generalization of back propagation with applications to a recurrent gas market model
- (1988) Neural Networks , vol.1 , pp. 339-356
- Werbos¹

93
- 0011889743
- Optimal control: a foundation for intelligent control
- D.A. White, D.A. Sofge, Van Nostrand Reinhold, New York
- (1992) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches , pp. 185-214
- White¹ Jordan²

94
- 85152652126
- Complexity and cooperation in Q-learning
- L.A. Birnbaum, G.C. Collins, Morgan Kaufmann, San Mateo, CA
- (1991) Machine Learning: Proceedings Eighth International Workshop , pp. 363-367
- Whitehead¹

95
- 0342455390
- A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming
- New Haven, CT
- (1990) Proceedings Sixth Yale Workshop on Adaptive and Learning Systems , pp. 96-101
- Williams¹ Baird²

96
- 0017524329
- An adaptive optimal controller for discrete-time Markov environments
- (1977) Infor. Control , vol.34 , pp. 286-295
- Witten¹

97
- 0017549934
- Exploring, modelling and controlling discrete sequential environments
- (1977) Int. J. Man-Mach. Stud. , vol.9 , pp. 715-735
- Witten¹

98
- 5844332810
- Abstraction in control learning
- Department of Computer Science, University of Massachusetts, Amherst, MA
- (1992) Tech. Report 92-16
- Yee¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.