2. Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 663-670). San Mateo, CA: Morgan Kaufmann.
3. Baird, L. C. (1993). Advantage updating (Tech. Rep. No. WL-TR-93-1146). Wright Laboratory, Wright-Patterson Air Force Base, OH.
4. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis & S. Russell (Eds.), Machine learning: Proceedings of the Twelfth International Conference. San Mateo, CA: Morgan Kaufmann.
5. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13, 834-846.
7. Bradtke, S. J. (1993). Reinforcement learning applied to linear quadratic regulation. In C. L. Giles, S. J. Hanson, & J. D. Cowan (Eds.), Advances in neural information processing systems, 5 (pp. 295-302). San Mateo, CA: Morgan Kaufmann.
8. Bradtke, S. J., & Duff, M. O. (1995). Reinforcement learning methods for continuous-time Markov decision problems. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 393-400). Cambridge, MA: MIT Press.
9. Crites, R. H., & Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1017-1023). Cambridge, MA: MIT Press.
10. Dayan, P., & Singh, S. P. (1996). Improving policies without measuring merits. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1059-1065). Cambridge, MA: MIT Press.
11. Doya, K. (1996). Temporal difference learning in continuous time and space. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1073-1079). Cambridge, MA: MIT Press.
12. Doya, K. (1997). Efficient nonlinear control with actor-tutor architecture. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 1012-1018). Cambridge, MA: MIT Press.
14. Gordon, G. J. (1995). Stable function approximation in dynamic programming. In A. Prieditis & S. Russell (Eds.), Machine learning: Proceedings of the Twelfth International Conference. San Mateo, CA: Morgan Kaufmann.
15. Gordon, G. J. (1996). Stable fitted reinforcement learning. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1052-1058). Cambridge, MA: MIT Press.
16. Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3, 671-692.
17. Harmon, M. E., Baird, L. C., III, & Klopf, A. H. (1996). Reinforcement learning applied to a differential game. Adaptive Behavior, 4, 3-28.
18. Hopfield, J. J. (1984). Neurons with graded response have collective computational properties like those of two-state neurons. Proceedings of the National Academy of Sciences, USA, 81, 3088-3092.
19. Kaelbling, L. P., Littman, M. L., & Moore, A. W. (1996). Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4, 237-285.
21. Moore, A. W. (1994). The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in neural information processing systems, 6 (pp. 711-718). San Mateo, CA: Morgan Kaufmann.
23. Munos, R. (1997). A convergent reinforcement learning algorithm in the continuous case based on a finite difference method. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 826-831).
24. Munos, R., & Bourgine, P. (1998). Reinforcement learning for continuous stochastic control problems. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10 (pp. 1029-1035). Cambridge, MA: MIT Press.
25. Pareigis, S. (1998). Adaptive choice of grid and time in reinforcement learning. In M. I. Jordan, M. J. Kearns, & S. A. Solla (Eds.), Advances in neural information processing systems, 10 (pp. 1036-1042). Cambridge, MA: MIT Press.
26. Peterson, J. K. (1993). On-line estimation of the optimal value function: HJB-estimators. In C. L. Giles, S. J. Hanson, & J. D. Cowan (Eds.), Advances in neural information processing systems, 5 (pp. 319-326). San Mateo, CA: Morgan Kaufmann.
27. Schaal, S. (1997). Learning from demonstration. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 1040-1046). Cambridge, MA: MIT Press.
28. Singh, S., & Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In M. C. Mozer, M. I. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems, 9 (pp. 974-980). Cambridge, MA: MIT Press.
29. Singh, S. P., Jaakkola, T., & Jordan, M. I. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. S. Touretzky, & T. K. Leen (Eds.), Advances in neural information processing systems, 7 (pp. 361-368). Cambridge, MA: MIT Press.
30. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
32. Sutton, R. S. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8 (pp. 1038-1044). Cambridge, MA: MIT Press.
34. Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
35. Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
36. Watkins, C. J. C. H. (1989). Learning from delayed rewards. Unpublished doctoral dissertation, Cambridge University.
37. Werbos, P. J. (1990). A menu of designs for reinforcement learning over time. In W. T. Miller, R. S. Sutton, & P. J. Werbos (Eds.), Neural networks for control (pp. 67-95). Cambridge, MA: MIT Press.
38. Zhang, W., & Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In D. S. Touretzky, M. C. Mozer, & M. E. Hasselmo (Eds.), Advances in neural information processing systems, 8. Cambridge, MA: MIT Press.