-
1
-
-
49649148257
-
A theory of cerebellar function
-
J. S. Albus. A theory of cerebellar function. Mathematical Biosciences, 10:25-61, 1971.
-
(1971)
Mathematical Biosciences
, vol.10
, pp. 25-61
-
-
Albus, J.S.
-
2
-
-
85151728371
-
Residual algorithms: Reinforcement learning with function approximation
-
A. Prieditis and S. Russell, editors. Morgan Kaufmann Publishers, San Francisco, CA
-
L. Baird. Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 30-37. Morgan Kaufmann Publishers, San Francisco, CA, 1995.
-
(1995)
Machine Learning: Proceedings of the Twelfth International Conference
, pp. 30-37
-
-
Baird, L.
-
4
-
-
85153940465
-
Generalization in reinforcement learning: Safely approximating the value function
-
G. Tesauro, D. S. Touretzky, and T. K. Leen, editors. MIT Press, Cambridge MA
-
J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 369-376. MIT Press, Cambridge MA, 1995.
-
(1995)
Advances in Neural Information Processing Systems
, vol.7
, pp. 369-376
-
-
Boyan, J.A.
Moore, A.W.
-
5
-
-
0038595393
-
Stable function approximation in dynamic programming
-
Carnegie Mellon University
-
G.J. Gordon. Stable function approximation in dynamic programming. Technical Report CMU-CS-95-103, Carnegie Mellon University, 1995.
-
(1995)
Technical Report CMU-CS-95-103
-
-
Gordon, G.J.
-
6
-
-
0000439891
-
On the convergence of stochastic iterative dynamic programming algorithms
-
T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6:1185-1201, 1994.
-
(1994)
Neural Computation
, vol.6
, pp. 1185-1201
-
-
Jaakkola, T.
Jordan, M.I.
Singh, S.P.
-
8
-
-
22944468429
-
A convergent form of approximate policy iteration
-
Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors. MIT Press
-
T.J. Perkins and D. Precup. A convergent form of approximate policy iteration. In Todd K. Leen, Thomas G. Dietterich, and Volker Tresp, editors, Advances in Neural Information Processing Systems 13. MIT Press, 2002.
-
(2002)
Advances in Neural Information Processing Systems
, vol.13
-
-
Perkins, T.J.
Precup, D.
-
10
-
-
0345161982
-
On-line Q-learning using connectionist systems
-
Cambridge University, UK
-
G.A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG-TR 166, Cambridge University, UK, 1994.
-
(1994)
Technical Report
, vol.CUED/F-INFENG-TR 166
-
-
Rummery, G.A.
Niranjan, M.
-
11
-
-
0033901602
-
Convergence results for single-step on-policy reinforcement-learning algorithms
-
S.P. Singh, T. Jaakkola, M.L. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
-
(2000)
Machine Learning
, vol.38
, Issue.3
, pp. 287-308
-
-
Singh, S.P.
Jaakkola, T.
Littman, M.L.
Szepesvari, C.
-
12
-
-
85156221438
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors. MIT Press, Cambridge MA
-
R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1038-1045. MIT Press, Cambridge MA, 1996.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
, pp. 1038-1045
-
-
Sutton, R.S.
-
15
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
G.J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58-68, 1995.
-
(1995)
Communications of the ACM
, vol.38
, pp. 58-68
-
-
Tesauro, G.J.
-
16
-
-
0028497630
-
Asynchronous stochastic approximation and Q-learning
-
J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185-202, 1994.
-
(1994)
Machine Learning
, vol.16
, pp. 185-202
-
-
Tsitsiklis, J.N.