[2] J. Peters, S. Vijayakumar, and S. Schaal, "Natural actor-critic," in 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings, ser. Lecture Notes in Computer Science, J. Gama, R. Camacho, P. Brazdil, A. Jorge, and L. Torgo, Eds., vol. 3720. Springer, 2005, pp. 280-291.
[3] M. Riedmiller, "Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method," in 16th European Conference on Machine Learning, Porto, Portugal, October 3-7, 2005, Proceedings, ser. Lecture Notes in Computer Science, J. Gama, R. Camacho, P. Brazdil, A. Jorge, and L. Torgo, Eds., vol. 3720. Springer, 2005, pp. 317-328.
[5] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
[6] P. Dayan, "The convergence of TD(λ) for general lambda," Machine Learning, vol. 8, pp. 341-362, 1992.
[7] C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, King's College, Cambridge, England, 1989.
[9] R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. MIT Press, Cambridge MA, 1996, pp. 1038-1045.
[11] L. C. Baird and A. H. Klopf, "Reinforcement learning with high-dimensional, continuous actions," Wright-Patterson Air Force Base Ohio: Wright Laboratory, Tech. Rep. WL-TR-93-1147, 1993. [Online]. Available: http://leemon.com/papers/index.html#b93b
[12] D. V. Prokhorov and D. C. Wunsch II, "Adaptive critic designs," IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 997-1007, September 1997. [Online]. Available: citeseer.csail.mit.edu/prokhorov97adaptive.html