-
1
-
-
0029679044
-
Reinforcement learning: A survey
-
L.P. Kaelbling, M.L. Littman, A.W. Moore. Reinforcement Learning: A Survey. Journal of Artificial Intelligence Research, Vol. 4, pp. 237-285, 1996.
-
(1996)
Journal of Artificial Intelligence Research
, vol.4
, pp. 237-285
-
-
Kaelbling, L.P.1
Littman, M.L.2
Moore, A.W.3
-
3
-
-
84898958374
-
Gradient descent for general reinforcement learning
-
M. S. Kearns, S. A. Solla, and D. A. Cohn, editors; MIT Press, Cambridge, MA
-
L.C. Baird. Gradient descent for general reinforcement learning. M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 11, 1999, MIT Press, Cambridge, MA.
-
(1999)
Advances in Neural Information Processing Systems
, vol.11
-
-
Baird, L.C.1
-
4
-
-
0029751418
-
The loss from imperfect value functions in expectation-based and minimax-based tasks
-
M. Heger. The Loss From Imperfect Value Functions in Expectation-Based and Minimax-Based Tasks. Machine Learning, 22, pp. 197-225, 1996.
-
(1996)
Machine Learning
, vol.22
, pp. 197-225
-
-
Heger, M.1
-
5
-
-
85156221438
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
MIT Press
-
R. Sutton. Generalization in Reinforcement Learning: Successful Examples Using Sparse Coarse Coding. Advances in Neural Information Processing Systems 8, 1996, MIT Press, pp. 1038-1044.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
, pp. 1038-1044
-
-
Sutton, R.1
-
9
-
-
0025421590
-
Nonlinear controllers for non-integrable systems: The acrobot example
-
J. Hauser, R.M. Murray. Nonlinear controllers for non-integrable systems: the acrobot example. Proc. of American Control Conference, San Diego, USA, 1990, pp. 669-671.
-
Proc. of American Control Conference, San Diego, USA, 1990
, pp. 669-671
-
-
Hauser, J.1
Murray, R.M.2
-
10
-
-
0041123319
-
Pseudolinearization of the acrobot using spline functions
-
S. Bortoff, M.W. Spong. Pseudolinearization of the acrobot using spline functions. Proc. of the IEEE Conf. on Decision and Control, Tucson, Arizona, 1992, pp. 593-598.
-
Proc. of the IEEE Conf. on Decision and Control, Tucson, Arizona, 1992
, pp. 593-598
-
-
Bortoff, S.1
Spong, M.W.2
-
11
-
-
0029255284
-
The swing up control problem for the acrobot
-
M.W. Spong. The swing up control problem for the acrobot. IEEE Control Systems Magazine, 15(1), pp. 49-55, 1995.
-
(1995)
IEEE Control Systems Magazine
, vol.15
, Issue.1
, pp. 49-55
-
-
Spong, M.W.1
-
12
-
-
0033901602
-
Convergence results for single-step on-policy reinforcement-learning algorithms
-
S. Singh, T. Jaakkola, M.L. Littman, and C. Szepesvari. Convergence Results for Single-step On-policy Reinforcement-learning Algorithms. Machine Learning, Vol. 38, pp. 287-308, 2000.
-
(2000)
Machine Learning
, vol.38
, pp. 287-308
-
-
Singh, S.1
Jaakkola, T.2
Littman, M.L.3
Szepesvari, C.4
-
13
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
MIT Press
-
R. Sutton, D. McAllester, S. Singh, Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. Advances in NIPS 12, 1999, pp. 1057-1063, MIT Press.
-
(1999)
Advances in NIPS
, vol.12
, pp. 1057-1063
-
-
Sutton, R.1
McAllester, D.2
Singh, S.3
Mansour, Y.4
-
14
-
-
0031143730
-
An analysis of temporal difference learning with function approximation
-
J.N. Tsitsiklis and B. Van Roy. An Analysis of Temporal Difference Learning with Function Approximation. IEEE Transactions on Automatic Control, 42(5), pp. 674-690, 1997.
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, Issue.5
, pp. 674-690
-
-
Tsitsiklis, J.N.1
Van Roy, B.2