1
A.G. Barto and S. Mahadevan, "Recent Advances in Hierarchical Reinforcement Learning", Discrete Event Dynamic Systems: Theory and Applications 13, 2003, pp. 341-379.
2
A.G. Barto, S.J. Bradtke, and S.P. Singh, "Learning to Act Using Real-Time Dynamic Programming", Artificial Intelligence 72, 1995, pp. 81-138.
6
T.D. Dietterich, "Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition", Journal of Artificial Intelligence Research 13, 2000, pp. 227-303.
9
L.P. Kaelbling, M.L. Littman, and A.W. Moore, "Reinforcement Learning: A Survey", Journal of Artificial Intelligence Research 4, 1996, pp. 237-285.
12
S.P. Singh, T. Jaakkola, M.L. Littman, and C. Szepesvári, "Convergence Results for Single-Step On-Policy Reinforcement-Learning Algorithms", Machine Learning 38, 2000, pp. 287-308.
13
R.S. Sutton, "Integrated Architectures for Learning, Planning and Reacting Based on Approximating Dynamic Programming", Proceedings of the 7th International Conference on Machine Learning, 1990, pp. 216-224.
15
R.S. Sutton, D. Precup, and S. Singh, "Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning", Artificial Intelligence 112, 1999, pp. 181-211.
16
C.J.C.H. Watkins, "Learning from Delayed Rewards", Ph.D. thesis, Cambridge University, Cambridge, England.