[5] Maei, H. R., Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, pp. 91-96. Atlantis Press.
[6] Modayil, J., White, A., Sutton, R. S. (2014). Multi-timescale nexting in a reinforcement learning robot. Adaptive Behavior 22(2):146-160.
[8] Precup, D., Sutton, R. S., Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the 17th International Conference on Machine Learning, pp. 759-766. Morgan Kaufmann.
[9] Precup, D., Sutton, R. S., Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of the 18th International Conference on Machine Learning, pp. 417-424.
[11] Singh, S. P., Dayan, P. (1998). Analytical mean squared error curves for temporal difference learning. Machine Learning 32:5-40.
[12] Singh, S. P., Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning 22:123-158.
[13] Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning 3:9-44.
[16] Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan.
[17] Sutton, R. S., Singh, S. (1994). On bias and step size in temporal-difference learning. In Proceedings of the Eighth Yale Workshop on Adaptive and Learning Systems, pp. 91-96. New Haven, CT: Yale University.
[18] van Hasselt, H., Mahmood, A. R., Sutton, R. S. (2014). Off-policy TD(λ) with a true online equivalence. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, Quebec City, Canada.
[20] van Seijen, H., van Hasselt, H., Whiteson, S., Wiering, M. (2009). A theoretical and empirical analysis of Expected Sarsa. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp. 177-184.
[22] Yu, H. (2010). Convergence of least-squares temporal difference methods under general conditions. In Proceedings of the 27th International Conference on Machine Learning, pp. 1207-1214.