-
4
-
-
84879976780
-
The Arcade Learning Environment: An evaluation platform for general agents
-
Bellemare, M. G.; Naddaf, Y.; Veness, J.; and Bowling, M. 2013. The Arcade Learning Environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47:253-279.
-
(2013)
Journal of Artificial Intelligence Research
, vol.47
, pp. 253-279
-
-
Bellemare, M.G.1
Naddaf, Y.2
Veness, J.3
Bowling, M.4
-
5
-
-
85012688561
-
-
Princeton, NJ: Princeton University Press
-
Bellman, R. E. 1957. Dynamic programming. Princeton, NJ: Princeton University Press.
-
(1957)
Dynamic Programming
-
-
Bellman, R.E.1
-
7
-
-
84861380255
-
Q-learning and enhanced policy iteration in discounted dynamic programming
-
Bertsekas, D. P., and Yu, H. 2012. Q-learning and enhanced policy iteration in discounted dynamic programming. Mathematics of Operations Research 37(1):66-94.
-
(2012)
Mathematics of Operations Research
, vol.37
, Issue.1
, pp. 66-94
-
-
Bertsekas, D.P.1
Yu, H.2
-
8
-
-
79960439729
-
Approximate policy iteration: A survey and some new methods
-
Bertsekas, D. P. 2011. Approximate policy iteration: A survey and some new methods. Journal of Control Theory and Applications 9(3):310-335.
-
(2011)
Journal of Control Theory and Applications
, vol.9
, Issue.3
, pp. 310-335
-
-
Bertsekas, D.P.1
-
13
-
-
84947249480
-
Optimal hour ahead bidding in the real time electricity market with battery storage using approximate dynamic programming
-
Jiang, D. R., and Powell, W. B. 2015. Optimal hour ahead bidding in the real time electricity market with battery storage using approximate dynamic programming. INFORMS Journal on Computing 27(3):525-543.
-
(2015)
INFORMS Journal on Computing
, vol.27
, Issue.3
, pp. 525-543
-
-
Jiang, D.R.1
Powell, W.B.2
-
16
-
-
84924051598
-
Human-level control through deep reinforcement learning
-
Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; et al. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529-533.
-
(2015)
Nature
, vol.518
, Issue.7540
, pp. 529-533
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Rusu, A.A.4
Veness, J.5
Bellemare, M.G.6
Graves, A.7
Riedmiller, M.8
Fidjeland, A.K.9
Ostrovski, G.10
-
18
-
-
0036832953
-
Variable resolution discretization in optimal control
-
Munos, R., and Moore, A. 2002. Variable resolution discretization in optimal control. Machine Learning 49(2-3):291-323.
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 291-323
-
-
Munos, R.1
Moore, A.2
-
19
-
-
0036832956
-
Kernel-based reinforcement learning
-
Ormoneit, D., and Sen, S. 2002. Kernel-based reinforcement learning. Machine Learning 49(2-3):161-178.
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 161-178
-
-
Ormoneit, D.1
Sen, S.2
-
21
-
-
67650996818
-
Reinforcement learning for robot soccer
-
Riedmiller, M.; Gabel, T.; Hafner, R.; and Lange, S. 2009. Reinforcement learning for robot soccer. Autonomous Robots 27(1):55-73.
-
(2009)
Autonomous Robots
, vol.27
, Issue.1
, pp. 55-73
-
-
Riedmiller, M.1
Gabel, T.2
Hafner, R.3
Lange, S.4
-
24
-
-
84899464022
-
Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
-
Sutton, R.; Modayil, J.; Delp, M.; Degris, T.; Pilarski, P.; White, A.; and Precup, D. 2011. Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems.
-
(2011)
Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems
-
-
Sutton, R.1
Modayil, J.2
Delp, M.3
Degris, T.4
Pilarski, P.5
White, A.6
Precup, D.7
-
25
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3(1):9-44.
-
(1988)
Machine Learning
, vol.3
, Issue.1
, pp. 9-44
-
-
Sutton, R.S.1
-
26
-
-
85156221438
-
Generalization in reinforcement learning: Successful examples using sparse coarse coding
-
Sutton, R. S. 1996. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems 8, 1038-1044.
-
(1996)
Advances in Neural Information Processing Systems
, vol.8
, pp. 1038-1044
-
-
Sutton, R.S.1
-
27
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
Tesauro, G. 1995. Temporal difference learning and TD-Gammon. Communications of the ACM 38(3).
-
(1995)
Communications of the ACM
, vol.38
, Issue.3
-
-
Tesauro, G.1
-
31
-
-
84960105649
-
Compress and control
-
Veness, J.; Bellemare, M. G.; Hutter, M.; Chua, A.; and Desjardins, G. 2015. Compress and control. In Proceedings of the AAAI Conference on Artificial Intelligence.
-
(2015)
Proceedings of the AAAI Conference on Artificial Intelligence
-
-
Veness, J.1
Bellemare, M.G.2
Hutter, M.3
Chua, A.4
Desjardins, G.5
-
32
-
-
0004049893
-
-
Ph.D. Dissertation, Cambridge University, Cambridge, England
-
Watkins, C. 1989. Learning From Delayed Rewards. Ph.D. Dissertation, Cambridge University, Cambridge, England.
-
(1989)
Learning from Delayed Rewards
-
-
Watkins, C.1