[1] Shun-Ichi Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2):251-276, 1998.

[4] Leemon C. Baird III. Advantage updating. Technical Report WL-TR-93-1146, Wright-Patterson Air Force Base, Ohio: Wright Laboratory, 1993.

[18] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.

[19] Long-Ji Lin. Reinforcement learning for robots using neural networks. Technical report, DTIC Document, 1993.

[20] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. In NIPS Deep Learning Workshop, 2013.

[21] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, February 2015. URL http://dx.doi.org/10.1038/nature14236.

[22] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. arXiv preprint arXiv:1602.01783, 2016.

[23] Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, and Dale Schuurmans. Reward augmented maximum likelihood for neural structured prediction. arXiv preprint arXiv:1609.00150, 2016.

[26] Jing Peng and Ronald J. Williams. Incremental multi-step Q-learning. Machine Learning, 22(1-3):283-290, 1996.

[27] Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In AAAI, Atlanta, 2010.

[28] Martin Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In Machine Learning: ECML 2005, pp. 317-328. Springer Berlin Heidelberg, 2005.

[30] Brian Sallans and Geoffrey E. Hinton. Reinforcement learning with factored states and actions. Journal of Machine Learning Research, 5(Aug):1063-1088, 2004.

[32] John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning, pp. 1889-1897, 2015.

[33] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning (ICML), pp. 387-395, 2014.

[34] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.

[36] Richard S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.

[37] Richard S. Sutton, David A. McAllester, Satinder P. Singh, Yishay Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, volume 99, pp. 1057-1063, 1999.

[38] Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.

[41] Harm van Seijen, Hado van Hasselt, Shimon Whiteson, and Marco Wiering. A theoretical and empirical analysis of expected Sarsa. In 2009 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp. 177-184. IEEE, 2009.

[42] Yin-Hao Wang, Tzuu-Hseng S. Li, and Chih-Jui Lin. Backward Q-learning: the combination of Sarsa algorithm and Q-learning. Engineering Applications of Artificial Intelligence, 26(9):2184-2193, 2013.

[43] Ziyu Wang, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot, and Nando de Freitas. Dueling network architectures for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pp. 1995-2003, 2016.

[45] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.

[46] Ronald J. Williams and Jing Peng. Function optimization using connectionist reinforcement learning algorithms. Connection Science, 3(3):241-268, 1991.