[1] S. I. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2): 251-276, 1998.
[2] J. Ba, R. Grosse, and J. Martens. Distributed second-order optimization using Kronecker-factored approximations. In ICLR, 2017.
[4] M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47: 253-279, 2013.
[5] G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
[6] R. Grosse and J. Martens. A Kronecker-factored approximate Fisher matrix for convolutional layers. In ICML, 2016.
[7] S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine. Q-prop: Sample-efficient policy gradient with an off-policy critic. In ICLR, 2017.
[8] N. Heess, D. TB, S. Sriram, J. Lemmon, J. Merel, G. Wayne, Y. Tassa, T. Erez, Z. Wang, S. M. A. Eslami, M. Riedmiller, and D. Silver. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.
[9] M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo, D. Silver, and K. Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. In ICLR, 2017.
[11] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[12] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous control with deep reinforcement learning. In ICLR, 2016.
[13] J. Martens. Deep learning via Hessian-free optimization. In ICML-10, 2010.
[15] J. Martens and R. Grosse. Optimizing neural networks with Kronecker-factored approximate curvature. In ICML, 2015.
[16] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533, 2015.
[17] V. Mnih, A. Puigdomenech Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In ICML, 2016.
[19] J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71(7-9): 1180-1190, 2008.
[20] N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 2002.
[21] J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
[22] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
[23] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
[24] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587): 484-489, 2016.
[27] Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas. Sample efficient actor-critic with experience replay. In ICLR, 2017.
[28] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3): 229-256, 1992.