[3] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
[5] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning (ICML), 2016.
[7] Evan Greensmith, Peter L. Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov):1471-1530, 2004.
[11] Sham Kakade. A natural policy gradient. In NIPS, volume 14, pp. 1531-1538, 2001.
[15] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. In International Conference on Learning Representations (ICLR), 2016.
[18] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
[19] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (ICML), 2016.
[23] Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In AAAI. Atlanta, 2010.
[25] Sheldon M. Ross. Simulation. Burlington, MA: Elsevier, 2006.
[26] John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning (ICML), pp. 1889-1897, 2015.
[27] John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. In International Conference on Learning Representations (ICLR), 2016.
[28] David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning (ICML), 2014.
[29] David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
[30] Richard S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In International Conference on Machine Learning (ICML), pp. 216-224, 1990.
[31] Richard S. Sutton, David A. McAllester, Satinder P. Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), volume 99, pp. 1057-1063, 1999.
[32] Richard S. Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, and Eric Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 993-1000. ACM, 2009.
[34] Philip Thomas. Bias in natural actor-critic algorithms. In ICML, pp. 441-448, 2014.
[38] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.