-
1
-
-
85057345756
-
-
arXiv preprint
-
Kavosh Asadi, Cameron Allen, Melrose Roderick, Abdel-rahman Mohamed, George Konidaris, and Michael Littman. Mean actor critic. arXiv preprint arXiv:1709.00503, 2017.
-
(2017)
Mean Actor Critic
-
-
Asadi, K.1
Allen, C.2
Roderick, M.3
Mohamed, A.-R.4
Konidaris, G.5
Littman, M.6
-
3
-
-
85015444377
-
-
Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
-
(2016)
OpenAI Gym
-
-
Brockman, G.1
Cheung, V.2
Pettersson, L.3
Schneider, J.4
Schulman, J.5
Tang, J.6
Zaremba, W.7
-
7
-
-
85083950952
-
Backpropagation through the void: Optimizing control variates for black-box gradient estimation
-
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, and David Duvenaud. Backpropagation through the void: Optimizing control variates for black-box gradient estimation. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SyzKd1bCW.
-
(2018)
International Conference on Learning Representations
-
-
Grathwohl, W.1
Choi, D.2
Wu, Y.3
Roeder, G.4
Duvenaud, D.5
-
8
-
-
84897694817
-
Variance reduction techniques for gradient estimates in reinforcement learning
-
Nov
-
Evan Greensmith, Peter L Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov):1471–1530, 2004.
-
(2004)
Journal of Machine Learning Research
, vol.5
, pp. 1471-1530
-
-
Greensmith, E.1
Bartlett, P.L.2
Baxter, J.3
-
9
-
-
84979289652
-
Continuous deep Q-learning with model-based acceleration
-
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous deep Q-learning with model-based acceleration. In International Conference on Machine Learning, pp. 2829–2838, 2016a.
-
(2016)
International Conference on Machine Learning
, pp. 2829-2838
-
-
Gu, S.1
Lillicrap, T.2
Sutskever, I.3
Levine, S.4
-
10
-
-
85064813907
-
Q-Prop: Sample-efficient policy gradient with an off-policy critic
-
Shixiang Gu, Timothy P. Lillicrap, Zoubin Ghahramani, Richard E. Turner, and Sergey Levine. Q-Prop: Sample-efficient policy gradient with an off-policy critic. International Conference on Learning Representations (ICLR), 2016b.
-
(2016)
International Conference on Learning Representations (ICLR)
-
-
Gu, S.1
Lillicrap, T.P.2
Ghahramani, Z.3
Turner, R.E.4
Levine, S.5
-
11
-
-
85047014445
-
Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning
-
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E Turner, Bernhard Schölkopf, and Sergey Levine. Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. Advances in Neural Information Processing Systems, 2017.
-
(2017)
Advances in Neural Information Processing Systems
-
-
Gu, S.1
Lillicrap, T.2
Ghahramani, Z.3
Turner, R.E.4
Schölkopf, B.5
Levine, S.6
-
12
-
-
85044446086
-
-
arXiv preprint
-
Nicolas Heess, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, Ali Eslami, Martin Riedmiller, et al. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.
-
(2017)
Emergence of Locomotion Behaviours in Rich Environments
-
-
Heess, N.1
Sriram, S.2
Lemmon, J.3
Merel, J.4
Wayne, G.5
Tassa, Y.6
Erez, T.7
Wang, Z.8
Eslami, A.9
Riedmiller, M.10
-
16
-
-
85083953657
-
Continuous control with deep reinforcement learning
-
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. Proceedings of the 2nd International Conference on Learning Representations (ICLR), 2015.
-
(2015)
Proceedings of the 2nd International Conference on Learning Representations (ICLR)
-
-
Lillicrap, T.P.1
Hunt, J.J.2
Pritzel, A.3
Heess, N.4
Erez, T.5
Tassa, Y.6
Silver, D.7
Wierstra, D.8
-
18
-
-
85018878907
-
Stein variational gradient descent: A general purpose Bayesian inference algorithm
-
Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, 2016.
-
(2016)
Advances in Neural Information Processing Systems
-
-
Liu, Q.1
Wang, D.2
-
20
-
-
84904867557
-
-
arXiv preprint
-
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
-
(2013)
Playing Atari with Deep Reinforcement Learning
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Graves, A.4
Antonoglou, I.5
Wierstra, D.6
Riedmiller, M.7
-
21
-
-
84971448181
-
Asynchronous methods for deep reinforcement learning
-
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
-
(2016)
International Conference on Machine Learning
, pp. 1928-1937
-
-
Mnih, V.1
Badia, A.P.2
Mirza, M.3
Graves, A.4
Lillicrap, T.5
Harley, T.6
Silver, D.7
Kavukcuoglu, K.8
-
24
-
-
84971325632
-
Control functionals for Monte Carlo integration
-
Chris J Oates, Mark Girolami, and Nicolas Chopin. Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3):695–718, 2017.
-
(2017)
Journal of the Royal Statistical Society: Series B (Statistical Methodology)
, vol.79
, Issue.3
, pp. 695-718
-
-
Oates, C.J.1
Girolami, M.2
Chopin, N.3
-
26
-
-
85046997203
-
Sticking the landing: An asymptotically zero-variance gradient estimator for variational inference
-
Geoffrey Roeder, Yuhuai Wu, and David K. Duvenaud. Sticking the landing: An asymptotically zero-variance gradient estimator for variational inference. Advances in Neural Information Processing Systems, 2017.
-
(2017)
Advances in Neural Information Processing Systems
-
-
Roeder, G.1
Wu, Y.2
Duvenaud, D.K.3
-
27
-
-
84969963490
-
Trust region policy optimization
-
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, and Pieter Abbeel. Trust region policy optimization. In International Conference on Machine Learning, 2015.
-
(2015)
International Conference on Machine Learning
-
-
Schulman, J.1
Levine, S.2
Moritz, P.3
Jordan, M.I.4
Abbeel, P.5
-
28
-
-
85083954383
-
High-dimensional continuous control using generalized advantage estimation
-
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations (ICLR), 2016.
-
(2016)
International Conference on Learning Representations (ICLR)
-
-
Schulman, J.1
Moritz, P.2
Levine, S.3
Jordan, M.4
Abbeel, P.5
-
29
-
-
85064820904
-
Proximal policy optimization algorithms
-
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. Advances in Neural Information Processing Systems, 2017.
-
(2017)
Advances in Neural Information Processing Systems
-
-
Schulman, J.1
Wolski, F.2
Dhariwal, P.3
Radford, A.4
Klimov, O.5
-
31
-
-
84919793697
-
Deterministic policy gradient algorithms
-
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In Proceedings of the 31st International Conference on Machine Learning (ICML), pp. 387–395, 2014.
-
(2014)
Proceedings of the 31st International Conference on Machine Learning (ICML)
, pp. 387-395
-
-
Silver, D.1
Lever, G.2
Heess, N.3
Degris, T.4
Wierstra, D.5
Riedmiller, M.6
-
32
-
-
84963949906
-
Mastering the game of Go with deep neural networks and tree search
-
David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
-
(2016)
Nature
, vol.529
, Issue.7587
, pp. 484-489
-
-
Silver, D.1
Huang, A.2
Maddison, C.J.3
Guez, A.4
Sifre, L.5
Van Den Driessche, G.6
Schrittwieser, J.7
Antonoglou, I.8
Panneershelvam, V.9
Lanctot, M.10
-
33
-
-
85031918331
-
Mastering the game of Go without human knowledge
-
Oct
-
David Silver, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert, Lucas Baker, Matthew Lai, Adrian Bolton, Yutian Chen, Timothy Lillicrap, Fan Hui, Laurent Sifre, George van den Driessche, Thore Graepel, and Demis Hassabis. Mastering the game of Go without human knowledge. Nature, 550(7676):354–359, Oct 2017. ISSN 0028-0836.
-
(2017)
Nature
, vol.550
, Issue.7676
, pp. 354-359
-
-
Silver, D.1
Schrittwieser, J.2
Simonyan, K.3
Antonoglou, I.4
Huang, A.5
Guez, A.6
Hubert, T.7
Baker, L.8
Lai, M.9
Bolton, A.10
Chen, Y.11
Lillicrap, T.12
Hui, F.13
Sifre, L.14
Van Den Driessche, G.15
Graepel, T.16
Hassabis, D.17
-
34
-
-
0003722779
-
Approximate computation of expectations
-
Charles Stein. Approximate computation of expectations. Lecture Notes-Monograph Series, 7: i–164, 1986.
-
(1986)
Lecture Notes-Monograph Series
, vol.7
, pp. 164
-
-
Stein, C.1
-
35
-
-
0004102479
-
-
MIT Press, Cambridge, MA, USA, 1st edition
-
Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998. ISBN 0262193981.
-
(1998)
Introduction to Reinforcement Learning
-
-
Sutton, R.S.1
Barto, A.G.2
-
37
-
-
84872292044
-
MuJoCo: A physics engine for model-based control
-
IEEE
-
Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
-
(2012)
Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on
, pp. 5026-5033
-
-
Todorov, E.1
Erez, T.2
Tassa, Y.3
-
38
-
-
85046959617
-
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
-
George Tucker, Andriy Mnih, Chris J Maddison, and Jascha Sohl-Dickstein. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. Advances in Neural Information Processing Systems, 2017.
-
(2017)
Advances in Neural Information Processing Systems
-
-
Tucker, G.1
Mnih, A.2
Maddison, C.J.3
Sohl-Dickstein, J.4
-
40
-
-
0000337576
-
Simple statistical gradient-following algorithms for connectionist reinforcement learning
-
Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
-
(1992)
Machine Learning
, vol.8
, Issue.3-4
, pp. 229-256
-
-
Williams, R.J.1
-
41
-
-
85083951478
-
Variance reduction for policy gradient with action-dependent factorized baselines
-
Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, and Pieter Abbeel. Variance reduction for policy gradient with action-dependent factorized baselines. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=H1tSsb-AW.
-
(2018)
International Conference on Learning Representations
-
-
Wu, C.1
Rajeswaran, A.2
Duan, Y.3
Kumar, V.4
Bayen, A.M.5
Kakade, S.6
Mordatch, I.7
Abbeel, P.8