Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pp. 1329–1338, 2016.
Will Grathwohl, Dami Choi, Yuhuai Wu, Geoff Roeder, and David Duvenaud. Backpropagation through the void: Optimizing control variates for black-box gradient estimation. International Conference on Learning Representations (ICLR), 2018.
Shixiang Gu, Tim Lillicrap, Richard E Turner, Zoubin Ghahramani, Bernhard Schölkopf, and Sergey Levine. Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. In Advances in Neural Information Processing Systems, pp. 3849–3858, 2017a.
Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E Turner, and Sergey Levine. Q-prop: Sample-efficient policy gradient with an off-policy critic. International Conference on Learning Representations (ICLR), 2017b.
Tang Jie and Pieter Abbeel. On a connection between importance sampling and the likelihood ratio policy gradient. In Advances in Neural Information Processing Systems, pp. 1000–1008, 2010.
Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, and Qiang Liu. Action-dependent control variates for policy optimization via Stein identity. International Conference on Learning Representations (ICLR), 2018.
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning, pp. 1889–1897, 2015a.
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning, 2014.
Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pp. 1057–1063, 2000.
Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. In Reinforcement Learning, pp. 5–32. Springer, 1992.
Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, and Pieter Abbeel. Variance reduction for policy gradient with action-dependent factorized baselines. International Conference on Learning Representations (ICLR), 2018.