1. Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
3. Shixiang Gu, Timothy Lillicrap, Zoubin Ghahramani, Richard E Turner, and Sergey Levine. Q-Prop: Sample-efficient policy gradient with an off-policy critic. arXiv preprint arXiv:1611.02247, 2016.
4. Shixiang Gu, Tim Lillicrap, Richard E Turner, Zoubin Ghahramani, Bernhard Schölkopf, and Sergey Levine. Interpolated policy gradient: Merging on-policy and off-policy gradient estimation for deep reinforcement learning. In Advances in Neural Information Processing Systems, pp. 3849–3858, 2017.
6. Hajime Kimura, Shigenobu Kobayashi, et al. An analysis of actor-critic algorithms using eligibility traces: Reinforcement learning with imperfect value functions. Journal of Japanese Society for Artificial Intelligence, 15(2):267–275, 2000.
9. Brenden M Lake, Ruslan Salakhutdinov, and Joshua B Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 350(6266):1332–1338, 2015.
10. Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, and Qiang Liu. Sample-efficient policy optimization with Stein control variate. arXiv preprint arXiv:1710.11198, 2017.
15. Christian Naesseth, Francisco Ruiz, Scott Linderman, and David Blei. Reparameterization gradients through acceptance-rejection sampling algorithms. In Artificial Intelligence and Statistics, pp. 489–498, 2017.
16. Chris J Oates, Mark Girolami, and Nicolas Chopin. Control functionals for Monte Carlo integration. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 79(3):695–718, 2017.
22. David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams. Learning representations by back-propagating errors. Nature, 323(6088):533–536, 1986.
24. John Schulman, Nicolas Heess, Theophane Weber, and Pieter Abbeel. Gradient estimation using stochastic computation graphs. In Advances in Neural Information Processing Systems, pp. 3528–3536, 2015a.
25. John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
28. Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, pp. 1057–1063, 2000.
30. Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 5026–5033. IEEE, 2012.
31. George Tucker, Andriy Mnih, Chris J Maddison, and Jascha Sohl-Dickstein. REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models. arXiv preprint arXiv:1703.07370, 2017.
32. George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E Turner, Zoubin Ghahramani, and Sergey Levine. The mirage of action-dependent baselines in reinforcement learning, 2018.
33. Daan Wierstra, Tom Schaul, Tobias Glasmachers, Yi Sun, Jan Peters, and Jürgen Schmidhuber. Natural evolution strategies. Journal of Machine Learning Research, 15(1):949–980, 2014.
34. Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
35. Cathy Wu, Aravind Rajeswaran, Yan Duan, Vikash Kumar, Alexandre M Bayen, Sham Kakade, Igor Mordatch, and Pieter Abbeel. Variance reduction for policy gradient with action-dependent factorized baselines. In International Conference on Learning Representations, 2018.
36. Yuhuai Wu, Elman Mansimov, Shun Liao, Roger Grosse, and Jimmy Ba. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In Advances in Neural Information Processing Systems, 2017.