2. Brockman, Greg, Cheung, Vicki, Pettersson, Ludwig, Schneider, Jonas, Schulman, John, Tang, Jie, and Zaremba, Wojciech. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
4. Duan, Yan, Chen, Xi, Houthooft, Rein, Schulman, John, and Abbeel, Pieter. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning (ICML), 2016.
5. Gu, Shixiang, Lillicrap, Timothy, Ghahramani, Zoubin, Turner, Richard E., and Levine, Sergey. Q-Prop: Sample-efficient policy gradient with an off-policy critic. ICLR, 2017.
6. Heess, Nicolas, Wayne, Gregory, Silver, David, Lillicrap, Tim, Erez, Tom, and Tassa, Yuval. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems, pp. 2944-2952, 2015.
7. Jiang, Nan and Li, Lihong. Doubly robust off-policy value evaluation for reinforcement learning. In International Conference on Machine Learning, pp. 652-661, 2016.
8
-
-
85161982655
-
On a connection between importance sampling and the likelihood ratio policy gradient
-
Jie, Tang and Abbeel, Pieter. On a connection between importance sampling and the likelihood ratio policy gradient. In Advances in Neural Information Processing Systems, pp. 1000-1008, 2010.
-
(2010)
Advances in Neural Information Processing Systems
, pp. 1000-1008
-
-
Jie, T.1
Abbeel, P.2
-
11. Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39):1-40, 2016.
12. Lillicrap, Timothy P., Hunt, Jonathan J., Pritzel, Alexander, Heess, Nicolas, Erez, Tom, Tassa, Yuval, Silver, David, and Wierstra, Daan. Continuous control with deep reinforcement learning. ICLR, 2016.
13. Mahmood, A. Rupam, van Hasselt, Hado P., and Sutton, Richard S. Weighted importance sampling for off-policy learning with linear function approximation. In Advances in Neural Information Processing Systems, pp. 3014-3022, 2014.
14. Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A., Veness, Joel, Bellemare, Marc G., Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K., Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
15. Munos, Rémi, Stepleton, Tom, Harutyunyan, Anna, and Bellemare, Marc G. Safe and efficient off-policy reinforcement learning. arXiv preprint arXiv:1606.02647, 2016.
16. O'Donoghue, Brendan, Munos, Remi, Kavukcuoglu, Koray, and Mnih, Volodymyr. PGQ: Combining policy gradient and Q-learning. ICLR, 2017.
20. Riedmiller, Martin. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In European Conference on Machine Learning, pp. 317-328. Springer, 2005.
21. Ross, Sheldon M. Simulation. Burlington, MA: Elsevier, 2006.
22. Schulman, John, Levine, Sergey, Abbeel, Pieter, Jordan, Michael I., and Moritz, Philipp. Trust region policy optimization. In ICML, pp. 1889-1897, 2015.
23. Schulman, John, Moritz, Philipp, Levine, Sergey, Jordan, Michael, and Abbeel, Pieter. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations (ICLR), 2016.
24. Silver, David, Lever, Guy, Heess, Nicolas, Degris, Thomas, Wierstra, Daan, and Riedmiller, Martin. Deterministic policy gradient algorithms. In International Conference on Machine Learning (ICML), 2014.
25. Silver, David, Huang, Aja, Maddison, Chris J., Guez, Arthur, Sifre, Laurent, Van Den Driessche, George, Schrittwieser, Julian, Antonoglou, Ioannis, Panneershelvam, Veda, Lanctot, Marc, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
26. Sutton, Richard S., McAllester, David A., Singh, Satinder P., Mansour, Yishay, et al. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), volume 99, pp. 1057-1063, 1999.
27. Thomas, Philip. Bias in natural actor-critic algorithms. In ICML, pp. 441-448, 2014.
28. Thomas, Philip and Brunskill, Emma. Data-efficient off-policy policy evaluation for reinforcement learning. In International Conference on Machine Learning, pp. 2139-2148, 2016.
29. Todorov, Emanuel, Erez, Tom, and Tassa, Yuval. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033. IEEE, 2012.
30. Wang, Ziyu, Bapst, Victor, Heess, Nicolas, Mnih, Volodymyr, Munos, Remi, Kavukcuoglu, Koray, and de Freitas, Nando. Sample efficient actor-critic with experience replay. ICLR, 2017.
31. Williams, Ronald J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.