SCOPUS Information Search Platform

5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings

Volume: not listed, Issue: not listed, Year: 2017, Pages: not listed

Q-Prop: Sample-efficient policy gradient with an off-policy critic

Authors (5): Gu, Shixiang (a, b, c); Lillicrap, Timothy (d); Ghahramani, Zoubin (a, f); Turner, Richard E. (a); Levine, Sergey (c, e)

a University of Cambridge (United Kingdom)

b Max Planck Institute for Intelligent Systems (Germany)

c Google Inc. (United States)

d DeepMind (United Kingdom)

e University of California (United States)

f Uber AI Labs (United States)

Author keywords

No author keywords available.

Indexed keywords

Efficiency; Gradient methods; Learning algorithms; Monte Carlo methods; Reinforcement learning

Continuous control; Model-free algorithms; Policy gradient methods; Policy optimization; Reinforcement learning method; Sample complexity; Simulated domains; Taylor expansions

Deep learning

EID: 85041942380
PISSN: None
EISSN: None
Source type: Conference Proceeding
DOI: None
Document type: Conference Paper

Times cited: 175

References (38)

1
- EID: 85047002128
- Input convex neural networks
- Brandon Amos, Lei Xu, and J Zico Kolter. Input convex neural networks. arXiv preprint arXiv:1609.07152, 2016.
- (2016) Input Convex Neural Networks
- Amos, B.; Xu, L.; Kolter, J.Z.

2
- EID: 0030691430
- A comparison of direct and model-based reinforcement learning
- Christopher G Atkeson and Juan Carlos Santamaria. A comparison of direct and model-based reinforcement learning. In International Conference on Robotics and Automation, 1997.
- (1997) International Conference on Robotics and Automation
- Atkeson, C.G.; Santamaria, J.C.

3
- EID: 85015444377
- OpenAI Gym
- Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
- (2016) OpenAI Gym
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W.

4
- EID: 80053441894
- PILCO: A model-based and data-efficient approach to policy search
- Marc Deisenroth and Carl E Rasmussen. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 465-472, 2011.
- (2011) Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 465-472
- Deisenroth, M.; Rasmussen, C.E.

5
- EID: 84999018287
- Benchmarking deep reinforcement learning for continuous control
- Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. International Conference on Machine Learning (ICML), 2016.
- (2016) International Conference on Machine Learning (ICML)
- Duan, Y.; Chen, X.; Houthooft, R.; Schulman, J.; Abbeel, P.

6
- EID: 84998747656
- Doubly robust policy evaluation and learning
- Miroslav Dudík, John Langford, and Lihong Li. Doubly robust policy evaluation and learning. arXiv preprint arXiv:1103.4601, 2011.
- (2011) Doubly Robust Policy Evaluation and Learning
- Dudík, M.; Langford, J.; Li, L.

7
- EID: 84897694817
- Variance reduction techniques for gradient estimates in reinforcement learning
- Evan Greensmith, Peter L Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5(Nov):1471-1530, 2004.
- (2004) Journal of Machine Learning Research, vol. 5, pp. 1471-1530
- Greensmith, E.; Bartlett, P.L.; Baxter, J.

8
- EID: 85083953202
- MuProp: Unbiased backpropagation for stochastic neural networks
- Shixiang Gu, Sergey Levine, Ilya Sutskever, and Andriy Mnih. MuProp: Unbiased backpropagation for stochastic neural networks. International Conference on Learning Representations (ICLR), 2016a.
- (2016) International Conference on Learning Representations (ICLR)
- Gu, S.; Levine, S.; Sutskever, I.; Mnih, A.

9
- EID: 84998579328
- Continuous deep Q-learning with model-based acceleration
- Shixiang Gu, Tim Lillicrap, Ilya Sutskever, and Sergey Levine. Continuous deep Q-learning with model-based acceleration. In International Conference on Machine Learning (ICML), 2016b.
- (2016) International Conference on Machine Learning (ICML)
- Gu, S.; Lillicrap, T.; Sutskever, I.; Levine, S.

10
- EID: 85161998941
- Double Q-learning
- Hado van Hasselt. Double Q-learning. In Advances in Neural Information Processing Systems, pp. 2613-2621, 2010.
- (2010) Advances in Neural Information Processing Systems, pp. 2613-2621
- van Hasselt, H.

11
- EID: 33646243319
- A natural policy gradient
- Sham Kakade. A natural policy gradient. In NIPS, volume 14, pp. 1531-1538, 2001.
- (2001) NIPS, vol. 14, pp. 1531-1538
- Kakade, S.

12
- EID: 84941620184
- Adam: A method for stochastic optimization
- Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.; Ba, J.

13
- EID: 85030997365
- Deterministic policy gradient algorithms
- Guy Lever. Deterministic policy gradient algorithms. 2014.
- (2014) Deterministic Policy Gradient Algorithms
- Lever, G.

14
- EID: 84897529781
- Guided policy search
- Sergey Levine and Vladlen Koltun. Guided policy search. In International Conference on Machine Learning (ICML), pp. 1-9, 2013.
- (2013) International Conference on Machine Learning (ICML), pp. 1-9
- Levine, S.; Koltun, V.

15
- EID: 85083953657
- Continuous control with deep reinforcement learning
- Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. International Conference on Learning Representations (ICLR), 2016.
- (2016) International Conference on Learning Representations (ICLR)
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D.

16
- EID: 84937883130
- Weighted importance sampling for off-policy learning with linear function approximation
- A Rupam Mahmood, Hado P van Hasselt, and Richard S Sutton. Weighted importance sampling for off-policy learning with linear function approximation. In Advances in Neural Information Processing Systems, pp. 3014-3022, 2014.
- (2014) Advances in Neural Information Processing Systems, pp. 3014-3022
- Mahmood, A.R.; van Hasselt, H.P.; Sutton, R.S.

17
- EID: 84919786239
- Neural variational inference and learning in belief networks
- Andriy Mnih and Karol Gregor. Neural variational inference and learning in belief networks. International Conference on Machine Learning (ICML), 2014.
- (2014) International Conference on Machine Learning (ICML)
- Mnih, A.; Gregor, K.

18
- EID: 84924051598
- Human-level control through deep reinforcement learning
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
- (2015) Nature, vol. 518, issue 7540, pp. 529-533
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.

19
- EID: 84999036937
- Asynchronous methods for deep reinforcement learning
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy P Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning (ICML), 2016.
- (2016) International Conference on Machine Learning (ICML)
- Mnih, V.; Badia, A.P.; Mirza, M.; Graves, A.; Lillicrap, T.P.; Harley, T.; Silver, D.; Kavukcuoglu, K.

20
- EID: 85047001601
- Safe and efficient off-policy reinforcement learning
- Rémi Munos, Tom Stepleton, Anna Harutyunyan, and Marc G Bellemare. Safe and efficient off-policy reinforcement learning. arXiv preprint arXiv:1606.02647, 2016.
- (2016) Safe and Efficient Off-Policy Reinforcement Learning
- Munos, R.; Stepleton, T.; Harutyunyan, A.; Bellemare, M.G.

21
- EID: 84867133463
- Variational Bayesian inference with stochastic search
- John Paisley, David Blei, and Michael Jordan. Variational Bayesian inference with stochastic search. International Conference on Machine Learning (ICML), 2012.
- (2012) International Conference on Machine Learning (ICML)
- Paisley, J.; Blei, D.; Jordan, M.

22
- EID: 34250635407
- Policy gradient methods for robotics
- Jan Peters and Stefan Schaal. Policy gradient methods for robotics. In International Conference on Intelligent Robots and Systems (IROS), pp. 2219-2225. IEEE, 2006.
- (2006) International Conference on Intelligent Robots and Systems (IROS), pp. 2219-2225
- Peters, J.; Schaal, S.

23
- EID: 85167411371
- Relative entropy policy search
- Jan Peters, Katharina Mülling, and Yasemin Altun. Relative entropy policy search. In AAAI. Atlanta, 2010.
- (2010) AAAI
- Peters, J.; Mülling, K.; Altun, Y.

24
- EID: 0242393653
- Eligibility traces for off-policy policy evaluation
- Doina Precup. Eligibility traces for off-policy policy evaluation. Computer Science Department Faculty Publication Series, pp. 80, 2000.
- (2000) Computer Science Department Faculty Publication Series, pp. 80
- Precup, D.

25
- EID: 0004020933
- Simulation
- Sheldon M Ross. Simulation. Burlington, MA: Elsevier, 2006.
- (2006) Simulation
- Ross, S.M.

26
- EID: 84969963490
- Trust region policy optimization
- John Schulman, Sergey Levine, Pieter Abbeel, Michael I. Jordan, and Philipp Moritz. Trust region policy optimization. In International Conference on Machine Learning (ICML), pp. 1889-1897, 2015.
- (2015) International Conference on Machine Learning (ICML), pp. 1889-1897
- Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.I.; Moritz, P.

27
- EID: 85083954383
- High-dimensional continuous control using generalized advantage estimation
- John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations (ICLR), 2016.
- (2016) International Conference on Learning Representations (ICLR)
- Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; Abbeel, P.

28
- EID: 84919793697
- Deterministic policy gradient algorithms
- David Silver, Guy Lever, Nicolas Heess, Thomas Degris, Daan Wierstra, and Martin Riedmiller. Deterministic policy gradient algorithms. In International Conference on Machine Learning (ICML), 2014.
- (2014) International Conference on Machine Learning (ICML)
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M.

29
- EID: 84963949906
- Mastering the game of Go with deep neural networks and tree search
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
- (2016) Nature, vol. 529, issue 7587, pp. 484-489
- Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.

30
- EID: 85132026293
- Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
- Richard S Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In International Conference on Machine Learning (ICML), pp. 216-224, 1990.
- (1990) International Conference on Machine Learning (ICML), pp. 216-224
- Sutton, R.S.

31
- EID: 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Richard S Sutton, David A McAllester, Satinder P Singh, Yishay Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), volume 99, pp. 1057-1063, 1999.
- (1999) Advances in Neural Information Processing Systems (NIPS), vol. 99, pp. 1057-1063
- Sutton, R.S.; McAllester, D.A.; Singh, S.P.; Mansour, Y.

32
- EID: 70049090437
- Fast gradient-descent methods for temporal-difference learning with linear function approximation
- Richard S Sutton, Hamid Reza Maei, Doina Precup, Shalabh Bhatnagar, David Silver, Csaba Szepesvári, and Eric Wiewiora. Fast gradient-descent methods for temporal-difference learning with linear function approximation. In Proceedings of the 26th Annual International Conference on Machine Learning, pp. 993-1000. ACM, 2009.
- (2009) Proceedings of the 26th Annual International Conference on Machine Learning, pp. 993-1000
- Sutton, R.S.; Maei, H.R.; Precup, D.; Bhatnagar, S.; Silver, D.; Szepesvári, C.; Wiewiora, E.

33
- EID: 85014295239
- An emphatic approach to the problem of off-policy temporal-difference learning
- Richard S Sutton, A Rupam Mahmood, and Martha White. An emphatic approach to the problem of off-policy temporal-difference learning. The Journal of Machine Learning Research, 2015.
- (2015) The Journal of Machine Learning Research
- Sutton, R.S.; Mahmood, A.R.; White, M.

34
- EID: 85035116867
- Bias in natural actor-critic algorithms
- Philip Thomas. Bias in natural actor-critic algorithms. In ICML, pp. 441-448, 2014.
- (2014) ICML, pp. 441-448
- Thomas, P.

35
- EID: 84872292044
- MuJoCo: A physics engine for model-based control
- Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033. IEEE, 2012.
- (2012) 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033
- Todorov, E.; Erez, T.; Tassa, Y.

36
- EID: 34249833101
- Q-learning
- Christopher JCH Watkins and Peter Dayan. Q-learning. Machine Learning, 8(3-4):279-292, 1992.
- (1992) Machine Learning, vol. 8, issue 3-4, pp. 279-292
- Watkins, C.J.C.H.; Dayan, P.

37
- EID: 21444437925
- The optimal reward baseline for gradient-based reinforcement learning
- Lex Weaver and Nigel Tao. The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 538-545. Morgan Kaufmann Publishers Inc., 2001.
- (2001) Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pp. 538-545
- Weaver, L.; Tao, N.

38
- EID: 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.
- (1992) Machine Learning, vol. 8, issue 3-4, pp. 229-256
- Williams, R.J.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.