2. Brockman, Greg, Cheung, Vicki, Pettersson, Ludwig, Schneider, Jonas, Schulman, John, Tang, Jie, and Zaremba, Wojciech. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
4. Duan, Yan, Chen, Xi, Houthooft, Rein, Schulman, John, and Abbeel, Pieter. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning (ICML), 2016.
5. Gu, Shixiang, Lillicrap, Timothy, Ghahramani, Zoubin, Turner, Richard E., and Levine, Sergey. Q-Prop: Sample-efficient policy gradient with an off-policy critic. ICLR, 2017.
6. Heess, Nicolas, Wayne, Gregory, Silver, David, Lillicrap, Tim, Erez, Tom, and Tassa, Yuval. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems, pp. 2944-2952, 2015.
7. Jiang, Nan and Li, Lihong. Doubly robust off-policy value evaluation for reinforcement learning. In International Conference on Machine Learning, pp. 652-661, 2016.
8
-
-
85161982655
-
On a connection between importance sampling and the likelihood ratio policy gradient
-
Jie, Tang and Abbeel, Pieter. On a connection between importance sampling and the likelihood ratio policy gradient. In Advances in Neural Information Processing Systems, pp. 1000-1008, 2010.
-
(2010)
Advances in Neural Information Processing Systems
, pp. 1000-1008
-
-
Jie, T.1
Abbeel, P.2
-
11. Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39):1-40, 2016.
12. Lillicrap, Timothy P., Hunt, Jonathan J., Pritzel, Alexander, Heess, Nicolas, Erez, Tom, Tassa, Yuval, Silver, David, and Wierstra, Daan. Continuous control with deep reinforcement learning. ICLR, 2016.
13. Mahmood, A. Rupam, van Hasselt, Hado P., and Sutton, Richard S. Weighted importance sampling for off-policy learning with linear function approximation. In Advances in Neural Information Processing Systems, pp. 3014-3022, 2014.
14. Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A., Veness, Joel, Bellemare, Marc G., Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K., Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
15. Munos, Rémi, Stepleton, Tom, Harutyunyan, Anna, and Bellemare, Marc G. Safe and efficient off-policy reinforcement learning. arXiv preprint arXiv:1606.02647, 2016.
16. O'Donoghue, Brendan, Munos, Remi, Kavukcuoglu, Koray, and Mnih, Volodymyr. PGQ: Combining policy gradient and Q-learning. ICLR, 2017.
20. Riedmiller, Martin. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In European Conference on Machine Learning, pp. 317-328. Springer, 2005.
21. Ross, Sheldon M. Simulation. Burlington, MA: Elsevier, 2006.
22. Schulman, John, Levine, Sergey, Abbeel, Pieter, Jordan, Michael I., and Moritz, Philipp. Trust region policy optimization. In ICML, pp. 1889-1897, 2015.
23. Schulman, John, Moritz, Philipp, Levine, Sergey, Jordan, Michael, and Abbeel, Pieter. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations (ICLR), 2016.
24. Silver, David, Lever, Guy, Heess, Nicolas, Degris, Thomas, Wierstra, Daan, and Riedmiller, Martin. Deterministic policy gradient algorithms. In International Conference on Machine Learning (ICML), 2014.
25. Silver, David, Huang, Aja, Maddison, Chris J., Guez, Arthur, Sifre, Laurent, Van Den Driessche, George, Schrittwieser, Julian, Antonoglou, Ioannis, Panneershelvam, Veda, Lanctot, Marc, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
26. Sutton, Richard S., McAllester, David A., Singh, Satinder P., Mansour, Yishay, et al. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), volume 99, pp. 1057-1063, 1999.
27. Thomas, Philip. Bias in natural actor-critic algorithms. In ICML, pp. 441-448, 2014.
28. Thomas, Philip and Brunskill, Emma. Data-efficient off-policy policy evaluation for reinforcement learning. In International Conference on Machine Learning, pp. 2139-2148, 2016.
29. Todorov, Emanuel, Erez, Tom, and Tassa, Yuval. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 5026-5033. IEEE, 2012.
30. Wang, Ziyu, Bapst, Victor, Heess, Nicolas, Mnih, Volodymyr, Munos, Remi, Kavukcuoglu, Koray, and de Freitas, Nando. Sample efficient actor-critic with experience replay. ICLR, 2017.
31. Williams, Ronald J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229-256, 1992.