SCOPUS Information Search Platform

Advances in Neural Information Processing Systems

Volume 2017-December, 2017, Pages 5280-5289

Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Authors (5): Wu, Yuhuai (a); Mansimov, Elman (b); Liao, Shun (a); Grosse, Roger (a); Ba, Jimmy (a)

a UNIVERSITY OF TORONTO (Canada)

b NEW YORK UNIVERSITY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

CURVE FITTING; GRADIENT METHODS; REINFORCEMENT LEARNING;

ACTOR-CRITIC METHODS; CONTINUOUS CONTROL; CONTINUOUS DOMAIN; DISCRETE CONTROL; NATURAL GRADIENT METHODS; NON-TRIVIAL TASKS; STATE OF THE ART; TRUST-REGION METHODS;

DEEP LEARNING;

EID: 85046992971; PISSN: 10495258; EISSN: None; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited : (501)

References (28)

1
- 0000396062
- Natural gradient works efficiently in learning
- S. I. Amari. Natural gradient works efficiently in learning. Neural Computation, 10(2): 251-276, 1998.
- (1998) Neural Computation , vol.10 , Issue.2 , pp. 251-276
- Amari, S.I.¹

2
- 85057316160
- Distributed second-order optimization using kronecker-factored approximations
- J. Ba, R. Grosse, and J. Martens. Distributed second-order optimization using Kronecker-factored approximations. In ICLR, 2017.
- (2017) ICLR
- Ba, J.¹ Grosse, R.² Martens, J.³

3
- 84858765598
- Covariant policy search
- J. A. Bagnell and J. G. Schneider. Covariant policy search. In IJCAI, 2003.
- (2003) IJCAI
- Bagnell, J.A.¹ Schneider, J.G.²

4
- 84879976780
- The arcade learning environment: An evaluation platform for general agents
- M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 47: 253-279, 2013.
- (2013) Journal of Artificial Intelligence Research , vol.47 , pp. 253-279
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

5
- 85015444377
- G. Brockman, V. Cheung, L. Pettersson, J. Schneider, J. Schulman, J. Tang, and W. Zaremba. OpenAI Gym. arXiv preprint arXiv: 1606.01540, 2016.
- (2016) OpenAI Gym
- Brockman, G.¹ Cheung, V.² Pettersson, L.³ Schneider, J.⁴ Schulman, J.⁵ Tang, J.⁶ Zaremba, W.⁷

6
- 84998893215
- A kronecker-factored approximate fisher matrix for convolutional layers
- R. Grosse and J. Martens. A Kronecker-factored approximate Fisher matrix for convolutional layers. In ICML, 2016.
- (2016) ICML
- Grosse, R.¹ Martens, J.²

7
- 85041942380
- Q-prop: Sample-efficient policy gradient with an off-policy critic
- S. Gu, T. Lillicrap, Z. Ghahramani, R. E. Turner, and S. Levine. Q-prop: Sample-efficient policy gradient with an off-policy critic. In ICLR, 2017.
- (2017) ICLR
- Gu, S.¹ Lillicrap, T.² Ghahramani, Z.³ Turner, R.E.⁴ Levine, S.⁵

8
- 85044446086
- N. Heess, D. TB, S. Sriram, J. Lemmon, J. Merel, G. Wayne, Y. Tassa, T. Erez, Z. Wang, S. M. A. Eslami, M. Riedmiller, and D. Silver. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv: 1707.02286, 2017.
- (2017) Emergence of Locomotion Behaviours in Rich Environments
- Heess, N.¹ Tb, D.² Sriram, S.³ Lemmon, J.⁴ Merel, J.⁵ Wayne, G.⁶ Tassa, Y.⁷ Erez, T.⁸ Wang, Z.⁹ Eslami, S.M.A.¹⁰ Riedmiller, M.¹¹ Silver, D.¹²

9
- 85088229768
- Reinforcement learning with unsupervised auxiliary tasks
- M. Jaderberg, V. Mnih, W. M. Czarnecki, T. Schaul, J. Z. Leibo, D. Silver, and K. Kavukcuoglu. Reinforcement learning with unsupervised auxiliary tasks. In ICLR, 2017.
- (2017) ICLR
- Jaderberg, M.¹ Mnih, V.² Czarnecki, W.M.³ Schaul, T.⁴ Leibo, J.Z.⁵ Silver, D.⁶ Kavukcuoglu, K.⁷

10
- 84898930479
- A natural policy gradient
- S. Kakade. A natural policy gradient. In Advances in Neural Information Processing Systems, 2002.
- (2002) Advances in Neural Information Processing Systems
- Kakade, S.¹

11
- 85083951076
- Adam: A method for stochastic optimization
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.
- (2015) ICLR
- Kingma, D.¹ Ba, J.²

12
- 85083953657
- Continuous control with deep reinforcement learning
- T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous control with deep reinforcement learning. In ICLR, 2016.
- (2016) ICLR
- Lillicrap, T.P.¹ Hunt, J.J.² Pritzel, A.³ Heess, N.⁴ Erez, T.⁵ Tassa, Y.⁶ Silver, D.⁷ Wierstra, D.⁸

13
- 77956541496
- Deep learning via hessian-free optimization
- J. Martens. Deep learning via Hessian-free optimization. In ICML-10, 2010.
- (2010) ICML-10
- Martens, J.¹

14
- 84969971072
- J. Martens. New insights and perspectives on the natural gradient method. arXiv preprint arXiv: 1412.1193, 2014.
- (2014) New Insights and Perspectives on the Natural Gradient Method
- Martens, J.¹

15
- 84969988426
- Optimizing neural networks with kronecker-factored approximate curvature
- J. Martens and R. Grosse. Optimizing neural networks with kronecker-factored approximate curvature. In ICML, 2015.
- (2015) ICML
- Martens, J.¹ Grosse, R.²

16
- 84924051598
- Human-level control through deep reinforcement learning
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰ Petersen, S.¹¹ Beattie, C.¹² Sadik, A.¹³ Antonoglou, I.¹⁴ King, H.¹⁵ Kumaran, D.¹⁶ Wierstra, D.¹⁷ Legg, S.¹⁸ Hassabis, D.¹⁹

17
- 84999036937
- Asynchronous methods for deep reinforcement learning
- V. Mnih, A. Puigdomenech Badia, M. Mirza, A. Graves, T. P. Lillicrap, T. Harley, D. Silver, and K. Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In ICML, 2016.
- (2016) ICML
- Mnih, V.¹ Puigdomenech Badia, A.² Mirza, M.³ Graves, A.⁴ Lillicrap, T.P.⁵ Harley, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

18
- 0003982971
- Numerical optimization
- J. Nocedal and S. Wright. Numerical Optimization. Springer, 2006.
- (2006) Numerical Optimization
- Nocedal, J.¹ Wright, S.²

19
- 40649106649
- Natural actor-critic
- J. Peters and S. Schaal. Natural actor-critic. Neurocomputing, 71(7-9): 1180-1190, 2008.
- (2008) Neurocomputing , vol.71 , Issue.7-9 , pp. 1180-1190
- Peters, J.¹ Schaal, S.²

20
- 0036631778
- Fast curvature matrix-vector products for second-order gradient descent
- N. N. Schraudolph. Fast curvature matrix-vector products for second-order gradient descent. Neural Computation, 2002.
- (2002) Neural Computation
- Schraudolph, N.N.¹

21
- 84969963490
- Trust region policy optimization
- J. Schulman, S. Levine, P. Abbeel, M. I. Jordan, and P. Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML), 2015.
- (2015) Proceedings of the 32nd International Conference on Machine Learning (ICML)
- Schulman, J.¹ Levine, S.² Abbeel, P.³ Jordan, M.I.⁴ Moritz, P.⁵

22
- 85083954383
- High-dimensional continuous control using generalized advantage estimation
- J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
- (2016) Proceedings of the International Conference on Learning Representations (ICLR)
- Schulman, J.¹ Moritz, P.² Levine, S.³ Jordan, M.⁴ Abbeel, P.⁵

23
- 85041194636
- J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv: 1707.06347, 2017.
- (2017) Proximal Policy Optimization Algorithms
- Schulman, J.¹ Wolski, F.² Dhariwal, P.³ Radford, A.⁴ Klimov, O.⁵

24
- 84963949906
- Mastering the game of go with deep neural networks and tree search
- D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. Van Den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587): 484-489, 2016.
- (2016) Nature , vol.529 , Issue.7587 , pp. 484-489
- Silver, D.¹ Huang, A.² Maddison, C.J.³ Guez, A.⁴ Sifre, L.⁵ Van Den Driessche, G.⁶ Schrittwieser, J.⁷ Antonoglou, I.⁸ Panneershelvam, V.⁹ Lanctot, M.¹⁰ Dieleman, S.¹¹ Grewe, D.¹² Nham, J.¹³ Kalchbrenner, N.¹⁴ Sutskever, I.¹⁵ Lillicrap, T.¹⁶ Leach, M.¹⁷ Kavukcuoglu, K.¹⁸ Graepel, T.¹⁹ Hassabis, D.²⁰

25
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- R. S. Sutton, D. A. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, 2000.
- (2000) Advances in Neural Information Processing Systems 12
- Sutton, R.S.¹ McAllester, D.A.² Singh, S.³ Mansour, Y.⁴

26
- 84872292044
- MuJoCo: A physics engine for model-based control
- E. Todorov, T. Erez, and Y. Tassa. MuJoCo: A physics engine for model-based control. IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012.
- (2012) IEEE/RSJ International Conference on Intelligent Robots and Systems
- Todorov, E.¹ Erez, T.² Tassa, Y.³

27
- 85031087674
- Sample efficient actor-critic with experience replay
- Z. Wang, V. Bapst, N. Heess, V. Mnih, R. Munos, K. Kavukcuoglu, and N. de Freitas. Sample efficient actor-critic with experience replay. In ICLR, 2016.
- (2016) ICLR
- Wang, Z.¹ Bapst, V.² Heess, N.³ Mnih, V.⁴ Munos, R.⁵ Kavukcuoglu, K.⁶ De Freitas, N.⁷

28
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3): 229-256, 1992.
- (1992) Machine Learning , vol.8 , Issue.3 , pp. 229-256
- Williams, R.J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.