SCOPUS Information Retrieval Platform

4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings

Year: 2016 (volume, issue, and pages not listed)

Continuous control with deep reinforcement learning

Authors (8): Lillicrap, Timothy P.ᵃ; Hunt, Jonathan J.ᵃ; Pritzel, Alexanderᵃ; Heess, Nicolasᵃ; Erez, Tomᵃ; Tassa, Yuvalᵃ; Silver, Davidᵃ; Wierstra, Daanᵃ

ᵃ DeepMind (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; MACHINE LEARNING; NETWORK ARCHITECTURE; REINFORCEMENT LEARNING;

CONTINUOUS ACTIONS; CONTINUOUS CONTROL; DEXTEROUS MANIPULATION; HYPER-PARAMETER; LEGGED LOCOMOTION; MODEL-FREE ALGORITHMS; PLANNING ALGORITHMS; POLICY GRADIENT;

DEEP LEARNING;

EID: 85083953657; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited: 4297

References (30)

1
- 84965129387
- Compatible value gradients for reinforcement learning of continuous deep policies
- Balduzzi, David and Ghifary, Muhammad. Compatible value gradients for reinforcement learning of continuous deep policies. arXiv preprint arXiv:1509.03005, 2015.
- (2015) Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies
- Balduzzi, D.¹ Ghifary, M.²

2
- 80053441894
- PILCO: A model-based and data-efficient approach to policy search
- Deisenroth, Marc and Rasmussen, Carl E. PILCO: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 465–472, 2011.
- (2011) Proceedings of the 28th International Conference on Machine Learning (ICML-11) , pp. 465-472
- Deisenroth, M.¹ Rasmussen, C.E.²

3
- 84903590417
- A survey on policy search for robotics
- Deisenroth, Marc Peter, Neumann, Gerhard, Peters, Jan, et al. A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1-2):1–142, 2013.
- (2013) Foundations and Trends in Robotics , vol.2 , Issue.1-2 , pp. 1-142
- Deisenroth, M.P.¹ Neumann, G.² Peters, J.³

4
- 77953260848
- States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning
- Gläscher, Jan, Daw, Nathaniel, Dayan, Peter, and O’Doherty, John P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4):585–595, 2010.
- (2010) Neuron , vol.66 , Issue.4 , pp. 585-595
- Gläscher, J.¹ Daw, N.² Dayan, P.³ O’Doherty, J.P.⁴

5
- 84862294866
- Deep sparse rectifier networks
- Glorot, Xavier, Bordes, Antoine, and Bengio, Yoshua. Deep sparse rectifier networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP, volume 15, pp. 315–323, 2011.
- (2011) Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP , vol.15 , pp. 315-323
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

6
- 79958779459
- Reinforcement learning in feedback control
- Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine Learning, 84(1-2):137–169, 2011.
- (2011) Machine Learning , vol.84 , Issue.1-2 , pp. 137-169
- Hafner, R.¹ Riedmiller, M.²

7
- 85161998941
- Double Q-learning
- Hasselt, Hado V. Double Q-learning. In Advances in Neural Information Processing Systems, pp. 2613–2621, 2010.
- (2010) Advances in Neural Information Processing Systems , pp. 2613-2621
- Hasselt, H.V.¹

8
- 84998919856
- Memory-based control with recurrent neural networks
- Heess, N., Hunt, J. J., Lillicrap, T. P., and Silver, D. Memory-based control with recurrent neural networks. NIPS Deep Reinforcement Learning Workshop (arXiv:1512.04455), 2015.
- (2015) NIPS Deep Reinforcement Learning Workshop
- Heess, N.¹ Hunt, J.J.² Lillicrap, T.P.³ Silver, D.⁴

9
- 84965103751
- Learning continuous control policies by stochastic value gradients
- Heess, Nicolas, Wayne, Gregory, Silver, David, Lillicrap, Tim, Erez, Tom, and Tassa, Yuval. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems, pp. 2926–2934, 2015.
- (2015) Advances in Neural Information Processing Systems , pp. 2926-2934
- Heess, N.¹ Wayne, G.² Silver, D.³ Lillicrap, T.⁴ Erez, T.⁵ Tassa, Y.⁶

10
- 84964923476
- Batch normalization: Accelerating deep network training by reducing internal covariate shift
- Ioffe, Sergey and Szegedy, Christian. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
- (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Ioffe, S.¹ Szegedy, C.²

11
- 84941620184
- Adam: A method for stochastic optimization
- Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

12
- 84905695541
- Evolving deep unsupervised convolutional networks for vision-based reinforcement learning
- Koutník, Jan, Schmidhuber, Jürgen, and Gomez, Faustino. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In Proceedings of the 2014 Conference on Genetic and Evolutionary Computation, pp. 541–548. ACM, 2014a.
- (2014) Proceedings of the 2014 Conference on Genetic and Evolutionary Computation , pp. 541-548
- Koutník, J.¹ Schmidhuber, J.² Gomez, F.³

13
- 84959255008
- Online evolution of deep convolutional network for vision-based reinforcement learning
- Koutník, Jan, Schmidhuber, Jürgen, and Gomez, Faustino. Online evolution of deep convolutional network for vision-based reinforcement learning. In From Animals to Animats 13, pp. 260–269. Springer, 2014b.
- (2014) From Animals to Animats , vol.13 , pp. 260-269
- Koutník, J.¹ Schmidhuber, J.² Gomez, F.³

14
- 84876231242
- ImageNet classification with deep convolutional neural networks
- Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
- (2012) Advances in Neural Information Processing Systems , pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

15
- 84943767635
- End-to-end training of deep visuomotor policies
- Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702, 2015.
- (2015) End-to-End Training of Deep Visuomotor Policies
- Levine, S.¹ Finn, C.² Darrell, T.³ Abbeel, P.⁴

16
- 84904867557
- Playing Atari with deep reinforcement learning
- Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Graves, Alex, Antonoglou, Ioannis, Wierstra, Daan, and Riedmiller, Martin. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
- (2013) Playing Atari with Deep Reinforcement Learning
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Graves, A.⁴ Antonoglou, I.⁵ Wierstra, D.⁶ Riedmiller, M.⁷

17
- 84924051598
- Human-level control through deep reinforcement learning
- Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A, Veness, Joel, Bellemare, Marc G, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K, Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰

18
- 0031236002
- Adaptive critic designs
- Prokhorov, Danil V, Wunsch, Donald C, et al. Adaptive critic designs. Neural Networks, IEEE Transactions on, 8(5):997–1007, 1997.
- (1997) Neural Networks, IEEE Transactions on , vol.8 , Issue.5 , pp. 997-1007
- Prokhorov, D.V.¹ Wunsch, D.C.²

19
- 84965157716
- Gradient estimation using stochastic computation graphs
- Schulman, John, Heess, Nicolas, Weber, Theophane, and Abbeel, Pieter. Gradient estimation using stochastic computation graphs. In Advances in Neural Information Processing Systems, pp. 3510–3522, 2015a.
- (2015) Advances in Neural Information Processing Systems , pp. 3510-3522
- Schulman, J.¹ Heess, N.² Weber, T.³ Abbeel, P.⁴

20
- 84965149509
- Trust region policy optimization
- Schulman, John, Levine, Sergey, Moritz, Philipp, Jordan, Michael I, and Abbeel, Pieter. Trust region policy optimization. arXiv preprint arXiv:1502.05477, 2015b.
- (2015) Trust Region Policy Optimization
- Schulman, J.¹ Levine, S.² Moritz, P.³ Jordan, M.I.⁴ Abbeel, P.⁵

21
- 84919793697
- Deterministic policy gradient algorithms
- Silver, David, Lever, Guy, Heess, Nicolas, Degris, Thomas, Wierstra, Daan, and Riedmiller, Martin. Deterministic policy gradient algorithms. In ICML, 2014.
- (2014) ICML
- Silver, D.¹ Lever, G.² Heess, N.³ Degris, T.⁴ Wierstra, D.⁵ Riedmiller, M.⁶

22
- 84872363924
- Synthesis and stabilization of complex behaviors through online trajectory optimization
- Tassa, Yuval, Erez, Tom, and Todorov, Emanuel. Synthesis and stabilization of complex behaviors through online trajectory optimization. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 4906–4913. IEEE, 2012.
- (2012) Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on , pp. 4906-4913
- Tassa, Y.¹ Erez, T.² Todorov, E.³

23
- 23944452693
- A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems
- Todorov, Emanuel and Li, Weiwei. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In Proceedings of the 2005 American Control Conference, pp. 300–306. IEEE, 2005.
- (2005) American Control Conference, 2005. Proceedings of the 2005 , pp. 300-306
- Todorov, E.¹ Li, W.²

24
- 84872292044
- MuJoCo: A physics engine for model-based control
- Todorov, Emanuel, Erez, Tom, and Tassa, Yuval. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
- (2012) Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on , pp. 5026-5033
- Todorov, E.¹ Erez, T.² Tassa, Y.³

25
- 36149005118
- On the theory of the Brownian motion
- Uhlenbeck, George E and Ornstein, Leonard S. On the theory of the Brownian motion. Physical Review, 36(5):823, 1930.
- (1930) Physical Review , vol.36 , Issue.5 , pp. 823
- Uhlenbeck, G.E.¹ Ornstein, L.S.²

26
- 84965104233
- From pixels to torques: Policy learning with deep dynamical models
- Wahlström, Niklas, Schön, Thomas B, and Deisenroth, Marc Peter. From pixels to torques: Policy learning with deep dynamical models. arXiv preprint arXiv:1502.02251, 2015.
- (2015) From Pixels to Torques: Policy Learning with Deep Dynamical Models
- Wahlström, N.¹ Schön, T.B.² Deisenroth, M.P.³

27
- 34249833101
- Q-learning
- Watkins, Christopher JCH and Dayan, Peter. Q-learning. Machine Learning, 8(3-4):279–292, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

28
- 71749106087
- Real-time reinforcement learning by sequential actor–critics and experience replay
- Wawrzyński, Paweł. Real-time reinforcement learning by sequential actor–critics and experience replay. Neural Networks, 22(10):1484–1497, 2009.
- (2009) Neural Networks , vol.22 , Issue.10 , pp. 1484-1497
- Wawrzynski, P.¹

29
- 85029148817
- Control policy with autocorrelated noise in reinforcement learning for robotics
- Wawrzyński, Paweł. Control policy with autocorrelated noise in reinforcement learning for robotics. International Journal of Machine Learning and Computing, 5:91–95, 2015.
- (2015) International Journal of Machine Learning and Computing , vol.5 , pp. 91-95
- Wawrzynski, P.¹

30
- 84875884428
- Autonomous reinforcement learning with experience replay
- Wawrzyński, Paweł and Tanwani, Ajay Kumar. Autonomous reinforcement learning with experience replay. Neural Networks, 41:156–167, 2013.
- (2013) Neural Networks , vol.41 , pp. 156-167
- Wawrzynski, P.¹ Tanwani, A.K.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.