SCOPUS Information Retrieval Platform

33rd International Conference on Machine Learning, ICML 2016

Volume 6, 2016, Pages 4135-4148

Continuous deep Q-learning with model-based acceleration

(4) Gu, Shixiang a,b,c Lillicrap, Timothy d Sutskever, Ilya c Levine, Sergey c

a UNIVERSITY OF CAMBRIDGE (United Kingdom)

b MAX PLANCK INSTITUTE FOR INTELLIGENT SYSTEMS (Germany)

c Google Brain (United States)

d DEEPMIND (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; COMPLEX NETWORKS; DEEP LEARNING; EFFICIENCY; ITERATIVE METHODS; LEARNING ALGORITHMS;

ACTOR-CRITIC METHODS; COMPLEMENTARY TECHNIQUES; CONTINUOUS CONTROL; EXPERIENCE REPLAY; LOCAL LINEAR MODELS; MODEL-FREE ALGORITHMS; PHYSICAL SYSTEMS; SAMPLE COMPLEXITY;

REINFORCEMENT LEARNING;

EID: 84998579328 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (352)

References (37)

1
- 0031073475
- Locally weighted learning for control
- Springer
- Atkeson, Christopher G, Moore, Andrew W, and Schaal, Stefan. Locally weighted learning for control. In Lazy learning, pp. 75-113. Springer, 1997.
- (1997) Lazy Learning , pp. 75-113
- Atkeson, C.G.¹ Moore, A.W.² Schaal, S.³

2
- 0004370245
- Technical report, DTIC Document
- Baird III, Leemon C. Advantage updating. Technical report, DTIC Document, 1993.
- (1993) Advantage Updating
- Baird, L.C.¹

3
- 84998965670
- The importance of experience replay database composition in deep reinforcement learning
- NIPS
- de Bruin, Tim, Kober, Jens, Tuyls, Karl, and Babuska, Robert. The importance of experience replay database composition in deep reinforcement learning. Deep Reinforcement Learning Workshop, NIPS, 2015.
- (2015) Deep Reinforcement Learning Workshop
- De Bruin, T.¹ Kober, J.² Tuyls, K.³ Babuska, R.⁴

4
- 80053441894
- Pilco: A model-based and data-efficient approach to policy search
- Deisenroth, Marc and Rasmussen, Carl E. Pilco: A model-based and data-efficient approach to policy search. In International Conference on Machine Learning (ICML), pp. 465-472, 2011.
- (2011) International Conference on Machine Learning (ICML) , pp. 465-472
- Deisenroth, M.¹ Rasmussen, C.E.²

5
- 84903590417
- A survey on policy search for robotics
- Deisenroth, Marc Peter, Neumann, Gerhard, Peters, Jan, et al. A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1-2): 1-142, 2013.
- (2013) Foundations and Trends in Robotics , vol.2 , Issue.1-2 , pp. 1-142
- Deisenroth, M.P.¹ Neumann, G.² Peters, J.³

6
- 84998968291
- arXiv preprint arXiv: 1509.06841
- Fu, Justin, Levine, Sergey, and Abbeel, Pieter. One-shot learning of manipulation skills with online dynamics adaptation and neural network priors. arXiv preprint arXiv: 1509.06841, 2015.
- (2015) One-shot Learning of Manipulation Skills with Online Dynamics Adaptation and Neural Network Priors
- Fu, J.¹ Levine, S.² Abbeel, P.³

7
- 79958779459
- Reinforcement learning in feedback control
- Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine learning, 84(1-2): 137-169, 2011.
- (2011) Machine Learning , vol.84 , Issue.1-2 , pp. 137-169
- Hafner, R.¹ Riedmiller, M.²

8
- 0003996286
- Multi-player residual advantage learning with general function approximation
- OH
- Harmon, Mance E and Baird III, Leemon C. Multi-player residual advantage learning with general function approximation. Wright Laboratory, WL/AACF, Wright-Patterson Air Force Base, OH, pp. 45433-7308, 1996.
- (1996) Wright Laboratory, WL/AACF, Wright-Patterson Air Force Base , pp. 45433-47308
- Harmon, M.E.¹ Baird, L.C.²

9
- 84980050020
- arXiv preprint arXiv: 1511.04143
- Hausknecht, Matthew and Stone, Peter. Deep reinforcement learning in parameterized action space. arXiv preprint arXiv: 1511.04143, 2015.
- (2015) Deep Reinforcement Learning in Parameterized Action Space
- Hausknecht, M.¹ Stone, P.²

10
- 84965103751
- Learning continuous control policies by stochastic value gradients
- Heess, Nicolas, Wayne, Gregory, Silver, David, Lillicrap, Tim, Erez, Tom, and Tassa, Yuval. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems (NIPS), pp. 2926-2934, 2015.
- (2015) Advances in Neural Information Processing Systems (NIPS) , pp. 2926-2934
- Heess, N.¹ Wayne, G.² Silver, D.³ Lillicrap, T.⁴ Erez, T.⁵ Tassa, Y.⁶

11
- 84994166299
- arXiv preprint arXiv:1412.6980
- Kingma, Diederik and Ba, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

12
- 84892593209
- Reinforcement learning in robotics: A survey
- Springer
- Kober, Jens and Peters, Jan. Reinforcement learning in robotics: A survey. In Reinforcement Learning, pp. 579-610. Springer, 2012.
- (2012) Reinforcement Learning , pp. 579-610
- Kober, J.¹ Peters, J.²

13
- 84898938510
- Actor-critic algorithms
- Konda, Vijay R and Tsitsiklis, John N. Actor-critic algorithms. In Advances in Neural Information Processing Systems (NIPS), volume 13, pp. 1008-1014, 1999.
- (1999) Advances in Neural Information Processing Systems (NIPS) , vol.13 , pp. 1008-1014
- Konda, V.R.¹ Tsitsiklis, J.N.²

14
- 84883060087
- Evolving large-scale neural networks for vision-based reinforcement learning
- ACM
- Koutník, Jan, Cuccu, Giuseppe, Schmidhuber, Jürgen, and Gomez, Faustino. Evolving large-scale neural networks for vision-based reinforcement learning. In Proceedings of the 15th annual conference on Genetic and evolutionary computation, pp. 1061-1068. ACM, 2013.
- (2013) Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation , pp. 1061-1068
- Koutník, J.¹ Cuccu, G.² Schmidhuber, J.³ Gomez, F.⁴

15
- 84908494630
- Approximate model-assisted neural fitted q-iteration
- IEEE
- Lampe, Thomas and Riedmiller, Martin. Approximate model-assisted neural fitted q-iteration. In Neural Networks (IJCNN), 2014 International Joint Conference on, pp. 2698-2704. IEEE, 2014.
- (2014) Neural Networks (IJCNN), 2014 International Joint Conference on , pp. 2698-2704
- Lampe, T.¹ Riedmiller, M.²

16
- 84937822296
- Learning neural network policies with guided policy search under unknown dynamics
- Levine, Sergey and Abbeel, Pieter. Learning neural network policies with guided policy search under unknown dynamics. In Advances in Neural Information Processing Systems (NIPS), pp. 1071-1079, 2014.
- (2014) Advances in Neural Information Processing Systems (NIPS) , pp. 1071-1079
- Levine, S.¹ Abbeel, P.²

17
- 84897529781
- Guided policy search
- Levine, Sergey and Koltun, Vladlen. Guided policy search. In International Conference on Machine Learning (ICML), pp. 1-9, 2013.
- (2013) International Conference on Machine Learning (ICML) , pp. 1-9
- Levine, S.¹ Koltun, V.²

18
- 84979924150
- End-to-end training of deep visuomotor policies
- Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. JMLR 17, 2016.
- (2016) JMLR , vol.17
- Levine, S.¹ Finn, C.² Darrell, T.³ Abbeel, P.⁴

19
- 17444424051
- Iterative linear quadratic regulator design for nonlinear biological movement systems
- Li, Weiwei and Todorov, Emanuel. Iterative linear quadratic regulator design for nonlinear biological movement systems. In ICINCO (1), pp. 222-229, 2004.
- (2004) ICINCO , Issue.1 , pp. 222-229
- Li, W.¹ Todorov, E.²

20
- 85083953657
- Continuous control with deep reinforcement learning
- Lillicrap, Timothy P, Hunt, Jonathan J, Pritzel, Alexander, Heess, Nicolas, Erez, Tom, Tassa, Yuval, Silver, David, and Wierstra, Daan. Continuous control with deep reinforcement learning. International Conference on Learning Representations (ICLR), 2016.
- (2016) International Conference on Learning Representations (ICLR)
- Lillicrap, T.P.¹ Hunt, J.J.² Pritzel, A.³ Heess, N.⁴ Erez, T.⁵ Tassa, Y.⁶ Silver, D.⁷ Wierstra, D.⁸

21
- 84924051598
- Human-level control through deep reinforcement learning
- Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A, Veness, Joel, Bellemare, Marc G, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K, Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰

22
- 0042474292
- Nguyen, D and Widrow, B. The truck backer upper: An example of self learning in neural networks, 1989.
- (1989) The Truck Backer Upper: An Example of Self Learning in Neural Networks
- Nguyen, D.¹ Widrow, B.²

23
- 34250635407
- Policy gradient methods for robotics
- IEEE
- Peters, Jan and Schaal, Stefan. Policy gradient methods for robotics. In International Conference on Intelligent Robots and Systems (IROS), pp. 2219-2225. IEEE, 2006.
- (2006) International Conference on Intelligent Robots and Systems (IROS) , pp. 2219-2225
- Peters, J.¹ Schaal, S.²

24
- 85167411371
- Relative entropy policy search
- Atlanta
- Peters, Jan, Mülling, Katharina, and Altun, Yasemin. Relative entropy policy search. In AAAI. Atlanta, 2010.
- (2010) AAAI
- Peters, J.¹ Mülling, K.² Altun, Y.³

25
- 84959314908
- Robotics
- Rawlik, Konrad, Toussaint, Marc, and Vijayakumar, Sethu. On stochastic optimal control and reinforcement learning by approximate inference. Robotics, pp. 353, 2013.
- (2013) On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference , pp. 353
- Rawlik, K.¹ Toussaint, M.² Vijayakumar, S.³

26
- 84980041049
- arXiv preprint arXiv: 1511.05952
- Schaul, Tom, Quan, John, Antonoglou, Ioannis, and Silver, David. Prioritized experience replay. arXiv preprint arXiv: 1511.05952, 2015.
- (2015) Prioritized Experience Replay
- Schaul, T.¹ Quan, J.² Antonoglou, I.³ Silver, D.⁴

27
- 0000728324
- Schmidhuber, Jürgen. Reinforcement learning in Markovian and non-Markovian environments, pp. 500-506, 1991.
- (1991) Reinforcement Learning in Markovian and Non-Markovian Environments , pp. 500-506
- Schmidhuber, J.¹

28
- 84969963490
- Trust region policy optimization
- Schulman, John, Levine, Sergey, Abbeel, Pieter, Jordan, Michael I., and Moritz, Philipp. Trust region policy optimization. In International Conference on Machine Learning (ICML), pp. 1889-1897, 2015.
- (2015) International Conference on Machine Learning (ICML) , pp. 1889-1897
- Schulman, J.¹ Levine, S.² Abbeel, P.³ Jordan, M.I.⁴ Moritz, P.⁵

29
- 85083954383
- High-dimensional continuous control using generalized advantage estimation
- Schulman, John, Moritz, Philipp, Levine, Sergey, Jordan, Michael, and Abbeel, Pieter. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations (ICLR), 2016.
- (2016) International Conference on Learning Representations (ICLR)
- Schulman, J.¹ Moritz, P.² Levine, S.³ Jordan, M.⁴ Abbeel, P.⁵

30
- 84919793697
- Deterministic policy gradient algorithms
- Silver, David, Lever, Guy, Heess, Nicolas, Degris, Thomas, Wierstra, Daan, and Riedmiller, Martin. Deterministic policy gradient algorithms. In International Conference on Machine Learning (ICML), 2014.
- (2014) International Conference on Machine Learning (ICML)
- Silver, D.¹ Lever, G.² Heess, N.³ Degris, T.⁴ Wierstra, D.⁵ Riedmiller, M.⁶

31
- 85132026293
- Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
- Sutton, Richard S. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In International Conference on Machine Learning (ICML), pp. 216-224, 1990.
- (1990) International Conference on Machine Learning (ICML) , pp. 216-224
- Sutton, R.S.¹

32
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Sutton, Richard S, McAllester, David A, Singh, Satinder P, Mansour, Yishay, et al. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), volume 99, pp. 1057-1063, 1999.
- (1999) Advances in Neural Information Processing Systems (NIPS) , vol.99 , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.A.² Singh, S.P.³ Mansour, Y.⁴

33
- 84872363924
- Synthesis and stabilization of complex behaviors through online trajectory optimization
- IEEE
- Tassa, Yuval, Erez, Tom, and Todorov, Emanuel. Synthesis and stabilization of complex behaviors through online trajectory optimization. In International Conference on Intelligent Robots and Systems (IROS), pp. 4906-4913. IEEE, 2012.
- (2012) International Conference on Intelligent Robots and Systems (IROS) , pp. 4906-4913
- Tassa, Y.¹ Erez, T.² Todorov, E.³

34
- 84872292044
- Mujoco: A physics engine for model-based control
- IEEE
- Todorov, Emanuel, Erez, Tom, and Tassa, Yuval. Mujoco: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems (IROS), pp. 5026-5033. IEEE, 2012.
- (2012) International Conference on Intelligent Robots and Systems (IROS) , pp. 5026-5033
- Todorov, E.¹ Erez, T.² Tassa, Y.³

35
- 84965104233
- arXiv preprint arXiv: 1502.02251
- Wahlström, Niklas, Schön, Thomas B, and Deisenroth, Marc Peter. From pixels to torques: Policy learning with deep dynamical models. arXiv preprint arXiv: 1502.02251, 2015.
- (2015) From Pixels to Torques: Policy Learning with Deep Dynamical Models
- Wahlström, N.¹ Schön, T.B.² Deisenroth, M.P.³

36
- 84998595997
- arXiv preprint arXiv:1511.06581
- Wang, Ziyu, de Freitas, Nando, and Lanctot, Marc. Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581, 2015.
- (2015) Dueling Network Architectures for Deep Reinforcement Learning
- Wang, Z.¹ De Freitas, N.² Lanctot, M.³

37
- 84965129327
- Embed to control: A locally linear latent dynamics model for control from raw images
- Watter, Manuel, Springenberg, Jost, Boedecker, Joschka, and Riedmiller, Martin. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems (NIPS), pp. 2728-2736, 2015.
- (2015) Advances in Neural Information Processing Systems (NIPS) , pp. 2728-2736
- Watter, M.¹ Springenberg, J.² Boedecker, J.³ Riedmiller, M.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.