SCOPUS Information Search Platform

30th International Conference on Machine Learning, ICML 2013

Volume , Issue PART 2, 2013, Pages 1038-1046

Guided policy search

(2) Levine, Sergey (a), Koltun, Vladlen (a)

(a) Stanford University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

OPTIMIZATION;

DIFFERENTIAL DYNAMIC PROGRAMMING; DIRECT POLICY SEARCH; HIGH-DIMENSIONAL SYSTEMS; LEARNING NEURAL NETWORKS; POLICY LEARNING; POLICY OPTIMIZATION; POLICY SEARCH; TRAJECTORY OPTIMIZATION;

LEARNING SYSTEMS;

EID: 84897529781 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (295)

References (23)

1
- 84864030941
- An application of reinforcement learning to aerobatic helicopter flight
- Abbeel, P., Coates, A., Quigley, M., and Ng, A. An application of reinforcement learning to aerobatic helicopter flight. In Advances in Neural Information Processing Systems (NIPS 19), 2006.
- (2006) Advances in Neural Information Processing Systems (NIPS 19)
- Abbeel, P.¹ Coates, A.² Quigley, M.³ Ng, A.⁴

2
- 85156195508
- Nonparametric representation of policies and value functions: A trajectory-based approach
- Atkeson, C. and Morimoto, J. Nonparametric representation of policies and value functions: A trajectory-based approach. In Advances in Neural Information Processing Systems (NIPS 15), 2002.
- (2002) Advances in Neural Information Processing Systems (NIPS 15)
- Atkeson, C.¹ Morimoto, J.²

3
- 49049094416
- Random sampling of states in dynamic programming
- Atkeson, C. and Stephens, B. Random sampling of states in dynamic programming. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 38(4): 924-929, 2008.
- (2008) IEEE Transactions on Systems, Man, and Cybernetics, Part B , vol.38 , Issue.4 , pp. 924-929
- Atkeson, C.¹ Stephens, B.²

4
- 84898962948
- Policy search by dynamic programming
- Bagnell, A., Kakade, S., Ng, A., and Schneider, J. Policy search by dynamic programming. In Advances in Neural Information Processing Systems (NIPS 16), 2003.
- (2003) Advances in Neural Information Processing Systems (NIPS 16)
- Bagnell, A.¹ Kakade, S.² Ng, A.³ Schneider, J.⁴

5
- 0013495368
- Experiments with infinite-horizon, policy-gradient estimation
- Baxter, J., Bartlett, P., and Weaver, L. Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research, 15: 351-381, 2001.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 351-381
- Baxter, J.¹ Bartlett, P.² Weaver, L.³

6
- 80053441894
- PILCO: A model-based and data-efficient approach to policy search
- Deisenroth, M. and Rasmussen, C. PILCO: a model-based and data-efficient approach to policy search. In International Conference on Machine Learning (ICML), 2011.
- International Conference on Machine Learning (ICML), 2011
- Deisenroth, M.¹ Rasmussen, C.²

7
- 77956518472
- Inverse optimal control with linearly-solvable MDPs
- Dvijotham, K. and Todorov, E. Inverse optimal control with linearly-solvable MDPs. In International Conference on Machine Learning (ICML), 2010.
- International Conference on Machine Learning (ICML), 2010
- Dvijotham, K.¹ Todorov, E.²

8
- 0036059542
- Movement imitation with nonlinear dynamical systems in humanoid robots
- Ijspeert, A., Nakanishi, J., and Schaal, S. Movement imitation with nonlinear dynamical systems in humanoid robots. In International Conference on Robotics and Automation, 2002.
- International Conference on Robotics and Automation, 2002
- Ijspeert, A.¹ Nakanishi, J.² Schaal, S.³

9
- 1942514728
- Approximately optimal approximate reinforcement learning
- Kakade, S. and Langford, J. Approximately optimal approximate reinforcement learning. In International Conference on Machine Learning (ICML), 2002.
- International Conference on Machine Learning (ICML), 2002
- Kakade, S.¹ Langford, J.²

10
- 84871705710
- STOMP: Stochastic trajectory optimization for motion planning
- Kalakrishnan, M., Chitta, S., Theodorou, E., Pastor, P., and Schaal, S. STOMP: stochastic trajectory optimization for motion planning. In International Conference on Robotics and Automation, 2011.
- International Conference on Robotics and Automation, 2011
- Kalakrishnan, M.¹ Chitta, S.² Theodorou, E.³ Pastor, P.⁴ Schaal, S.⁵

11
- 85060321083
- Learning motor primitives for robotics
- Kober, J. and Peters, J. Learning motor primitives for robotics. In International Conference on Robotics and Automation, 2009.
- International Conference on Robotics and Automation, 2009
- Kober, J.¹ Peters, J.²

12
- 18544382314
- Learning from scarce experience
- Peshkin, L. and Shelton, C. Learning from scarce experience. In International Conference on Machine Learning (ICML), 2002.
- International Conference on Machine Learning (ICML), 2002
- Peshkin, L.¹ Shelton, C.²

13
- 44949241322
- Reinforcement learning of motor skills with policy gradients
- Peters, J. and Schaal, S. Reinforcement learning of motor skills with policy gradients. Neural Networks, 21(4):682-697, 2008.
- (2008) Neural Networks , vol.21 , Issue.4 , pp. 682-697
- Peters, J.¹ Schaal, S.²

14
- 84867115891
- Agnostic system identification for model-based reinforcement learning
- Ross, S. and Bagnell, A. Agnostic system identification for model-based reinforcement learning. In International Conference on Machine Learning (ICML), 2012.
- International Conference on Machine Learning (ICML), 2012
- Ross, S.¹ Bagnell, A.²

15
- 84862273266
- A reduction of imitation learning and structured prediction to no-regret online learning
- Ross, S., Gordon, G., and Bagnell, A. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research, 15:627-635, 2011.
- (2011) Journal of Machine Learning Research , vol.15 , pp. 627-635
- Ross, S.¹ Gordon, G.² Bagnell, A.³

16
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Sutton, R., McAllester, D., Singh, S., and Mansour, Y. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS 11), 1999.
- (1999) Advances in Neural Information Processing Systems (NIPS 11)
- Sutton, R.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

17
- 85161982655
- On a connection between importance sampling and the likelihood ratio policy gradient
- Tang, J. and Abbeel, P. On a connection between importance sampling and the likelihood ratio policy gradient. In Advances in Neural Information Processing Systems (NIPS 23), 2010.
- (2010) Advances in Neural Information Processing Systems (NIPS 23)
- Tang, J.¹ Abbeel, P.²

18
- 84872363924
- Synthesis and stabilization of complex behaviors through online trajectory optimization
- Tassa, Y., Erez, T., and Todorov, E. Synthesis and stabilization of complex behaviors through online trajectory optimization. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012.
- IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012
- Tassa, Y.¹ Erez, T.² Todorov, E.³

19
- 14044262287
- Stochastic policy gradient reinforcement learning on a simple 3d biped
- Tedrake, R., Zhang, T., and Seung, H. Stochastic policy gradient reinforcement learning on a simple 3d biped. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2004.
- IEEE/RSJ International Conference on Intelligent Robots and Systems, 2004
- Tedrake, R.¹ Zhang, T.² Seung, H.³

20
- 84872292044
- MuJoCo: A physics engine for model-based control
- Todorov, E., Erez, T., and Tassa, Y. MuJoCo: A physics engine for model-based control. In IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012.
- IEEE/RSJ International Conference on Intelligent Robots and Systems, 2012
- Todorov, E.¹ Erez, T.² Tassa, Y.³

21
- 77950579842
- Control of a walking biped using a combination of simple policies
- Whitman, E. and Atkeson, C. Control of a walking biped using a combination of simple policies. In 9th IEEE-RAS International Conference on Humanoid Robots, 2009.
- 9th IEEE-RAS International Conference on Humanoid Robots, 2009
- Whitman, E.¹ Atkeson, C.²

22
- 34547691027
- SIMBICON: Simple biped locomotion control
- Yin, K., Loken, K., and van de Panne, M. SIMBICON: simple biped locomotion control. ACM Transactions on Graphics, 26(3), 2007.
- (2007) ACM Transactions on Graphics , vol.26 , Issue.3
- Yin, K.¹ Loken, K.² Van De Panne, M.³

23
- 84856113877
- Modeling purposeful adaptive behavior with the principle of maximum causal entropy
- Ziebart, B. Modeling purposeful adaptive behavior with the principle of maximum causal entropy. PhD thesis, Carnegie Mellon University, 2010.
- (2010) PhD thesis, Carnegie Mellon University
- Ziebart, B.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.