Volume 84, Issue 1-2, 2011, Pages 171-203

Policy search for motor primitives in robotics

Author keywords

Episodic reinforcement learning; Motor control; Motor primitives; Policy learning

Indexed keywords

ACTOR CRITIC; BENCH-MARK PROBLEMS; EPISODIC REINFORCEMENT LEARNING; EXPECTATION MAXIMIZATION; HIGH-DIMENSIONAL; HUMANOID ROBOTICS; IMITATION LEARNING; MOTOR CONTROL; MOTOR LEARNING; MOTOR PRIMITIVES; MOTOR SKILLS; POLICY GRADIENT; POLICY GRADIENT METHODS; POLICY SEARCH; REAL ROBOT; REINFORCEMENT LEARNING METHOD; ROBOT ARMS;

EID: 78049390740     PISSN: 0885-6125     EISSN: 1573-0565     Source Type: Journal
DOI: 10.1007/s10994-010-5223-6     Document Type: Article
Times cited: 262

References (61)
  • 1. Andrieu, C., de Freitas, N., Doucet, A., & Jordan, M. I. (2003). An introduction to MCMC for machine learning. Machine Learning, 50(1), 5-43. DOI: 10.1023/A:1020281327116
  • 2. Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In Advances in Neural Information Processing Systems (Vol. 6, pp. 503-521). Denver, CO, USA.
  • 6. Binder, J., Koller, D., Russell, S., & Kanazawa, K. (1997). Adaptive probabilistic networks with hidden variables. Machine Learning, 29(2-3), 213-244. DOI: 10.1023/A:1007421730016
  • 7. Chiappa, S., Kober, J., & Peters, J. (2009). Using Bayesian dynamical systems for motion template libraries. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (Vol. 21, pp. 297-304).
  • 8. DARPA (2010a). Learning locomotion (L2). http://www.darpa.mil/ipto/programs/ll/ll.asp
  • 11. Dayan, P., & Hinton, G. E. (1997). Using expectation-maximization for reinforcement learning. Neural Computation, 9(2), 271-278. DOI: 10.1162/neco.1997.9.2.271
  • 15. Guenter, F., Hersch, M., Calinon, S., & Billard, A. (2007). Reinforcement learning for imitating constrained reaching movements. Advanced Robotics (Special Issue on Imitative Robots), 21(13), 1521-1544.
  • 20. Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). Convergence of stochastic iterative dynamic programming algorithms. In J. D. Cowan, G. Tesauro, & J. Alspector (Eds.), Advances in Neural Information Processing Systems (Vol. 6, pp. 703-710). San Mateo: Morgan Kaufmann.
  • 23. Kober, J., & Peters, J. (2009b). Policy search for motor primitives in robotics. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in Neural Information Processing Systems (Vol. 21, pp. 849-856).
  • 30. Miyamoto, H., Schaal, S., Gandolfo, F., Gomi, H., Koike, Y., Osu, R., Nakano, E., Wada, Y., & Kawato, M. (1996). A Kendama learning robot based on bi-directional theory. Neural Networks, 9(8), 1281-1302. DOI: 10.1016/S0893-6080(96)00043-3
  • 34. PASCAL2 (2010). Challenges. http://pascallin2.ecs.soton.ac.uk/Challenges/
  • 36. Peters, J. (2007). Machine learning of motor skills for robotics. PhD thesis, University of Southern California, Los Angeles, CA, USA.
  • 37. Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (pp. 2219-2225). Beijing, China. DOI: 10.1109/IROS.2006.282564
  • 43. Schaal, S., Atkeson, C. G., & Vijayakumar, S. (2002). Scalable techniques from nonparametric statistics for real-time robot learning. Applied Intelligence, 17(1), 49-60. DOI: 10.1023/A:1015727715131
  • 45. Schaal, S., Mohajerian, P., & Ijspeert, A. J. (2007). Dynamics systems vs. optimal control - a unifying view. Progress in Brain Research, 165, 425-445. DOI: 10.1016/S0079-6123(06)65027-9
  • 51. Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the International Machine Learning Conference (pp. 9-44).
  • 55. Tedrake, R., Zhang, T. W., & Seung, H. S. (2004). Stochastic policy gradient reinforcement learning on a simple 3D biped. In Proceedings of the 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (Vol. 3, pp. 2849-2854).
  • 59. Vlassis, N., Toussaint, M., Kontes, G., & Piperidis, S. (2009). Learning model-free robot control by a Monte Carlo EM algorithm. Autonomous Robots, 27(2), 123-130. DOI: 10.1007/s10514-009-9132-0
  • 60. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.