Robotics and Autonomous Systems, Volume 54, Issue 11, 2006, Pages 911-920

Learning CPG-based biped locomotion with a policy gradient method

Author keywords

Biped locomotion; Central pattern generator; Policy gradient; Reinforcement learning

Indexed keywords

CENTRAL PATTERN GENERATORS; POLICY GRADIENT; REINFORCEMENT LEARNING

EID: 33749990848     PISSN: 0921-8890     EISSN: None     Source Type: Journal
DOI: 10.1016/j.robot.2006.05.012     Document Type: Article
Times cited: 85

References (26)
  • 1
    • Matsuoka K. Sustained oscillations generated by mutually inhibiting neurons with adaptation. Biological Cybernetics 52 (1985) 367-376
  • 2
    • Taga G., Yamaguchi Y., and Shimizu H. Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment. Biological Cybernetics 65 (1991) 147-159
  • 3
    • Fukuoka Y., Kimura H., and Cohen A. Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts. The International Journal of Robotics Research 22 3-4 (2003) 187-202
  • 4
    • G. Endo, J. Morimoto, J. Nakanishi, G. Cheng, An empirical exploration of a neural oscillator for biped locomotion control, in: IEEE International Conference on Robotics and Automation, 2004, pp. 3036-3042
  • 5
    • Hase K., and Yamazaki N. Computer simulation of the ontogeny of biped walking. Anthropological Science 106 4 (1998) 327-347
  • 6
    • M. Sato, Y. Nakamura, S. Ishii, Reinforcement learning for biped locomotion, in: International Conference on Artificial Neural Networks, 2002, pp. 777-782
  • 7
    • Williams R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 (1992) 229-256
  • 8
    • Kimura H., and Kobayashi S. An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function. International Conference on Machine Learning (1998) 278-286
  • 13
    • S. Singh, T. Jaakkola, M. Jordan, Learning without state-estimation in partially observable Markovian decision processes, in: Machine Learning: Proceedings of the Eleventh International Conference, 1994, pp. 284-292
  • 14
    • H. Kimura, T. Yamashita, S. Kobayashi, Reinforcement learning of walking behavior for a four-legged robot, in: Proceedings of the IEEE Conference on Decision and Control, 2001, pp. 411-416
  • 15
    • R. Tedrake, T.W. Zhang, H.S. Seung, Stochastic policy gradient reinforcement learning on a simple 3D biped, in: Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2004, pp. 2849-2854
  • 16
    • Doya K. Reinforcement learning in continuous time and space. Neural Computation 12 (2000) 219-245
  • 17
    • J. Morimoto, G. Zeglin, C. Atkeson, Minimax differential dynamic programming: Application to a biped walking robot, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003, pp. 1927-1932
  • 18
    • Morimoto J., and Doya K. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems 36 (2001) 37-51
  • 19
    • Kagami S., Kanehiro F., Tamiya Y., Inaba M., and Inoue H. AutoBalancer: An online dynamic balance compensation scheme for humanoid robots. In: Donald B.R., Lynch K., and Rus D. (Eds). Algorithmic and Computational Robotics: New Directions (2001), A K Peters, Ltd. 329-340
  • 20
    • Kagami S., Kitagawa T., Nishiwaki K., Sugihara T., and Inaba M. A fast dynamically equilibrated walking trajectory generation method of humanoid robot. Autonomous Robots 12 (2002) 71-82
  • 21
    • K. Hirai, M. Hirose, Y. Haikawa, T. Takenaka, The development of Honda humanoid robot, in: IEEE International Conference on Robotics and Automation, 1998, pp. 1321-1326
  • 23
    • C. Tsuchiya, H. Kimura, S. Kobayashi, Policy learning by GA using importance sampling, in: The 8th Conference on Intelligent Autonomous Systems, 2004, pp. 281-290
  • 24
    • D. Aberdeen, J. Baxter, Scalable internal-state policy-gradient methods for POMDPs, in: ICML, 2002, pp. 3-10
  • 25
    • Tesauro G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6 (1994) 215-219
  • 26
    • M.J. Mataric, Reward functions for accelerated learning, in: Machine Learning: Proceedings of the Eleventh International Conference, 1994, pp. 181-189


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.