Volume 20, Issue 6, 2007, Pages 723-735

Reinforcement learning for a biped robot based on a CPG-actor-critic method

Author keywords

Actor critic model; Biped walking; Central pattern generator; Policy gradient method; Reinforcement learning

Indexed keywords

AUTONOMOUS AGENTS; BIPED LOCOMOTION; GRADIENT METHODS; MOTION PLANNING; RANDOM PROCESSES; REINFORCEMENT LEARNING;

EID: 34547769694     PISSN: 08936080     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.neunet.2007.01.002     Document Type: Article
Times cited : (110)

References (53)
  • 1
    • 0000396062
    • Natural gradient works efficiently in learning
    • Amari S. Natural gradient works efficiently in learning. Neural Computation 10 2 (1998) 251-276
    • (1998) Neural Computation , vol.10 , Issue.2 , pp. 251-276
    • Amari, S.1
  • 3
    • 0036452642
    • Bentivegna, D. C., Ude, A., Atkeson, C. G., & Cheng, G. (2002). Humanoid robot learning and game playing using PC-based vision. In Proceedings of the 2002 IEEE/RSJ international conference on intelligent robots and systems (pp. 2449-2454)
  • 4
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • Bradtke S.J., and Barto A.G. Linear least-squares algorithms for temporal difference learning. Machine Learning 22 (1996) 33-57
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 5
    • 0032192336
    • Walknet - A biologically inspired network to control six-legged walking
    • Cruse H., Kindermann T., Schumm M., Dean J., and Schmitz J. Walknet - A biologically inspired network to control six-legged walking. Neural Networks 11 7-8 (1998) 1435-1447
    • (1998) Neural Networks , vol.11 , Issue.7-8 , pp. 1435-1447
    • Cruse, H.1    Kindermann, T.2    Schumm, M.3    Dean, J.4    Schmitz, J.5
  • 7
    • 85047678284
    • A combined neuronal and mechanical model of fish swimming
    • Ekeberg Ö. A combined neuronal and mechanical model of fish swimming. Biological Cybernetics 69 (1993) 363-374
    • (1993) Biological Cybernetics , vol.69 , pp. 363-374
    • Ekeberg, Ö.1
  • 8
    • 0037645833
    • Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts
    • Fukuoka Y., Kimura H., and Cohen A.H. Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts. International Journal of Robotics Research 22 3-4 (2003) 187-202
    • (2003) International Journal of Robotics Research , vol.22 , Issue.3-4 , pp. 187-202
    • Fukuoka, Y.1    Kimura, H.2    Cohen, A.H.3
  • 9
    • 0026011636
    • Neuronal network generating locomotor behavior in lamprey: Circuitry, transmitters, membrane properties and simulations
    • Grillner S., Wallen P., Brodin L., and Lansner A. Neuronal network generating locomotor behavior in lamprey: Circuitry, transmitters, membrane properties and simulations. Annual Review of Neuroscience 14 (1991) 169-199
    • (1991) Annual Review of Neuroscience , vol.14 , pp. 169-199
    • Grillner, S.1    Wallen, P.2    Brodin, L.3    Lansner, A.4
  • 10
    • 0031638777
    • Hirai, K., Hirose, M., Haikawa, Y., & Takenaka, T. (1998). The development of Honda humanoid robot. In Proceedings of the 1998 IEEE international conference on robotics & automation (pp. 1321-1326)
  • 11
    • 79957983466
    • Hitomi, K., Shibata, T., Nakamura, Y., & Ishii, S. (2005). On-line learning of a feedback controller for quasi-passive-dynamic walking by a stochastic policy gradient method. In IEEE/RSJ international conference on intelligent robots and systems (pp. 1923-1928)
  • 13
    • 0347409420
    • Inada, H., & Ishii, K. (2003). Behavior generation of bipedal robot using central pattern generator (CPG) (1st report: CPG parameters searching method by genetic algorithm). In Proceedings of international conference on intelligent robots and systems: Vol. 3 (pp. 2179-2184)
  • 15
    • 0036592028
    • Control of exploitation-exploration meta-parameter in reinforcement learning
    • Ishii S., Yoshida W., and Yoshimoto J. Control of exploitation-exploration meta-parameter in reinforcement learning. Neural Networks 15 4 (2002) 665-687
    • (2002) Neural Networks , vol.15 , Issue.4 , pp. 665-687
    • Ishii, S.1    Yoshida, W.2    Yoshimoto, J.3
  • 16
    • 11244335985
    • Itoh, Y., Taki, K., Kato, S., & Itoh, H. (2004). A stochastic optimization method of CPG-based motion control for humanoid locomotion. In IEEE conference on robotics, automation and mechatronics (pp. 347-351)
  • 19
    • 33749263205
    • Keller, P. W., Mannor, S., & Precup, D. (2006). Automatic basis function construction for approximate dynamic programming and reinforcement learning. In The 23rd international conference on machine learning
  • 20
    • 34547742904
    • Kimura, H., & Kobayashi, S. (1998). An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function. In 15th international conference on machine learning (pp. 278-286)
  • 22
    • 34547753299
    • Kotosaka, S., & Schaal, S. (2000). Synchronized robot drumming by neural oscillator. In The international symposium on adaptive motion of animals and machines
  • 23
    • 35048819671
    • Lagoudakis, M. G., Parr, R., & Littman, M. L. (2002). Least-squares methods in reinforcement learning for control. In Methods and applications of artificial intelligence, second hellenic conference on AI, SETN (pp. 249-260)
  • 24
    • 34547795543
    • Lewis, M., Fagg, A., & Bekey, G. (1993). Genetic algorithms for gait synthesis in a hexapod robot. In Recent trends in mobile robots (pp. 317-331)
  • 25
    • 0036284101
    • Stabilization control for biped follow walking
    • Lim H., Yamamoto Y., and Takanishi A. Stabilization control for biped follow walking. Advanced Robotics 16 4 (2002) 361-380
    • (2002) Advanced Robotics , vol.16 , Issue.4 , pp. 361-380
    • Lim, H.1    Yamamoto, Y.2    Takanishi, A.3
  • 26
    • 0022390346
    • Sustained oscillations generated by mutually inhibiting neurons with adaptation
    • Matsuoka K. Sustained oscillations generated by mutually inhibiting neurons with adaptation. Biological Cybernetics 52 (1985) 367-376
    • (1985) Biological Cybernetics , vol.52 , pp. 367-376
    • Matsuoka, K.1
  • 28
    • 17444414191
    • Basis function adaptation in temporal difference reinforcement learning
    • Menache I., Mannor S., and Shimkin N. Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134 1 (2005) 215-238
    • (2005) Annals of Operations Research , vol.134 , Issue.1 , pp. 215-238
    • Menache, I.1    Mannor, S.2    Shimkin, N.3
  • 29
    • 0038348342
    • Evolutionary generation of human-like bipedal locomotion
    • Miyashita K., Ok S., and Hase K. Evolutionary generation of human-like bipedal locomotion. Mechatronics 13 (2003) 791-807
    • (2003) Mechatronics , vol.13 , pp. 791-807
    • Miyashita, K.1    Ok, S.2    Hase, K.3
  • 30
    • 84898963340
    • Minimax differential dynamic programming: An application to robust biped walking
    • Morimoto J., and Atkeson C.G. Minimax differential dynamic programming: An application to robust biped walking. Advances in Neural Information Processing Systems 15 (2003) 1539-1546
    • (2003) Advances in Neural Information Processing Systems , vol.15 , pp. 1539-1546
    • Morimoto, J.1    Atkeson, C.G.2
  • 31
    • 0035979437
    • Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning
    • Morimoto J., and Doya K. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems 36 (2001) 37-51
    • (2001) Robotics and Autonomous Systems , vol.36 , pp. 37-51
    • Morimoto, J.1    Doya, K.2
  • 32
    • 34547734144
    • Nakamura, Y., Sato, M., & Ishii, S. (2003). Reinforcement learning for biped robot. In 2nd international symposium on adaptive motion of animals and machines
  • 34
    • 0034328047
    • Legged insects select the optimal locomotor pattern based on the energetic cost
    • Nishii J. Legged insects select the optimal locomotor pattern based on the energetic cost. Journal of Biological Cybernetics 83 5 (2000)
    • (2000) Journal of Biological Cybernetics , vol.83 , Issue.5
    • Nishii, J.1
  • 35
    • 0035218760
    • Generation of human bipedal locomotion by a bio-mimetic neuro-musculo-skeletal model
    • Ogihara N., and Yamazaki N. Generation of human bipedal locomotion by a bio-mimetic neuro-musculo-skeletal model. Biological Cybernetics 84 (2001) 1-11
    • (2001) Biological Cybernetics , vol.84 , pp. 1-11
    • Ogihara, N.1    Yamazaki, N.2
  • 36
    • 0034266869
    • Adaptive natural gradient learning algorithms for various stochastic models
    • Park H., Amari S., and Fukumizu K. Adaptive natural gradient learning algorithms for various stochastic models. Neural Networks 13 (2000) 755-764
    • (2000) Neural Networks , vol.13 , pp. 755-764
    • Park, H.1    Amari, S.2    Fukumizu, K.3
  • 37
    • 0029375851
    • Gradient calculations for dynamic recurrent neural networks: A survey
    • Pearlmutter B.A. Gradient calculations for dynamic recurrent neural networks: A survey. IEEE Transactions on Neural Networks 6 5 (1995) 1212-1228
    • (1995) IEEE Transactions on Neural Networks , vol.6 , Issue.5 , pp. 1212-1228
    • Pearlmutter, B.A.1
  • 38
    • 34547731726
    • Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Third IEEE international conference on humanoid robotics 2003
  • 39
    • 34547785329
    • Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural actor-critic. In The 16th European conference on machine learning
  • 40
    • 22944448066
    • Ratitch, B., & Precup, D. (2004). Sparse distributed memories for on-line value-based reinforcement learning. In The 16th European conference on machine learning (pp. 347-358)
  • 41
    • 0025020623
    • A real time learning algorithm for recurrent analog neural networks
    • Sato M. A real time learning algorithm for recurrent analog neural networks. Biological Cybernetics 62 (1990) 237-241
    • (1990) Biological Cybernetics , vol.62 , pp. 237-241
    • Sato, M.1
  • 43
    • 34547774922
    • Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. (2004). Learning movement primitives. In International symposium on robotics research
  • 46
    • 0026045478
    • Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment
    • Taga G., Yamaguchi Y., and Shimizu H. Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment. Biological Cybernetics 65 (1991) 147-159
    • (1991) Biological Cybernetics , vol.65 , pp. 147-159
    • Taga, G.1    Yamaguchi, Y.2    Shimizu, H.3
  • 47
    • 14044262287
    • Tedrake, R., Zhang, T. W., & Seung, H. S. (2004). Stochastic policy gradient reinforcement learning on a simple 3D biped. In Proceedings of the IEEE international conference on intelligent robots and systems (pp. 2849-2854)
  • 48
    • 0002210775
    • The role of exploration in learning control with neural networks
    • White D.A., and Sofge D.A. (Eds), Van Nostrand Reinhold, Florence, Kentucky
    • Thrun S.B. The role of exploration in learning control with neural networks. In: White D.A., and Sofge D.A. (Eds). Handbook of intelligent control: Neural, fuzzy and adaptive approaches (1992), Van Nostrand Reinhold, Florence, Kentucky
    • (1992) Handbook of intelligent control: Neural, fuzzy and adaptive approaches
    • Thrun, S.B.1
  • 49
    • 0032134983
    • A neuro-mechanical model of legged locomotion: Single leg control
    • Wadden T., and Ekeberg Ö. A neuro-mechanical model of legged locomotion: Single leg control. Biological Cybernetics 79 2 (1998) 161-173
    • (1998) Biological Cybernetics , vol.79 , Issue.2 , pp. 161-173
    • Wadden, T.1    Ekeberg, Ö.2
  • 51
    • 0000337576
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • Williams R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 (1992) 229-256
    • (1992) Machine Learning , vol.8 , pp. 229-256
    • Williams, R.J.1
  • 52
    • 0032191803
    • Neural control of rhythmic arm movements
    • Williamson M.M. Neural control of rhythmic arm movements. Neural Networks 11 7-8 (1998) 1379-1394
    • (1998) Neural Networks , vol.11 , Issue.7-8 , pp. 1379-1394
    • Williamson, M.M.1
  • 53
    • 0033720246
    • Yoshimoto, J., Ishii, S., & Sato, M. (2000). On-line EM reinforcement learning. In IEEE-INNS-ENNS international joint conference on neural networks: Vol. 3 (pp. 163-168)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.