Robotics and Autonomous Systems, Volume 54, Issue 11, 2006, Pages 911-920

Learning CPG-based biped locomotion with a policy gradient method

Author keywords

Biped locomotion; Central pattern generator; Policy gradient; Reinforcement learning

Indexed keywords

CENTRAL PATTERN GENERATORS; POLICY GRADIENT; REINFORCEMENT LEARNING

EID: 33749990848     PISSN: 0921-8890     EISSN: None     Source Type: Journal
DOI: 10.1016/j.robot.2006.05.012     Document Type: Article
Times cited: 85

References (26)
  • 1
    • Matsuoka K. Sustained oscillations generated by mutually inhibiting neurons with adaptation. Biological Cybernetics 52 (1985) 367-376
  • 2
    • Taga G., Yamaguchi Y., and Shimizu H. Self-organized control of bipedal locomotion by neural oscillators in unpredictable environment. Biological Cybernetics 65 (1991) 147-159
  • 3
    • Fukuoka Y., Kimura H., and Cohen A. Adaptive dynamic walking of a quadruped robot on irregular terrain based on biological concepts. The International Journal of Robotics Research 22 3-4 (2003) 187-202
  • 4
    • G. Endo, J. Morimoto, J. Nakanishi, G. Cheng, An empirical exploration of a neural oscillator for biped locomotion control, in: IEEE International Conference on Robotics and Automation, 2004, pp. 3036-3042
  • 5
    • Hase K., and Yamazaki N. Computer simulation of the ontogeny of biped walking. Anthropological Science 106 4 (1998) 327-347
  • 6
    • M. Sato, Y. Nakamura, S. Ishii, Reinforcement learning for biped locomotion, in: International Conference on Artificial Neural Networks, 2002, pp. 777-782
  • 7
    • Williams R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8 (1992) 229-256
  • 8
    • Kimura H., and Kobayashi S. An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function. International Conference on Machine Learning (1998) 278-286
  • 13
    • S. Singh, T. Jaakkola, M. Jordan, Learning without state-estimation in partially observable Markovian decision processes, in: Machine Learning: Proceedings of the Eleventh International Conference, 1994, pp. 284-292
  • 14
    • H. Kimura, T. Yamashita, S. Kobayashi, Reinforcement learning of walking behavior for a four-legged robot, in: Proceedings of the IEEE Conference on Decision and Control, 2001, pp. 411-416
  • 15
    • R. Tedrake, T.W. Zhang, H.S. Seung, Stochastic policy gradient reinforcement learning on a simple 3D biped, in: Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2004, pp. 2849-2854
  • 16
    • Doya K. Reinforcement learning in continuous time and space. Neural Computation 12 (2000) 219-245
  • 17
    • J. Morimoto, G. Zeglin, C. Atkeson, Minimax differential dynamic programming: Application to a biped walking robot, in: Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, 2003, pp. 1927-1932
  • 18
    • Morimoto J., and Doya K. Acquisition of stand-up behavior by a real robot using hierarchical reinforcement learning. Robotics and Autonomous Systems 36 (2001) 37-51
  • 19
    • Kagami S., Kanehiro F., Tamiya Y., Inaba M., and Inoue H. AutoBalancer: An online dynamic balance compensation scheme for humanoid robots. In: Donald B.R., Lynch K., and Rus D. (Eds). Algorithmic and Computational Robotics: New Directions (2001), A K Peters, Ltd. 329-340
  • 20
    • Kagami S., Kitagawa T., Nishiwaki K., Sugihara T., and Inaba M. A fast dynamically equilibrated walking trajectory generation method of humanoid robot. Autonomous Robots 12 (2002) 71-82
  • 21
    • K. Hirai, M. Hirose, Y. Haikawa, T. Takenaka, The development of Honda humanoid robot, in: IEEE International Conference on Robotics and Automation, 1998, pp. 1321-1326
  • 23
    • C. Tsuchiya, H. Kimura, S. Kobayashi, Policy learning by GA using importance sampling, in: The 8th Conference on Intelligent Autonomous Systems, 2004, pp. 281-290
  • 24
    • D. Aberdeen, J. Baxter, Scalable internal-state policy-gradient methods for POMDPs, in: ICML, 2002, pp. 3-10
  • 25
    • Tesauro G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation 6 (1994) 215-219
  • 26
    • M.J. Mataric, Reward functions for accelerated learning, in: Machine Learning: Proceedings of the Eleventh International Conference, 1994, pp. 181-189


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.