Volume 21, Issue 4, 2008, Pages 682-697

Reinforcement learning of motor skills with policy gradients

Author keywords

Motor primitives; Motor skills; Natural Actor Critic; Natural gradients; Policy gradient methods; Reinforcement learning

Indexed keywords

ARTIFICIAL LIMBS; BEHAVIORAL RESEARCH; CONTINUOUS TIME SYSTEMS; ERROR ANALYSIS; GRADIENT METHODS; MOTOR TRANSPORTATION;

EID: 44949241322     PISSN: 08936080     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.neunet.2008.02.003     Document Type: Article
Times cited: 896

References (76)
  • 1
    • Aberdeen, D. (2006). POMDPs and policy gradients. Presentation at the Machine Learning Summer School (MLSS).
  • 3
    • Amari, S. (1998). Natural gradient works efficiently in learning. Neural Computation, 10, 251.
  • 4
    • Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In J. E. Hanson, S. J. Moody, & R. P. Lippmann (Eds.), Advances in neural information processing systems 6 (pp. 503-521). Morgan Kaufmann.
  • 5
    • Bagnell, J., & Schneider, J. (2003). Covariant policy search. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 1019-1024).
  • 6
    • Baird, L. (1993). Advantage updating. Technical Report WL-TR-93-1146. Wright Laboratory, Wright-Patterson Air Force Base, OH.
  • 7
    • Balasubramanian, V. (1997). Statistical inference, Occam's razor, and statistical mechanics on the space of probability distributions. Neural Computation, 9(2), 349-368.
  • 11
    • Ben-Itzhak, S., & Karniel, A. (2008). Minimum acceleration criterion with constraints implies bang-bang control as an underlying principle for optimal trajectories of arm reaching movements. Neural Computation, 20(3), 779-812.
  • 12
    • Benbrahim, H., Doleac, J., Franklin, J., & Selfridge, O. (1992). Real-time learning: A ball on a beam. In Proceedings of the International Joint Conference on Neural Networks (pp. 92-103).
  • 14
    • Berny, A. (2000). Statistical machine learning and combinatorial optimization. In Lecture notes in natural computing (Vol. 33, pp. 287-306). Heidelberg, Germany: Springer-Verlag.
  • 16
    • Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., & Cheng, G. (2005). Learning CPG sensory feedback with policy gradient for biped locomotion for a full-body humanoid. In Proceedings of the National Conference on Artificial Intelligence (pp. 1267-1273).
  • 17
    • Flash, T., & Hochner, B. (2005). Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology, 15, 660-666.
  • 19
    • Fu, M. C. (2002). Feature article: Optimization for simulation: Theory vs. practice. INFORMS Journal on Computing, 14(3), 192-215.
  • 20
    • Glynn, P. (1987). Likelihood ratio gradient estimation: An overview. In Proceedings of the Winter Simulation Conference (pp. 366-375).
  • 21
    • Glynn, P. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33(10), 75-84.
  • 23
    • Greensmith, E., Bartlett, P. L., & Baxter, J. (2004). Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5, 1471-1530.
  • 24
    • Guenter, F., Hersch, M., Calinon, S., & Billard, A. (2007). Reinforcement learning for imitating constrained reaching movements. RSJ Advanced Robotics, 21 (Special issue on Imitative Robots), 1521-1544.
  • 25
    • Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3(6), 671-692.
  • 26
    • Gullapalli, V. (1992). Learning control under extreme uncertainty. In Advances in neural information processing systems (pp. 327-334).
  • 30
    • Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 1398-1403).
  • 31
    • Ijspeert, A., Nakanishi, J., & Schaal, S. (2003). Learning attractor landscapes for learning motor primitives. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (Vol. 15, pp. 1547-1554). Cambridge, MA: MIT Press.
  • 33
    • Kakade, S. (2001). Optimizing average reward using discounted rewards. In Proceedings of the Conference on Computational Learning Theory (pp. 605-615).
  • 35
    • Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, London, UK.
  • 36
    • Kimura, H., & Kobayashi, S. (1997). Reinforcement learning for locomotion of a two-linked robot arm. In Proceedings of the European Workshop on Learning Robots (pp. 144-153).
  • 37
    • Kimura, H., & Kobayashi, S. (1998). Reinforcement learning for continuous action using stochastic gradient ascent. In Proceedings of the International Conference on Intelligent Autonomous Systems (IAS) (Vol. 5, pp. 288-295).
  • 38
    • Kleinman, N., Spall, J., & Naiman, D. (1999). Simulation-based optimization with stochastic approximation using common random numbers. Management Science, 45, 1570-1578.
  • 39
    • Kohl, N., & Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 2619-2624).
  • 41
    • Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence (pp. 354-361).
  • 42
    • Mitsunaga, N., Smith, C., Kanda, T., Ishiguro, H., & Hagita, N. (2005). Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1594-1601).
  • 43
    • Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Osu, R., et al. (1995). A kendama learning robot based on a dynamic optimization theory. In Proceedings of the IEEE International Workshop on Robot and Human Communication (pp. 327-332).
  • 44
    • Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Rieka, O., et al. (1996). A kendama learning robot based on a dynamic optimization principle. In Proceedings of the International Conference on Neural Information Processing (pp. 938-942).
  • 46
    • Mori, T., Nakamura, Y., Sato, M., & Ishii, S. (2004). Reinforcement learning for CPG-driven biped robot. In Proceedings of the National Conference on Artificial Intelligence (pp. 623-630).
  • 47
    • Mori, T., Nakamura, Y., & Ishii, S. (2005). Efficient sample reuse by off-policy natural actor-critic learning. In Advances in neural information processing systems (NIPS '05 workshop presentation).
  • 48
    • Morimoto, J., & Atkeson, C. A. (2003). Minimax differential dynamic programming: An application to robust biped walking. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 1539-1546). Cambridge, MA: MIT Press.
  • 49
    • Nakamura, Y., Mori, T., & Ishii, S. (2004). Natural policy gradient reinforcement learning for a CPG control of a biped robot. In Proceedings of the International Conference on Parallel Problem Solving from Nature (pp. 972-981).
  • 51
    • Ng, A. Y., & Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence (pp. 406-415).
  • 52
    • Park, J., Kim, J., & Kang, D. (2005). An RLS-based natural actor-critic algorithm for locomotion of a two-linked robot arm. In Y. Hao, J. Liu, Y. Wang, Y.-M. Cheung, H. Yin, L. Jiao, J. Ma, & Y.-C. Jiao (Eds.), Proceedings of the International Conference on Computational Intelligence and Security (CIS), Lecture Notes in Computer Science (Vol. 3801, pp. 65-72). Xi'an, China: Springer.
  • 53
    • Peters, J. (2005). Machine learning of motor skills for robotics. Technical Report CS-05-867. University of Southern California, Los Angeles, CA.
  • 54
    • Peters, J. (2007). Machine learning of motor skills for robotics. Ph.D. thesis, University of Southern California, Los Angeles, CA, USA.
  • 55
    • Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 2219-2225).
  • 56
    • Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS) (pp. 103-123).
  • 57
    • Peters, J., Vijayakumar, S., & Schaal, S. (2005a). Natural actor-critic. In Proceedings of the European Machine Learning Conference (pp. 280-291).
  • 60
    • Richter, S., Aberdeen, D., & Yu, J. (2007). Natural actor-critic for road traffic optimisation. In B. Schoelkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems (Vol. 19, online preproceedings). Cambridge, MA: MIT Press.
  • 61
    • Sadegh, P., & Spall, J. (1997). Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation. In Proceedings of the American Control Conference (pp. 3582-3586).
  • 62
    • Sato, M., Nakamura, Y., & Ishii, S. (2002). Reinforcement learning for biped locomotion. In Proceedings of the International Conference on Artificial Neural Networks (ICANN), Lecture Notes in Computer Science (pp. 777-782). Springer-Verlag.
  • 63
    • Schaal, S. (1997). Learning from demonstration. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems (NIPS) (Vol. 9, pp. 1040-1046). Cambridge, MA: MIT Press.
  • 64
    • Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. (2004). Learning movement primitives. In International Symposium on Robotics Research (ISRR 2003), Springer Tracts in Advanced Robotics (pp. 561-572). Siena, Italy: Springer.
  • 67
    • Su, F., & Gibbs, A. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419-435.
  • 68
    • Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems (NIPS) (pp. 1057-1063). Denver, CO: MIT Press.
  • 69
    • Tedrake, R., Zhang, T. W., & Seung, H. S. (2005). Learning to walk in 20 minutes. In Proceedings of the Yale Workshop on Adaptive and Learning Systems (pp. 10-22). Yale University, New Haven, CT.
  • 70
    • Ueno, T., Nakamura, Y., Takuma, T., Shibata, T., Hosoda, K., & Ishii, S. (2006). Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 5226-5231).
  • 72
    • Wada, Y., & Kawato, M. (1994). Trajectory formation of arm movement by a neural network with forward and inverse dynamics models. Systems and Computers in Japan, 24, 37-50.
  • 73
    • Weaver, L., & Tao, N. (2001a). The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence (Vol. 17, pp. 538-545). Seattle, WA.
  • 74
    • Weaver, L., & Tao, N. (2001b). The variance minimizing constant reward baseline for gradient-based reinforcement learning. Technical Report 30. Australian National University (ANU).
  • 75
    • Werbos, P. (1979). Changes in global policy analysis procedures suggested by new methods of optimization. Policy Analysis and Information Systems, 3(1).
  • 76
    • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.