Volume 21, Issue 4, 2008, Pages 682-697

Reinforcement learning of motor skills with policy gradients

Author keywords

Motor primitives; Motor skills; Natural Actor Critic; Natural gradients; Policy gradient methods; Reinforcement learning

Indexed keywords

ARTIFICIAL LIMBS; BEHAVIORAL RESEARCH; CONTINUOUS TIME SYSTEMS; ERROR ANALYSIS; GRADIENT METHODS; MOTOR TRANSPORTATION;

EID: 44949241322     PISSN: 08936080     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.neunet.2008.02.003     Document Type: Article
Times cited: 896

References (76)
  • 1
    • Aberdeen, D. (2006). POMDPs and policy gradients. Presentation at the Machine Learning Summer School (MLSS).
  • 3
    • Amari, S. (1998). Natural gradient works efficiently in learning. Neural Computation, 10, 251.
  • 4
    • Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In J. E. Hanson, S. J. Moody, & R. P. Lippmann (Eds.), Advances in neural information processing systems 6 (pp. 503-521). Morgan Kaufmann.
  • 5
    • Bagnell, J., & Schneider, J. (2003). Covariant policy search. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 1019-1024).
  • 6
    • Baird, L. (1993). Advantage updating. Technical Report WL-TR-93-1146. Wright Laboratory, Wright-Patterson Air Force Base, OH.
  • 7
    • Balasubramanian, V. (1997). Statistical inference, Occam's razor, and statistical mechanics on the space of probability distributions. Neural Computation, 9(2), 349-368.
  • 11
    • Ben-Itzhak, S., & Karniel, A. (2008). Minimum acceleration criterion with constraints implies bang-bang control as an underlying principle for optimal trajectories of arm reaching movements. Neural Computation, 20(3), 779-812.
  • 12
    • Benbrahim, H., Doleac, J., Franklin, J., & Selfridge, O. (1992). Real-time learning: A ball on a beam. In Proceedings of the International Joint Conference on Neural Networks (pp. 92-103).
  • 14
    • Berny, A. (2000). Statistical machine learning and combinatorial optimization. In Lecture notes in natural computing (Vol. 33, pp. 287-306). Heidelberg, Germany: Springer-Verlag.
  • 16
    • Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., & Cheng, G. (2005). Learning CPG sensory feedback with policy gradient for biped locomotion for a full-body humanoid. In Proceedings of the National Conference on Artificial Intelligence (pp. 1267-1273).
  • 17
    • Flash, T., & Hochner, B. (2005). Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology, 15, 660-666.
  • 19
    • Fu, M. C. (2002). Feature article: Optimization for simulation: Theory vs. practice. INFORMS Journal on Computing, 14(3), 192-215.
  • 20
    • Glynn, P. (1987). Likelihood ratio gradient estimation: An overview. In Proceedings of the Winter Simulation Conference (pp. 366-375).
  • 21
    • Glynn, P. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33(10), 75-84.
  • 23
    • Greensmith, E., Bartlett, P. L., & Baxter, J. (2004). Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research, 5, 1471-1530.
  • 24
    • Guenter, F., Hersch, M., Calinon, S., & Billard, A. (2007). Reinforcement learning for imitating constrained reaching movements. RSJ Advanced Robotics, 21 (Special issue on Imitative Robots), 1521-1544.
  • 25
    • Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3(6), 671-692.
  • 26
    • Gullapalli, V. (1992). Learning control under extreme uncertainty. In Advances in neural information processing systems (pp. 327-334).
  • 30
    • Ijspeert, A. J., Nakanishi, J., & Schaal, S. (2002). Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 1398-1403).
  • 31
    • Ijspeert, A., Nakanishi, J., & Schaal, S. (2003). Learning attractor landscapes for learning motor primitives. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems (Vol. 15, pp. 1547-1554). Cambridge, MA: MIT Press.
  • 33
    • Kakade, S. (2001). Optimizing average reward using discounted rewards. In Proceedings of the Conference on Computational Learning Theory (pp. 605-615).
  • 35
    • Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, London, UK.
  • 36
    • Kimura, H., & Kobayashi, S. (1997). Reinforcement learning for locomotion of a two-linked robot arm. In Proceedings of the European Workshop on Learning Robots (pp. 144-153).
  • 37
    • Kimura, H., & Kobayashi, S. (1998). Reinforcement learning for continuous action using stochastic gradient ascent. In Proceedings of the International Conference on Intelligent Autonomous Systems (IAS) (Vol. 5, pp. 288-295).
  • 38
    • Kleinman, N., Spall, J., & Naiman, D. (1999). Simulation-based optimization with stochastic approximation using common random numbers. Management Science, 45, 1570-1578.
  • 39
    • Kohl, N., & Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 2619-2624).
  • 41
    • Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence (pp. 354-361).
  • 42
    • Mitsunaga, N., Smith, C., Kanda, T., Ishiguro, H., & Hagita, N. (2005). Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 1594-1601).
  • 43
    • Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Osu, R., et al. (1995). A kendama learning robot based on a dynamic optimization theory. In Proceedings of the IEEE International Workshop on Robot and Human Communication (pp. 327-332).
  • 44
    • Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Rieka, O., et al. (1996). A kendama learning robot based on a dynamic optimization principle. In Proceedings of the International Conference on Neural Information Processing (pp. 938-942).
  • 46
    • Mori, T., Nakamura, Y., Sato, M., & Ishii, S. (2004). Reinforcement learning for CPG-driven biped robot. In Proceedings of the National Conference on Artificial Intelligence (pp. 623-630).
  • 47
    • Mori, T., Nakamura, Y., & Ishii, S. (2005). Efficient sample reuse by off-policy natural actor-critic learning. In Advances in neural information processing systems (NIPS '05 workshop presentation).
  • 48
    • Morimoto, J., & Atkeson, C. A. (2003). Minimax differential dynamic programming: An application to robust biped walking. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 1539-1546). Cambridge, MA: MIT Press.
  • 49
    • Nakamura, Y., Mori, T., & Ishii, S. (2004). Natural policy gradient reinforcement learning for a CPG control of a biped robot. In Proceedings of the International Conference on Parallel Problem Solving from Nature (pp. 972-981).
  • 51
    • Ng, A. Y., & Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence (pp. 406-415).
  • 52
    • Park, J., Kim, J., & Kang, D. (2005). An RLS-based natural actor-critic algorithm for locomotion of a two-linked robot arm. In Y. Hao, J. Liu, Y. Wang, Y.-M. Cheung, H. Yin, L. Jiao, J. Ma, & Y.-C. Jiao (Eds.), Proceedings of the International Conference on Computational Intelligence and Security (CIS), Lecture Notes in Computer Science (Vol. 3801, pp. 65-72). Xi'an, China: Springer.
  • 53
    • Peters, J. (2005). Machine learning of motor skills for robotics. Technical Report CS-05-867. University of Southern California, Los Angeles, CA.
  • 54
    • Peters, J. (2007). Machine learning of motor skills for robotics. Ph.D. thesis, University of Southern California, Los Angeles, CA, USA.
  • 55
    • Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 2219-2225).
  • 56
    • Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots (HUMANOIDS) (pp. 103-123).
  • 57
    • Peters, J., Vijayakumar, S., & Schaal, S. (2005a). Natural actor-critic. In Proceedings of the European Machine Learning Conference (pp. 280-291).
  • 60
    • Richter, S., Aberdeen, D., & Yu, J. (2007). Natural actor-critic for road traffic optimisation. In B. Schoelkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems (Vol. 19, online preproceedings). Cambridge, MA: MIT Press.
  • 61
    • Sadegh, P., & Spall, J. (1997). Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation. In Proceedings of the American Control Conference (pp. 3582-3586).
  • 62
    • Sato, M., Nakamura, Y., & Ishii, S. (2002). Reinforcement learning for biped locomotion. In Proceedings of the International Conference on Artificial Neural Networks (ICANN), Lecture Notes in Computer Science (pp. 777-782). Springer-Verlag.
  • 63
    • Schaal, S. (1997). Learning from demonstration. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems (NIPS) (Vol. 9, pp. 1040-1046). Cambridge, MA: MIT Press.
  • 64
    • Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. (2004). Learning movement primitives. In International Symposium on Robotics Research (ISRR 2003), Springer Tracts in Advanced Robotics (pp. 561-572). Siena, Italy: Springer.
  • 67
    • Su, F., & Gibbs, A. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419-435.
  • 68
    • Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems (NIPS) (pp. 1057-1063). Denver, CO: MIT Press.
  • 69
    • Tedrake, R., Zhang, T. W., & Seung, H. S. (2005). Learning to walk in 20 minutes. In Proceedings of the Yale Workshop on Adaptive and Learning Systems (pp. 10-22). Yale University, New Haven, CT.
  • 70
    • Ueno, T., Nakamura, Y., Takuma, T., Shibata, T., Hosoda, K., & Ishii, S. (2006). Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (pp. 5226-5231).
  • 72
    • Wada, Y., & Kawato, M. (1994). Trajectory formation of arm movement by a neural network with forward and inverse dynamics models. Systems and Computers in Japan, 24, 37-50.
  • 73
    • Weaver, L., & Tao, N. (2001a). The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence (Vol. 17, pp. 538-545). Seattle, WA.
  • 74
    • Weaver, L., & Tao, N. (2001b). The variance minimizing constant reward baseline for gradient-based reinforcement learning. Technical Report 30. Australian National University (ANU).
  • 75
    • Werbos, P. (1979). Changes in global policy analysis procedures suggested by new methods of optimization. Policy Analysis and Information Systems, 3(1).
  • 76
    • Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4), 229-256.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.