1. Aberdeen, D. (2006). POMDPs and policy gradients. Presentation at the Machine Learning Summer School (MLSS).
3. Amari, S. (1998). Natural gradient works efficiently in learning. Neural Computation, 10, 251.
4. Atkeson, C. G. (1994). Using local trajectory optimizers to speed up global optimization in dynamic programming. In J. E. Hanson, S. J. Moody, & R. P. Lippmann (Eds.), Advances in neural information processing systems 6 (pp. 503-521). Morgan Kaufmann.
5. Bagnell, J., & Schneider, J. (2003). Covariant policy search. In Proceedings of the international joint conference on artificial intelligence (pp. 1019-1024).
6. Baird, L. (1993). Advantage updating. Technical Report WL-TR-93-1146. Wright Laboratory, Wright-Patterson Air Force Base, OH.
7. Balasubramanian, V. (1997). Statistical inference, Occam's razor, and statistical mechanics on the space of probability distributions. Neural Computation, 9(2), 349-368.
8. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13(5), 115-133.
11. Ben-Itzhak, S., & Karniel, A. (2008). Minimum acceleration criterion with constraints implies bang-bang control as an underlying principle for optimal trajectories of arm reaching movements. Neural Computation, 20(3), 779-812.
12. Benbrahim, H., Doleac, J., Franklin, J., & Selfridge, O. (1992). Real-time learning: A ball on a beam. In Proceedings of the international joint conference on neural networks (pp. 92-103).
14. Berny, A. (2000). Statistical machine learning and combinatorial optimization. In Lecture notes in natural computing: Vol. 33 (pp. 287-306). Heidelberg, Germany: Springer-Verlag.
16. Endo, G., Morimoto, J., Matsubara, T., Nakanishi, J., & Cheng, G. (2005). Learning CPG sensory feedback with policy gradient for biped locomotion for a full-body humanoid. In Proceedings of the national conference on artificial intelligence (pp. 1267-1273).
17. Flash, T., & Hochner, B. (2005). Motor primitives in vertebrates and invertebrates. Current Opinion in Neurobiology, 15, 660-666.
19. Fu, M. C. (2002). Feature article: Optimization for simulation: Theory vs. practice. INFORMS Journal on Computing, 14(3), 192-215.
20. Glynn, P. (1987). Likelihood ratio gradient estimation: An overview. In Proceedings of the winter simulation conference (pp. 366-375).
21. Glynn, P. (1990). Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33(10), 75-84.
24. Guenter, F., Hersch, M., Calinon, S., & Billard, A. (2007). Reinforcement learning for imitating constrained reaching movements. In Imitative robots [Special issue]. RSJ Advanced Robotics, 21, 1521-1544.
25. Gullapalli, V. (1990). A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks, 3(6), 671-692.
26. Gullapalli, V. (1992). Learning control under extreme uncertainty. In Advances in neural information processing systems (pp. 327-334).
27. Gullapalli, V., Franklin, J., & Benbrahim, H. (1994). Acquiring robot skills via reinforcement learning. IEEE Control Systems Journal, Special Issue on Robotics: Capturing Natural Motion, 4(1), 13-24.
30. Ijspeert, J. A., Nakanishi, J., & Schaal, S. (2002). Movement imitation with nonlinear dynamical systems in humanoid robots. In Proceedings of the IEEE international conference on robotics and automation (pp. 1398-1403).
31. Ijspeert, A., Nakanishi, J., & Schaal, S. (2003). Learning attractor landscapes for learning motor primitives. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems: Vol. 15 (pp. 1547-1554). Cambridge, MA: MIT Press.
32. Jacobson, D. H., & Mayne, D. Q. (1970). Differential dynamic programming. New York, NY: American Elsevier Publishing Company.
33. Kakade, S. (2001). Optimizing average reward using discounted rewards. In Proceedings of the conference on computational learning theory (pp. 605-615).
35. Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London, London, UK.
36. Kimura, H., & Kobayashi, S. (1997). Reinforcement learning for locomotion of a two-linked robot arm. In Proceedings of the European workshop on learning robots (pp. 144-153).
37. Kimura, H., & Kobayashi, S. (1998). Reinforcement learning for continuous action using stochastic gradient ascent. In Proceedings of the international conference on intelligent autonomous systems (IAS): Vol. 5 (pp. 288-295).
38. Kleinman, N., Spall, J., & Naiman, D. (1999). Simulation-based optimization with stochastic approximation using common random numbers. Management Science, 45, 1570-1578.
39. Kohl, N., & Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proceedings of the IEEE international conference on robotics and automation (pp. 2619-2624).
41. Lawrence, G., Cowan, N., & Russell, S. (2003). Efficient gradient estimation for motor control learning. In Proceedings of the international conference on uncertainty in artificial intelligence (pp. 354-361).
42. Mitsunaga, N., Smith, C., Kanda, T., Ishiguro, H., & Hagita, N. (2005). Robot behavior adaptation for human-robot interaction based on policy gradient reinforcement learning. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (pp. 1594-1601).
43. Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Osu, R., et al. (1995). A kendama learning robot based on a dynamic optimization theory. In Proceedings of the IEEE international workshop on robot and human communication (pp. 327-332).
44. Miyamoto, H., Gandolfo, F., Gomi, H., Schaal, S., Koike, Y., Rieka, O., et al. (1996). A kendama learning robot based on a dynamic optimization principle. In Proceedings of the international conference on neural information processing (pp. 938-942).
46. Mori, T., Nakamura, Y., Sato, M., & Ishii, S. (2004). Reinforcement learning for CPG-driven biped robot. In Proceedings of the national conference on artificial intelligence (pp. 623-630).
47. Mori, T., Nakamura, Y., & Ishii, S. (2005). Efficient sample reuse by off-policy natural actor-critic learning. In Advances in neural information processing systems (NIPS '05 workshop presentation).
48. Morimoto, J., & Atkeson, C. A. (2003). Minimax differential dynamic programming: An application to robust biped walking. In S. Becker, S. Thrun, & K. Obermayer (Eds.), Advances in neural information processing systems 15 (pp. 1539-1546). Cambridge, MA: MIT Press.
49. Nakamura, Y., Mori, T., & Ishii, S. (2004). Natural policy gradient reinforcement learning for a CPG control of a biped robot. In Proceedings of the international conference on parallel problem solving from nature (pp. 972-981).
50. Nakanishi, J., Morimoto, J., Endo, G., Cheng, G., Schaal, S., & Kawato, M. (2004). Learning from demonstration and adaptation of biped locomotion. Robotics and Autonomous Systems, 47(2-3), 79-91.
51. Ng, A. Y., & Jordan, M. (2000). PEGASUS: A policy search method for large MDPs and POMDPs. In Proceedings of the international conference on uncertainty in artificial intelligence (pp. 406-415).
52. Park, J., Kim, J., & Kang, D. (2005). An RLS-based natural actor-critic algorithm for locomotion of a two-linked robot arm. In Y. Hao, J. Liu, Y. Wang, Y.-M. Cheung, H. Yin, L. Jiao, J. Ma, & Y.-C. Jiao (Eds.), Proceedings of the international conference on computational intelligence and security (CIS). Lecture notes in computer science: Vol. 3801 (pp. 65-72). Xi'an, China: Springer.
53. Peters, J. (2005). Machine learning of motor skills for robotics. Technical Report CS-05-867. University of Southern California, Los Angeles, CA.
54. Peters, J. (2007). Machine learning of motor skills for robotics. Ph.D. thesis, University of Southern California, Los Angeles, CA, USA.
55. Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (pp. 2219-2225).
56. Peters, J., Vijayakumar, S., & Schaal, S. (2003). Reinforcement learning for humanoid robotics. In Proceedings of the IEEE-RAS international conference on humanoid robots (HUMANOIDS) (pp. 103-123).
57. Peters, J., Vijayakumar, S., & Schaal, S. (2005a). Natural actor-critic. In Proceedings of the European machine learning conference (pp. 280-291).
60. Richter, S., Aberdeen, D., & Yu, J. (2007). Natural actor-critic for road traffic optimisation. In B. Schoelkopf, J. Platt, & T. Hofmann (Eds.), Advances in neural information processing systems: Vol. 19 (online preproceedings). Cambridge, MA: MIT Press.
61. Sadegh, P., & Spall, J. (1997). Optimal random perturbations for stochastic approximation using a simultaneous perturbation gradient approximation. In Proceedings of the American control conference (pp. 3582-3586).
62. Sato, M., Nakamura, Y., & Ishii, S. (2002). Reinforcement learning for biped locomotion. In Proceedings of the international conference on artificial neural networks (ICANN). Lecture notes in computer science (pp. 777-782). Springer-Verlag.
63. Schaal, S. (1997). Learning from demonstration. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in neural information processing systems (NIPS): Vol. 9 (pp. 1040-1046). Cambridge, MA: MIT Press.
64. Schaal, S., Peters, J., Nakanishi, J., & Ijspeert, A. (2004). Learning movement primitives. In International symposium on robotics research (ISRR 2003). Springer tracts in advanced robotics (pp. 561-572). Siena, Italy: Springer.
66. Spall, J. C. (2003). Introduction to stochastic search and optimization: Estimation, simulation, and control. Hoboken, NJ: Wiley.
67. Su, F., & Gibbs, A. (2002). On choosing and bounding probability metrics. International Statistical Review, 70(3), 419-435.
68. Sutton, R. S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In S. A. Solla, T. K. Leen, & K.-R. Mueller (Eds.), Advances in neural information processing systems (NIPS) (pp. 1057-1063). Denver, CO: MIT Press.
69. Tedrake, R., Zhang, T. W., & Seung, H. S. (2005). Learning to walk in 20 minutes. In Proceedings of the Yale workshop on adaptive and learning systems (pp. 10-22). Yale University, New Haven, CT.
70. Ueno, T., Nakamura, Y., Takuma, T., Shibata, T., Hosoda, K., & Ishii, S. (2006). Fast and stable learning of quasi-passive dynamic walking by an unstable biped robot based on off-policy natural actor-critic. In Proceedings of the IEEE/RSJ international conference on intelligent robots and systems (pp. 5226-5231).
71. Vachenauer, P., Rade, L., & Westergren, B. (2000). Springers Mathematische Formeln: Taschenbuch für Ingenieure, Naturwissenschaftler, Informatiker, Wirtschaftswissenschaftler. Heidelberg, Germany: Springer-Verlag.
72. Wada, Y., & Kawato, M. (1994). Trajectory formation of arm movement by a neural network with forward and inverse dynamics models. Systems and Computers in Japan, 24, 37-50.
73. Weaver, L., & Tao, N. (2001a). The optimal reward baseline for gradient-based reinforcement learning. In Proceedings of the international conference on uncertainty in artificial intelligence: Vol. 17 (pp. 538-545). Seattle, WA.
74. Weaver, L., & Tao, N. (2001b). The variance minimizing constant reward baseline for gradient-based reinforcement learning. Technical Report 30. Australian National University (ANU).
75. Werbos, P. (1979). Changes in global policy analysis procedures suggested by new methods of optimization. Policy Analysis and Information Systems, 3(1).
76. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.