1. Aberdeen, D. (2003). Policy-gradient algorithms for partially observable Markov decision processes. Ph.D. thesis, Australian National University.
2. Baxter, J., & Bartlett, P. L. (2000). Reinforcement learning in POMDPs via direct gradient ascent. In Proc. 17th international conf. on machine learning (pp. 41-48). Morgan Kaufmann, San Francisco, CA.
4. Buss, M., & Hirche, S. (2008). Institute of Automatic Control Engineering, TU München, Germany. http://www.lsr.ei.tum.de/.
5. Hansen, N., & Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2), 159-195.
6. Jordan, M. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proc. of the eighth annual conference of the cognitive science society (Vol. 8, pp. 531-546).
7. Müller, H., Lauer, M., Hafner, R., Lange, S., Merke, A., & Riedmiller, M. (2007). Making a robot learn to play soccer. In Proceedings of the 30th annual German conference on artificial intelligence (KI-2007).
9. Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In IROS-2006 (pp. 2219-2225).
10. Peters, J., & Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71, 1180-1190.
11. Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, pp. 682-697.
13. Riedmiller, M., Peters, J., & Schaal, S. (2007). Evaluation of policy gradient methods and variants on the cart-pole benchmark. In ADPRL-2007.
14. Rückstieß, T., Felder, M., & Schmidhuber, J. (2008). State-dependent exploration for policy gradient methods. In W.D. (Ed.), European conference on machine learning and principles and practice of knowledge discovery in databases 2008, Part II. LNAI (Vol. 5212, pp. 234-249).
15. Schraudolph, N., Yu, J., & Aberdeen, D. (2006). Fast online policy gradient learning with SMD gain vector adaptation. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in neural information processing systems (Vol. 18). MIT Press, Cambridge, MA.
17. Sehnke, F. (2009). PGPE-Policy Gradients with Parameter-based exploration-demonstration video: Learning in robot simulations. http://www.pybrain.org/videos/jnn10/.
18. Spall, J. (1998). An overview of the simultaneous perturbation method for efficient optimization. Johns Hopkins APL Technical Digest, 19(4), 482-492.
19. Spall, J. (1998). Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems, 34(3), 817-823.
20. Sutton, R., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In NIPS-1999 (pp. 1057-1063).
21. Ulbrich, H. (2008). Institute of Applied Mechanics, TU München, Germany. http://www.amm.mw.tum.de/.
22. Wierstra, D., Schaul, T., Peters, J., & Schmidhuber, J. (2008). Fitness expectation maximization. In Proceedings of the 10th international conference on parallel problem solving from nature.
23. Williams, R. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.