[1] S. Amari. Natural Gradient Works Efficiently in Learning. Neural Computation, 10:251-276, 1998.
[3] J. Bagnell and J. Schneider. Covariant Policy Search. IJCAI, 18:1019-1024, 2003.
[8] S. Bhatnagar, R. Sutton, M. Ghavamzadeh, and M. Lee. Natural Actor-Critic Algorithms. Automatica, 45:2471-2482, 2009.
[10] P. Dayan and G. E. Hinton. Using Expectation-Maximization for Reinforcement Learning. Neural Computation, 9:271-278, 1997.
[11] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum Likelihood from Incomplete Data via the EM Algorithm. Journal of the Royal Statistical Society. Series B (Methodological), 39(1):1-38, 1977.
[13] T. Furmston and D. Barber. Efficient Inference for Markov Control Problems. UAI, 27:221-229, 2011.
[14] P. W. Glynn. Likelihood Ratio Gradient Estimation for Stochastic Systems. Communications of the ACM, 33:75-84, 1990.
[15] E. Greensmith, P. Bartlett, and J. Baxter. Variance Reduction Techniques for Gradient Based Estimates in Reinforcement Learning. Journal of Machine Learning Research, 5:1471-1530, 2004.
[16] S. Kakade. A Natural Policy Gradient. NIPS, 14:1531-1538, 2002.
[18] J. Kober and J. Peters. Policy Search for Motor Primitives in Robotics. Machine Learning, 84(1-2):171-203, 2011.
[22] N. Meuleau, L. Peshkin, K. Kim, and L. Kaelbling. Learning Finite-State Controllers for Partially Observable Environments. UAI, 15:427-436, 1999.
[24] J. Peters and S. Schaal. Natural Actor-Critic. Neurocomputing, 71(7-9):1180-1190, 2008.
[26] S. Richter, D. Aberdeen, and J. Yu. Natural Actor-Critic for Road Traffic Optimisation. NIPS, 19:1169-1176, 2007.
[27] R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy Gradient Methods for Reinforcement Learning with Function Approximation. NIPS, 12:1057-1063, 2000.
[29] N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis. Learning Model-Free Robot Control by a Monte Carlo EM Algorithm. Autonomous Robots, 27(2):123-130, 2009.
[30] L. Weaver and N. Tao. The Optimal Reward Baseline for Gradient Based Reinforcement Learning. UAI, 17(29):538-545, 2001.
[31] R. Williams. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 8:229-256, 1992.