2. P. L. Bartlett and J. Baxter. Estimation and approximation bounds for gradient-based reinforcement learning. Journal of Computer and System Sciences, 2002. To appear.
3. A. G. Barto, R. S. Sutton, and C. W. Anderson. Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13:834-846, 1983.
5. J. Baxter, P. L. Bartlett, and L. Weaver. Infinite-horizon gradient-based policy search: II. Gradient ascent algorithms and experiments. Journal of Artificial Intelligence Research, 15:351-381, 2001.
7. P. W. Glynn. Likelihood ratio gradient estimation for stochastic systems. Communications of the ACM, 33:75-84, 1990.
8. E. Greensmith, P. L. Bartlett, and J. Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Technical report, ANU, 2002.
9. H. Kimura, K. Miyazaki, and S. Kobayashi. Reinforcement learning in POMDPs with function approximation. In D. H. Fisher, editor, Proceedings of the Fourteenth International Conference on Machine Learning (ICML'97), pages 152-160, 1997.
12. R. Y. Rubinstein. How to optimize complex stochastic systems from a single sample path by the score function method. Annals of Operations Research, 27:175-211, 1991.
14. R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems 12, pages 1057-1063. MIT Press, 2000.
15. R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.