[1] R. Sutton, A. Barto, Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA, 1998.

[2] S. P. Singh, T. Jaakkola, M. I. Jordan, Learning without state-estimation in partially observable Markovian decision processes, in: International Conference on Machine Learning (ICML 1994), Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 284-292.

[3] J. Baxter, P. Bartlett, L. Weaver, Experiments with infinite-horizon, policy-gradient estimation, Journal of Artificial Intelligence Research 15 (2001) 351-381.

[4] R. Sutton, D. McAllester, S. Singh, Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in: Advances in Neural Information Processing Systems (NIPS), MIT Press, 2001, pp. 1057-1063.

[5] S. Bhatnagar, R. Sutton, M. Ghavamzadeh, M. Lee, Incremental natural actor-critic algorithms, in: Advances in Neural Information Processing Systems (NIPS), MIT Press, 2007, pp. 105-112.

[7] J. Peters, S. Schaal, Policy gradient methods for robotics, in: Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), Beijing, China, 2006, pp. 2219-2225.

[9] J. Peters, S. Schaal, Reinforcement learning of motor skills with policy gradients, Neural Networks 21 (4) (2008) 682-697.

[10] H. Benbrahim, J. Franklin, Biped dynamic walking using reinforcement learning, Robotics and Autonomous Systems 22 (3-4) (1997) 283-302.

[14] J. Peters, S. Schaal, Natural actor-critic, Neurocomputing 71 (7-9) (2008) 1180-1190.

[15] V. Gullapalli, A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Networks 3 (6) (1990) 671-692.

[16] V. Gullapalli, Reinforcement Learning and Its Application to Control, Ph.D. thesis, University of Massachusetts, Amherst, MA, USA, 1992.

[17] P. Werbos, Backpropagation through time: what it does and how to do it, Proceedings of the IEEE 78 (1990) 1550-1560.

[18] N. Meuleau, L. Peshkin, K.-E. Kim, L. P. Kaelbling, Learning finite-state controllers for partially observable environments, in: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99), Morgan Kaufmann Publishers, 1999, pp. 427-436.

[20] B. Bakker, Reinforcement learning with Long Short-Term Memory, in: Advances in Neural Information Processing Systems 14, MIT Press, 2002, pp. 1475-1482.

[22] R. J. Williams, D. Zipser, A learning algorithm for continually running fully recurrent networks, Neural Computation 1 (2) (1989) 270-280.

[23] Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks 5 (2) (1994) 157-166.

[24] S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in: S. C. Kremer, J. F. Kolen (Eds.), A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001, pp. 237-244.

[26] J. Schmidhuber, RNN overview, http://www.idsia.ch/~juergen/rnn.html (2004).

[27] R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning 8 (1992) 229-256.

[28] A. Wieland, Evolving neural network controllers for unstable systems, in: Proceedings of the International Joint Conference on Neural Networks (Seattle, WA), IEEE, Piscataway, NJ, 1991, pp. 667-673.

[29] M. Littman, A. Cassandra, L. Kaelbling, Learning policies for partially observable environments: scaling up, in: A. Prieditis, S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, San Francisco, CA, 1995, pp. 362-370.

[31] TORCS, The Open Racing Car Simulator, http://torcs.sourceforge.net/ (2007).