-
1
-
-
0020970738
-
Neuronlike adaptive elements that can solve difficult learning control problems
-
Barto, Andrew G, Sutton, Richard S, and Anderson, Charles W. Neuronlike adaptive elements that can solve difficult learning control problems. Systems, Man and Cybernetics, IEEE Transactions on, (5):834–846, 1983.
-
(1983)
Systems, Man and Cybernetics, IEEE Transactions on
, Issue.5
, pp. 834-846
-
-
Barto, A.G.1
Sutton, R.S.2
Anderson, C.W.3
-
2
-
-
0003272616
-
Reinforcement learning in POMDPs via direct gradient ascent
-
Baxter, Jonathan and Bartlett, Peter L. Reinforcement learning in POMDPs via direct gradient ascent. In ICML, pp. 41–48, 2000.
-
(2000)
ICML
, pp. 41-48
-
-
Baxter, J.1
Bartlett, P.L.2
-
4
-
-
79951481923
-
Convergent temporal-difference learning with arbitrary smooth function approximation
-
Bhatnagar, Shalabh, Precup, Doina, Silver, David, Sutton, Richard S, Maei, Hamid R, and Szepesvári, Csaba. Convergent temporal-difference learning with arbitrary smooth function approximation. In Advances in Neural Information Processing Systems, pp. 1204–1212, 2009.
-
(2009)
Advances in Neural Information Processing Systems
, pp. 1204-1212
-
-
Bhatnagar, S.1
Precup, D.2
Silver, D.3
Sutton, R.S.4
Maei, H.R.5
Szepesvári, C.6
-
5
-
-
84897694817
-
Variance reduction techniques for gradient estimates in reinforcement learning
-
Greensmith, Evan, Bartlett, Peter L, and Baxter, Jonathan. Variance reduction techniques for gradient estimates in reinforcement learning. The Journal of Machine Learning Research, 5:1471–1530, 2004.
-
(2004)
The Journal of Machine Learning Research
, vol.5
, pp. 1471-1530
-
-
Greensmith, E.1
Bartlett, P.L.2
Baxter, J.3
-
6
-
-
79958779459
-
Reinforcement learning in feedback control
-
Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine Learning, 84(1-2):137–169, 2011.
-
(2011)
Machine Learning
, vol.84
, Issue.1-2
, pp. 137-169
-
-
Hafner, R.1
Riedmiller, M.2
-
7
-
-
85056837310
-
-
arXiv preprint
-
Heess, Nicolas, Wayne, Greg, Silver, David, Lillicrap, Timothy, Tassa, Yuval, and Erez, Tom. Learning continuous control policies by stochastic value gradients. arXiv preprint arXiv:1510.09142, 2015.
-
(2015)
Learning Continuous Control Policies by Stochastic Value Gradients
-
-
Heess, N.1
Wayne, G.2
Silver, D.3
Lillicrap, T.4
Tassa, Y.5
Erez, T.6
-
9
-
-
33646243319
-
A natural policy gradient
-
Kakade, Sham. A natural policy gradient. In NIPS, volume 14, pp. 1531–1538, 2001a.
-
(2001)
NIPS
, vol.14
, pp. 1531-1538
-
-
Kakade, S.1
-
10
-
-
84943252297
-
Optimizing average reward using discounted rewards
-
Springer
-
Kakade, Sham. Optimizing average reward using discounted rewards. In Computational Learning Theory, pp. 605–615. Springer, 2001b.
-
(2001)
Computational Learning Theory
, pp. 605-615
-
-
Kakade, S.1
-
11
-
-
0008336447
-
An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function
-
Kimura, Hajime and Kobayashi, Shigenobu. An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function. In ICML, pp. 278–286, 1998.
-
(1998)
ICML
, pp. 278-286
-
-
Kimura, H.1
Kobayashi, S.2
-
13
-
-
84965135289
-
-
arXiv preprint
-
Lillicrap, Timothy P, Hunt, Jonathan J, Pritzel, Alexander, Heess, Nicolas, Erez, Tom, Tassa, Yuval, Silver, David, and Wierstra, Daan. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
-
(2015)
Continuous Control with Deep Reinforcement Learning
-
-
Lillicrap, T.P.1
Hunt, J.J.2
Pritzel, A.3
Heess, N.4
Erez, T.5
Tassa, Y.6
Silver, D.7
Wierstra, D.8
-
14
-
-
0037288469
-
Approximate gradient methods in policy-space optimization of Markov reward processes
-
Marbach, Peter and Tsitsiklis, John N. Approximate gradient methods in policy-space optimization of Markov reward processes. Discrete Event Dynamic Systems, 13(1-2):111–148, 2003.
-
(2003)
Discrete Event Dynamic Systems
, vol.13
, Issue.1-2
, pp. 111-148
-
-
Marbach, P.1
Tsitsiklis, J.N.2
-
15
-
-
84937350040
-
Steps toward artificial intelligence
-
Minsky, Marvin. Steps toward artificial intelligence. Proceedings of the IRE, 49(1):8–30, 1961.
-
(1961)
Proceedings of the IRE
, vol.49
, Issue.1
, pp. 8-30
-
-
Minsky, M.1
-
16
-
-
0141596576
-
Policy invariance under reward transformations: Theory and application to reward shaping
-
Ng, Andrew Y, Harada, Daishi, and Russell, Stuart. Policy invariance under reward transformations: Theory and application to reward shaping. In ICML, volume 99, pp. 278–287, 1999.
-
(1999)
ICML
, vol.99
, pp. 278-287
-
-
Ng, A.Y.1
Harada, D.2
Russell, S.3
-
17
-
-
40649106649
-
Natural actor-critic
-
Peters, Jan and Schaal, Stefan. Natural actor-critic. Neurocomputing, 71(7):1180–1190, 2008.
-
(2008)
Neurocomputing
, vol.71
, Issue.7
, pp. 1180-1190
-
-
Peters, J.1
Schaal, S.2
-
18
-
-
84965149509
-
-
arXiv preprint
-
Schulman, John, Levine, Sergey, Moritz, Philipp, Jordan, Michael I, and Abbeel, Pieter. Trust region policy optimization. arXiv preprint arXiv:1502.05477, 2015.
-
(2015)
Trust Region Policy Optimization
-
-
Schulman, J.1
Levine, S.2
Moritz, P.3
Jordan, M.I.4
Abbeel, P.5
-
20
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
Citeseer
-
Sutton, Richard S, McAllester, David A, Singh, Satinder P, and Mansour, Yishay. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pp. 1057–1063. Citeseer, 1999.
-
(1999)
NIPS
, vol.99
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.A.2
Singh, S.P.3
Mansour, Y.4
-
22
-
-
84872292044
-
MuJoCo: A physics engine for model-based control
-
Todorov, Emanuel, Erez, Tom, and Tassa, Yuval. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
-
(2012)
Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on
, pp. 5026-5033
-
-
Todorov, E.1
Erez, T.2
Tassa, Y.3
-
23
-
-
71749106087
-
Real-time reinforcement learning by sequential actor–critics and experience replay
-
Wawrzyński, Paweł. Real-time reinforcement learning by sequential actor–critics and experience replay. Neural Networks, 22(10):1484–1497, 2009.
-
(2009)
Neural Networks
, vol.22
, Issue.10
, pp. 1484-1497
-
-
Wawrzyński, P.1
-
24
-
-
0000337576
-
Simple statistical gradient-following algorithms for connectionist reinforcement learning
-
Williams, Ronald J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
-
(1992)
Machine Learning
, vol.8
, Issue.3-4
, pp. 229-256
-
-
Williams, R.J.1