-
1
-
-
0031073475
-
Locally weighted learning for control
-
Springer
-
Atkeson, Christopher G, Moore, Andrew W, and Schaal, Stefan. Locally weighted learning for control. In Lazy learning, pp. 75-113. Springer, 1997.
-
(1997)
Lazy Learning
, pp. 75-113
-
-
Atkeson, C.G.1
Moore, A.W.2
Schaal, S.3
-
2
-
-
0004370245
-
-
Technical report, DTIC Document
-
Baird III, Leemon C. Advantage updating. Technical report, DTIC Document, 1993.
-
(1993)
Advantage Updating
-
-
Baird, L.C.1
-
3
-
-
84998965670
-
The importance of experience replay database composition in deep reinforcement learning
-
NIPS
-
de Bruin, Tim, Kober, Jens, Tuyls, Karl, and Babuska, Robert. The importance of experience replay database composition in deep reinforcement learning. Deep Reinforcement Learning Workshop, NIPS, 2015.
-
(2015)
Deep Reinforcement Learning Workshop
-
-
De Bruin, T.1
Kober, J.2
Tuyls, K.3
Babuska, R.4
-
5
-
-
84903590417
-
A survey on policy search for robotics
-
Deisenroth, Marc Peter, Neumann, Gerhard, Peters, Jan, et al. A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1-2): 1-142, 2013.
-
(2013)
Foundations and Trends in Robotics
, vol.2
, Issue.1-2
, pp. 1-142
-
-
Deisenroth, M.P.1
Neumann, G.2
Peters, J.3
-
7
-
-
79958779459
-
Reinforcement learning in feedback control
-
Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine learning, 84(1-2): 137-169, 2011.
-
(2011)
Machine Learning
, vol.84
, Issue.1-2
, pp. 137-169
-
-
Hafner, R.1
Riedmiller, M.2
-
8
-
-
0003996286
-
Multi-player residual advantage learning with general function approximation
-
OH
-
Harmon, Mance E and Baird III, Leemon C. Multi-player residual advantage learning with general function approximation. Wright Laboratory, WL/AACF, Wright-Patterson Air Force Base, OH 45433-7308, 1996.
-
(1996)
Wright Laboratory, WL/AACF, Wright-Patterson Air Force Base
-
-
Harmon, M.E.1
Baird, L.C.2
-
10
-
-
84965103751
-
Learning continuous control policies by stochastic value gradients
-
Heess, Nicolas, Wayne, Gregory, Silver, David, Lillicrap, Tim, Erez, Tom, and Tassa, Yuval. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems (NIPS), pp. 2926-2934, 2015.
-
(2015)
Advances in Neural Information Processing Systems (NIPS)
, pp. 2926-2934
-
-
Heess, N.1
Wayne, G.2
Silver, D.3
Lillicrap, T.4
Erez, T.5
Tassa, Y.6
-
12
-
-
84892593209
-
Reinforcement learning in robotics: A survey
-
Springer
-
Kober, Jens and Peters, Jan. Reinforcement learning in robotics: A survey. In Reinforcement Learning, pp. 579-610. Springer, 2012.
-
(2012)
Reinforcement Learning
, pp. 579-610
-
-
Kober, J.1
Peters, J.2
-
14
-
-
84883060087
-
Evolving large-scale neural networks for vision-based reinforcement learning
-
ACM
-
Koutník, Jan, Cuccu, Giuseppe, Schmidhuber, Jürgen, and Gomez, Faustino. Evolving large-scale neural networks for vision-based reinforcement learning. In Proceedings of the 15th annual conference on Genetic and evolutionary computation, pp. 1061-1068. ACM, 2013.
-
(2013)
Proceedings of the 15th Annual Conference on Genetic and Evolutionary Computation
, pp. 1061-1068
-
-
Koutník, J.1
Cuccu, G.2
Schmidhuber, J.3
Gomez, F.4
-
16
-
-
84937822296
-
Learning neural network policies with guided policy search under unknown dynamics
-
Levine, Sergey and Abbeel, Pieter. Learning neural network policies with guided policy search under unknown dynamics. In Advances in Neural Information Processing Systems (NIPS), pp. 1071-1079, 2014.
-
(2014)
Advances in Neural Information Processing Systems (NIPS)
, pp. 1071-1079
-
-
Levine, S.1
Abbeel, P.2
-
18
-
-
84979924150
-
End-to-end training of deep visuomotor policies
-
Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. JMLR 17, 2016.
-
(2016)
JMLR
, vol.17
-
-
Levine, S.1
Finn, C.2
Darrell, T.3
Abbeel, P.4
-
19
-
-
17444424051
-
Iterative linear quadratic regulator design for nonlinear biological movement systems
-
Li, Weiwei and Todorov, Emanuel. Iterative linear quadratic regulator design for nonlinear biological movement systems. In ICINCO (1), pp. 222-229, 2004.
-
(2004)
ICINCO
, Issue.1
, pp. 222-229
-
-
Li, W.1
Todorov, E.2
-
20
-
-
85083953657
-
Continuous control with deep reinforcement learning
-
Lillicrap, Timothy P, Hunt, Jonathan J, Pritzel, Alexander, Heess, Nicolas, Erez, Tom, Tassa, Yuval, Silver, David, and Wierstra, Daan. Continuous control with deep reinforcement learning. International Conference on Learning Representations (ICLR), 2016.
-
(2016)
International Conference on Learning Representations (ICLR)
-
-
Lillicrap, T.P.1
Hunt, J.J.2
Pritzel, A.3
Heess, N.4
Erez, T.5
Tassa, Y.6
Silver, D.7
Wierstra, D.8
-
21
-
-
84924051598
-
Human-level control through deep reinforcement learning
-
Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A, Veness, Joel, Bellemare, Marc G, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K, Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
-
(2015)
Nature
, vol.518
, Issue.7540
, pp. 529-533
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Rusu, A.A.4
Veness, J.5
Bellemare, M.G.6
Graves, A.7
Riedmiller, M.8
Fidjeland, A.K.9
Ostrovski, G.10
-
24
-
-
85167411371
-
Relative entropy policy search
-
Atlanta
-
Peters, Jan, Mülling, Katharina, and Altun, Yasemin. Relative entropy policy search. In AAAI. Atlanta, 2010.
-
(2010)
AAAI
-
-
Peters, J.1
Mülling, K.2
Altun, Y.3
-
25
-
-
84959314908
-
-
and, Robotics
-
Rawlik, Konrad, Toussaint, Marc, and Vijayakumar, Sethu. On stochastic optimal control and reinforcement learning by approximate inference. Robotics, pp. 353, 2013.
-
(2013)
On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference
, pp. 353
-
-
Rawlik, K.1
Toussaint, M.2
Vijayakumar, S.3
-
26
-
-
84980041049
-
-
arXiv preprint arXiv:1511.05952
-
Schaul, Tom, Quan, John, Antonoglou, Ioannis, and Silver, David. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015.
-
(2015)
Prioritized Experience Replay
-
-
Schaul, T.1
Quan, J.2
Antonoglou, I.3
Silver, D.4
-
28
-
-
84969963490
-
Trust region policy optimization
-
Schulman, John, Levine, Sergey, Abbeel, Pieter, Jordan, Michael I., and Moritz, Philipp. Trust region policy optimization. In International Conference on Machine Learning (ICML), pp. 1889-1897, 2015.
-
(2015)
International Conference on Machine Learning (ICML)
, pp. 1889-1897
-
-
Schulman, J.1
Levine, S.2
Abbeel, P.3
Jordan, M.I.4
Moritz, P.5
-
29
-
-
85083954383
-
High-dimensional continuous control using generalized advantage estimation
-
Schulman, John, Moritz, Philipp, Levine, Sergey, Jordan, Michael, and Abbeel, Pieter. High-dimensional continuous control using generalized advantage estimation. International Conference on Learning Representations (ICLR), 2016.
-
(2016)
International Conference on Learning Representations (ICLR)
-
-
Schulman, J.1
Moritz, P.2
Levine, S.3
Jordan, M.4
Abbeel, P.5
-
30
-
-
84919793697
-
Deterministic policy gradient algorithms
-
Silver, David, Lever, Guy, Heess, Nicolas, Degris, Thomas, Wierstra, Daan, and Riedmiller, Martin. Deterministic policy gradient algorithms. In International Conference on Machine Learning (ICML), 2014.
-
(2014)
International Conference on Machine Learning (ICML)
-
-
Silver, D.1
Lever, G.2
Heess, N.3
Degris, T.4
Wierstra, D.5
Riedmiller, M.6
-
31
-
-
85132026293
-
Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
-
Sutton, Richard S. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In International Conference on Machine Learning (ICML), pp. 216-224, 1990.
-
(1990)
International Conference on Machine Learning (ICML)
, pp. 216-224
-
-
Sutton, R.S.1
-
32
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
Sutton, Richard S, McAllester, David A, Singh, Satinder P, Mansour, Yishay, et al. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (NIPS), volume 99, pp. 1057-1063, 1999.
-
(1999)
Advances in Neural Information Processing Systems (NIPS)
, vol.99
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.A.2
Singh, S.P.3
Mansour, Y.4
-
33
-
-
84872363924
-
Synthesis and stabilization of complex behaviors through online trajectory optimization
-
IEEE
-
Tassa, Yuval, Erez, Tom, and Todorov, Emanuel. Synthesis and stabilization of complex behaviors through online trajectory optimization. In International Conference on Intelligent Robots and Systems (IROS), pp. 4906-4913. IEEE, 2012.
-
(2012)
International Conference on Intelligent Robots and Systems (IROS)
, pp. 4906-4913
-
-
Tassa, Y.1
Erez, T.2
Todorov, E.3
-
34
-
-
84872292044
-
MuJoCo: A physics engine for model-based control
-
IEEE
-
Todorov, Emanuel, Erez, Tom, and Tassa, Yuval. MuJoCo: A physics engine for model-based control. In International Conference on Intelligent Robots and Systems (IROS), pp. 5026-5033. IEEE, 2012.
-
(2012)
International Conference on Intelligent Robots and Systems (IROS)
, pp. 5026-5033
-
-
Todorov, E.1
Erez, T.2
Tassa, Y.3
-
37
-
-
84965129327
-
Embed to control: A locally linear latent dynamics model for control from raw images
-
Watter, Manuel, Springenberg, Jost, Boedecker, Joschka, and Riedmiller, Martin. Embed to control: A locally linear latent dynamics model for control from raw images. In Advances in Neural Information Processing Systems (NIPS), pp. 2728-2736, 2015.
-
(2015)
Advances in Neural Information Processing Systems (NIPS)
, pp. 2728-2736
-
-
Watter, M.1
Springenberg, J.2
Boedecker, J.3
Riedmiller, M.4