-
3
-
-
84903590417
-
A survey on policy search for robotics
-
Deisenroth, Marc Peter, Neumann, Gerhard, Peters, Jan, et al. A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1-2):1–142, 2013.
-
(2013)
Foundations and Trends in Robotics
, vol.2
, Issue.1-2
, pp. 1-142
-
-
Deisenroth, M.P.1
Neumann, G.2
Peters, J.3
-
4
-
-
77953260848
-
States versus rewards: Dissociable neural prediction error signals underlying model-based and model-free reinforcement learning
-
Gläscher, Jan, Daw, Nathaniel, Dayan, Peter, and O’Doherty, John P. States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning. Neuron, 66(4):585–595, 2010.
-
(2010)
Neuron
, vol.66
, Issue.4
, pp. 585-595
-
-
Gläscher, J.1
Daw, N.2
Dayan, P.3
O’Doherty, J.P.4
-
5
-
-
84862294866
-
Deep sparse rectifier networks
-
Glorot, Xavier, Bordes, Antoine, and Bengio, Yoshua. Deep sparse rectifier networks. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP, volume 15, pp. 315–323, 2011.
-
(2011)
Proceedings of the 14th International Conference on Artificial Intelligence and Statistics. JMLR W&CP
, vol.15
, pp. 315-323
-
-
Glorot, X.1
Bordes, A.2
Bengio, Y.3
-
6
-
-
79958779459
-
Reinforcement learning in feedback control
-
Hafner, Roland and Riedmiller, Martin. Reinforcement learning in feedback control. Machine Learning, 84(1-2):137–169, 2011.
-
(2011)
Machine Learning
, vol.84
, Issue.1-2
, pp. 137-169
-
-
Hafner, R.1
Riedmiller, M.2
-
8
-
-
84998919856
-
Memory-based control with recurrent neural networks
-
Heess, N., Hunt, J. J, Lillicrap, T. P, and Silver, D. Memory-based control with recurrent neural networks. NIPS Deep Reinforcement Learning Workshop (arXiv:1512.04455), 2015.
-
(2015)
NIPS Deep Reinforcement Learning Workshop
-
-
Heess, N.1
Hunt, J.J.2
Lillicrap, T.P.3
Silver, D.4
-
9
-
-
84965103751
-
Learning continuous control policies by stochastic value gradients
-
Heess, Nicolas, Wayne, Gregory, Silver, David, Lillicrap, Tim, Erez, Tom, and Tassa, Yuval. Learning continuous control policies by stochastic value gradients. In Advances in Neural Information Processing Systems, pp. 2926–2934, 2015.
-
(2015)
Advances in Neural Information Processing Systems
, pp. 2926-2934
-
-
Heess, N.1
Wayne, G.2
Silver, D.3
Lillicrap, T.4
Erez, T.5
Tassa, Y.6
-
12
-
-
84905695541
-
Evolving deep unsupervised convolutional networks for vision-based reinforcement learning
-
Koutník, Jan, Schmidhuber, Jürgen, and Gomez, Faustino. Evolving deep unsupervised convolutional networks for vision-based reinforcement learning. In Proceedings of the 2014 conference on Genetic and evolutionary computation, pp. 541–548. ACM, 2014a.
-
(2014)
Proceedings of the 2014 Conference on Genetic and Evolutionary Computation
, pp. 541-548
-
-
Koutník, J.1
Schmidhuber, J.2
Gomez, F.3
-
13
-
-
84959255008
-
Online evolution of deep convolutional network for vision-based reinforcement learning
-
Springer
-
Koutník, Jan, Schmidhuber, Jürgen, and Gomez, Faustino. Online evolution of deep convolutional network for vision-based reinforcement learning. In From Animals to Animats 13, pp. 260–269. Springer, 2014b.
-
(2014)
From Animals to Animats
, vol.13
, pp. 260-269
-
-
Koutník, J.1
Schmidhuber, J.2
Gomez, F.3
-
14
-
-
84876231242
-
ImageNet classification with deep convolutional neural networks
-
Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pp. 1097–1105, 2012.
-
(2012)
Advances in Neural Information Processing Systems
, pp. 1097-1105
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
15
-
-
84943767635
-
-
arXiv preprint
-
Levine, Sergey, Finn, Chelsea, Darrell, Trevor, and Abbeel, Pieter. End-to-end training of deep visuomotor policies. arXiv preprint arXiv:1504.00702, 2015.
-
(2015)
End-to-End Training of Deep Visuomotor Policies
-
-
Levine, S.1
Finn, C.2
Darrell, T.3
Abbeel, P.4
-
16
-
-
84904867557
-
-
arXiv preprint
-
Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Graves, Alex, Antonoglou, Ioannis, Wierstra, Daan, and Riedmiller, Martin. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.
-
(2013)
Playing Atari with Deep Reinforcement Learning
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Graves, A.4
Antonoglou, I.5
Wierstra, D.6
Riedmiller, M.7
-
17
-
-
84924051598
-
Human-level control through deep reinforcement learning
-
Mnih, Volodymyr, Kavukcuoglu, Koray, Silver, David, Rusu, Andrei A, Veness, Joel, Bellemare, Marc G, Graves, Alex, Riedmiller, Martin, Fidjeland, Andreas K, Ostrovski, Georg, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
-
(2015)
Nature
, vol.518
, Issue.7540
, pp. 529-533
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Rusu, A.A.4
Veness, J.5
Bellemare, M.G.6
Graves, A.7
Riedmiller, M.8
Fidjeland, A.K.9
Ostrovski, G.10
-
18
-
-
0031236002
-
Adaptive critic designs
-
Prokhorov, Danil V, Wunsch, Donald C, et al. Adaptive critic designs. Neural Networks, IEEE Transactions on, 8(5):997–1007, 1997.
-
(1997)
Neural Networks, IEEE Transactions on
, vol.8
, Issue.5
, pp. 997-1007
-
-
Prokhorov, D.V.1
Wunsch, D.C.2
-
19
-
-
84965157716
-
Gradient estimation using stochastic computation graphs
-
Schulman, John, Heess, Nicolas, Weber, Theophane, and Abbeel, Pieter. Gradient estimation using stochastic computation graphs. In Advances in Neural Information Processing Systems, pp. 3510–3522, 2015a.
-
(2015)
Advances in Neural Information Processing Systems
, pp. 3510-3522
-
-
Schulman, J.1
Heess, N.2
Weber, T.3
Abbeel, P.4
-
20
-
-
84965149509
-
-
arXiv preprint
-
Schulman, John, Levine, Sergey, Moritz, Philipp, Jordan, Michael I, and Abbeel, Pieter. Trust region policy optimization. arXiv preprint arXiv:1502.05477, 2015b.
-
(2015)
Trust Region Policy Optimization
-
-
Schulman, J.1
Levine, S.2
Moritz, P.3
Jordan, M.I.4
Abbeel, P.5
-
21
-
-
84919793697
-
Deterministic policy gradient algorithms
-
Silver, David, Lever, Guy, Heess, Nicolas, Degris, Thomas, Wierstra, Daan, and Riedmiller, Martin. Deterministic policy gradient algorithms. In ICML, 2014.
-
(2014)
ICML
-
-
Silver, D.1
Lever, G.2
Heess, N.3
Degris, T.4
Wierstra, D.5
Riedmiller, M.6
-
22
-
-
84872363924
-
Synthesis and stabilization of complex behaviors through online trajectory optimization
-
Tassa, Yuval, Erez, Tom, and Todorov, Emanuel. Synthesis and stabilization of complex behaviors through online trajectory optimization. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 4906–4913. IEEE, 2012.
-
(2012)
Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on
, pp. 4906-4913
-
-
Tassa, Y.1
Erez, T.2
Todorov, E.3
-
23
-
-
23944452693
-
A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems
-
Todorov, Emanuel and Li, Weiwei. A generalized iterative LQG method for locally-optimal feedback control of constrained nonlinear stochastic systems. In American Control Conference, 2005. Proceedings of the 2005, pp. 300–306. IEEE, 2005.
-
(2005)
American Control Conference, 2005. Proceedings of the 2005
, pp. 300-306
-
-
Todorov, E.1
Li, W.2
-
24
-
-
84872292044
-
MuJoCo: A physics engine for model-based control
-
Todorov, Emanuel, Erez, Tom, and Tassa, Yuval. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
-
(2012)
Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on
, pp. 5026-5033
-
-
Todorov, E.1
Erez, T.2
Tassa, Y.3
-
25
-
-
36149005118
-
On the theory of the Brownian motion
-
Uhlenbeck, George E and Ornstein, Leonard S. On the theory of the Brownian motion. Physical Review, 36(5):823, 1930.
-
(1930)
Physical Review
, vol.36
, Issue.5
, pp. 823
-
-
Uhlenbeck, G.E.1
Ornstein, L.S.2
-
28
-
-
71749106087
-
Real-time reinforcement learning by sequential actor–critics and experience replay
-
Wawrzyński, Paweł. Real-time reinforcement learning by sequential actor–critics and experience replay. Neural Networks, 22(10):1484–1497, 2009.
-
(2009)
Neural Networks
, vol.22
, Issue.10
, pp. 1484-1497
-
-
Wawrzyński, P.1
-
29
-
-
85029148817
-
Control policy with autocorrelated noise in reinforcement learning for robotics
-
Wawrzyński, Paweł. Control policy with autocorrelated noise in reinforcement learning for robotics. International Journal of Machine Learning and Computing, 5:91–95, 2015.
-
(2015)
International Journal of Machine Learning and Computing
, vol.5
, pp. 91-95
-
-
Wawrzyński, P.1
-
30
-
-
84875884428
-
Autonomous reinforcement learning with experience replay
-
Wawrzyński, Paweł and Tanwani, Ajay Kumar. Autonomous reinforcement learning with experience replay. Neural Networks, 41:156–167, 2013.
-
(2013)
Neural Networks
, vol.41
, pp. 156-167
-
-
Wawrzyński, P.1
Tanwani, A.K.2