[1] Pieter Abbeel, Morgan Quigley, and Andrew Y. Ng. Using inaccurate models in reinforcement learning. In ICML, 2006.
[2] Brenna D. Argall, Sonia Chernova, Manuela Veloso, and Brett Browning. A survey of robot learning from demonstration. Robotics and Autonomous Systems, 57(5):469-483, 2009.
[3] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym, 2016.
[5] Marc Peter Deisenroth, Gerhard Neumann, and Jan Peters. A survey on policy search for robotics. Foundations and Trends in Robotics, 2(1-2):1-142, 2013.
[6] Erick Delage and Shie Mannor. Percentile optimization for Markov decision processes with parameter uncertainty. Operations Research, 58(1):203-213, 2010.
[7] Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In ICML, 2016.
[8] Michael O. Duff. Design for an optimal probe. In ICML, 2003.
[11] Mohammad Ghavamzadeh, Shie Mannor, Joelle Pineau, and Aviv Tamar. Bayesian reinforcement learning: A survey. Foundations and Trends in Machine Learning, 8(5-6):359-483, 2015.
[12] Sham Kakade. A natural policy gradient. In NIPS, 2001.
[14] Sham Kakade and John Langford. Approximately optimal approximate reinforcement learning. In ICML, 2002.
[15] Sergey Levine and Vladlen Koltun. Guided policy search. In ICML, 2013.
[16] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra. Continuous control with deep reinforcement learning. ArXiv e-prints, September 2015.
[17] Shiau Hong Lim, Huan Xu, and Shie Mannor. Reinforcement learning in robust Markov decision processes. In NIPS, 2013.
[18] Lennart Ljung. System Identification, pp. 163-173. Birkhäuser Boston, Boston, MA, 1998.
[19] Volodymyr Mnih et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, Feb 2015.
[20] I. Mordatch, K. Lowrey, and E. Todorov. Ensemble-CIO: Full-body dynamic motion planning that transfers to physical humanoids. In IROS, 2015a.
[21] Igor Mordatch, Kendall Lowrey, Galen Andrew, Zoran Popovic, and Emanuel V. Todorov. Interactive control of diverse complex characters with neural networks. In NIPS, 2015b.
[22] Arnab Nilim and Laurent El Ghaoui. Robust control of Markov decision processes with uncertain transition matrices. Operations Research, 53(5):780-798, 2005.
[24] Josep M. Porta, Nikos A. Vlassis, Matthijs T. J. Spaan, and Pascal Poupart. Point-based value iteration for continuous POMDPs. Journal of Machine Learning Research, 7:2329-2367, 2006.
[25] Pascal Poupart, Nikos A. Vlassis, Jesse Hoey, and Kevin Regan. An analytic solution to discrete Bayesian reinforcement learning. In ICML, 2006.
[26] S. Ross, B. Chaib-draa, and J. Pineau. Bayesian reinforcement learning in continuous POMDPs with application to robot navigation. In ICRA, 2008.
[27] Stephane Ross and Drew Bagnell. Agnostic system identification for model-based reinforcement learning. In ICML, 2012.
[28] John Schulman, Sergey Levine, Philipp Moritz, Michael Jordan, and Pieter Abbeel. Trust region policy optimization. In ICML, 2015.
[29] David Silver et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, Jan 2016.
[32] Matthew E. Taylor and Peter Stone. Transfer learning for reinforcement learning domains: A survey. Journal of Machine Learning Research, 10:1633-1685, December 2009.
[35] Nikos Vlassis, Mohammad Ghavamzadeh, Shie Mannor, and Pascal Poupart. Bayesian Reinforcement Learning, pp. 359-386. Springer Berlin Heidelberg, Berlin, Heidelberg, 2012.
[36] Jack M. Wang, David J. Fleet, and Aaron Hertzmann. Optimizing walking controllers for uncertain inputs and environments. ACM Trans. Graph., 2010.
[37] Pawel Wawrzynski. Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks, 22:1484-1497, 2009.
[38] Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229-256, 1992.
[39] Kemin Zhou, John C. Doyle, and Keith Glover. Robust and Optimal Control. Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1996. ISBN 0-13-456567-3.