-
1
-
-
14344251217
-
Apprenticeship learning via inverse reinforcement learning
-
New York, NY, USA: ACM
-
Abbeel, P., and Ng, A. Y. 2004. Apprenticeship learning via inverse reinforcement learning. In Proceedings of the Twenty-first International Conference on Machine Learning, 1. New York, NY, USA: ACM.
-
(2004)
Proceedings of the Twenty-First International Conference on Machine Learning
, pp. 1
-
-
Abbeel, P.1
Ng, A.Y.2
-
2
-
-
80053440459
-
Apprenticeship learning about multiple intentions
-
Babes, M.; Marivate, V.; Subramanian, K.; and Littman, M. L. 2011. Apprenticeship learning about multiple intentions. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 897-904.
-
(2011)
Proceedings of the 28th International Conference on Machine Learning (ICML-11)
, pp. 897-904
-
-
Babes, M.1
Marivate, V.2
Subramanian, K.3
Littman, M.L.4
-
3
-
-
85030457046
-
The option-critic architecture
-
Bacon, P.-L.; Harb, J.; and Precup, D. 2017. The option-critic architecture. In AAAI, 1726-1734.
-
(2017)
AAAI
, pp. 1726-1734
-
-
Bacon, P.-L.1
Harb, J.2
Precup, D.3
-
5
-
-
85015444377
-
-
Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; and Zaremba, W. 2016. OpenAI Gym.
-
(2016)
OpenAI Gym
-
-
Brockman, G.1
Cheung, V.2
Pettersson, L.3
Schneider, J.4
Schulman, J.5
Tang, J.6
Zaremba, W.7
-
6
-
-
84877772241
-
Nonparametric Bayesian inverse reinforcement learning for multiple reward functions
-
Pereira, F.; Burges, C. J. C.; Bottou, L.; and Weinberger, K. Q., eds, Curran Associates, Inc
-
Choi, J., and Kim, K.-E. 2012. Nonparametric Bayesian inverse reinforcement learning for multiple reward functions. In Pereira, F.; Burges, C. J. C.; Bottou, L.; and Weinberger, K. Q., eds., Advances in Neural Information Processing Systems 25. Curran Associates, Inc. 305-313.
-
(2012)
Advances in Neural Information Processing Systems
, vol.25
, pp. 305-313
-
-
Choi, J.1
Kim, K.-E.2
-
7
-
-
85030460628
-
-
arXiv preprint
-
Christiano, P.; Shah, Z.; Mordatch, I.; Schneider, J.; Blackwell, T.; Tobin, J.; Abbeel, P.; and Zaremba, W. 2016. Transfer from simulation to real world through learning deep inverse dynamics model. arXiv preprint arXiv:1610.03518.
-
(2016)
Transfer from Simulation to Real World Through Learning Deep Inverse Dynamics Model
-
-
Christiano, P.1
Shah, Z.2
Mordatch, I.3
Schneider, J.4
Blackwell, T.5
Tobin, J.6
Abbeel, P.7
Zaremba, W.8
-
10
-
-
0002278788
-
Hierarchical reinforcement learning with the MAXQ value function decomposition
-
Dietterich, T. G. 2000. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13:227-303.
-
(2000)
Journal of Artificial Intelligence Research
, vol.13
, pp. 227-303
-
-
Dietterich, T.G.1
-
11
-
-
84937849144
-
Generative adversarial nets
-
Goodfellow, I. J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; and Bengio, Y. 2014. Generative adversarial nets. In Advances in Neural Information Processing Systems, 2672-2680.
-
(2014)
Advances in Neural Information Processing Systems
, pp. 2672-2680
-
-
Goodfellow, I.J.1
Pouget-Abadie, J.2
Mirza, M.3
Xu, B.4
Warde-Farley, D.5
Ozair, S.6
Courville, A.7
Bengio, Y.8
-
12
-
-
85047005895
-
-
arXiv preprint
-
Hausman, K.; Chebotar, Y.; Schaal, S.; Sukhatme, G.; and Lim, J. 2017. Multi-modal imitation learning from unstructured demonstrations using generative adversarial nets. arXiv preprint arXiv:1705.10479.
-
(2017)
Multi-Modal Imitation Learning from Unstructured Demonstrations Using Generative Adversarial Nets
-
-
Hausman, K.1
Chebotar, Y.2
Schaal, S.3
Sukhatme, G.4
Lim, J.5
-
13
-
-
85059142263
-
Benchmark environments for multitask learning in continuous domains
-
Henderson, P.; Chang, W.-D.; Shkurti, F.; Hansen, J.; Meger, D.; and Dudek, G. 2017. Benchmark environments for multitask learning in continuous domains. ICML Lifelong Learning: A Reinforcement Learning Approach Workshop.
-
(2017)
ICML Lifelong Learning: A Reinforcement Learning Approach Workshop
-
-
Henderson, P.1
Chang, W.-D.2
Shkurti, F.3
Hansen, J.4
Meger, D.5
Dudek, G.6
-
15
-
-
0001940458
-
Adaptive mixtures of local experts
-
Jacobs, R. A.; Jordan, M. I.; Nowlan, S. J.; and Hinton, G. E. 1991. Adaptive mixtures of local experts. Neural Computation 3(1):79-87.
-
(1991)
Neural Computation
, vol.3
, Issue.1
, pp. 79-87
-
-
Jacobs, R.A.1
Jordan, M.I.2
Nowlan, S.J.3
Hinton, G.E.4
-
16
-
-
85044518938
-
-
arXiv preprint
-
Krishnan, S.; Garg, A.; Liaw, R.; Miller, L.; Pokorny, F. T.; and Goldberg, K. 2016. HIRL: Hierarchical inverse reinforcement learning for long-horizon tasks with delayed rewards. arXiv preprint arXiv:1604.06508.
-
(2016)
HIRL: Hierarchical Inverse Reinforcement Learning for Long-Horizon Tasks with Delayed Rewards
-
-
Krishnan, S.1
Garg, A.2
Liaw, R.3
Miller, L.4
Pokorny, F.T.5
Goldberg, K.6
-
19
-
-
85047019729
-
-
arXiv preprint
-
Merel, J.; Tassa, Y.; Srinivasan, S.; Lemmon, J.; Wang, Z.; Wayne, G.; and Heess, N. 2017. Learning human behaviors from motion capture by adversarial imitation. arXiv preprint arXiv:1707.02201.
-
(2017)
Learning Human Behaviors from Motion Capture by Adversarial Imitation
-
-
Merel, J.1
Tassa, Y.2
Srinivasan, S.3
Lemmon, J.4
Wang, Z.5
Wayne, G.6
Heess, N.7
-
20
-
-
84971448181
-
Asynchronous methods for deep reinforcement learning
-
Mnih, V.; Badia, A. P.; Mirza, M.; Graves, A.; Lillicrap, T.; Harley, T.; Silver, D.; and Kavukcuoglu, K. 2016. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, 1928-1937.
-
(2016)
International Conference on Machine Learning
, pp. 1928-1937
-
-
Mnih, V.1
Badia, A.P.2
Mirza, M.3
Graves, A.4
Lillicrap, T.5
Harley, T.6
Silver, D.7
Kavukcuoglu, K.8
-
21
-
-
0042547347
-
Algorithms for inverse reinforcement learning
-
San Francisco, CA, USA: Morgan Kaufmann Publishers Inc
-
Ng, A. Y., and Russell, S. J. 2000. Algorithms for inverse reinforcement learning. In Proceedings of the Seventeenth International Conference on Machine Learning, 663-670. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
-
(2000)
Proceedings of the Seventeenth International Conference on Machine Learning
, pp. 663-670
-
-
Ng, A.Y.1
Russell, S.J.2
-
22
-
-
84969963490
-
Trust region policy optimization
-
Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; and Moritz, P. 2015. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1889-1897.
-
(2015)
Proceedings of the 32nd International Conference on Machine Learning (ICML-15)
, pp. 1889-1897
-
-
Schulman, J.1
Levine, S.2
Abbeel, P.3
Jordan, M.4
Moritz, P.5
-
23
-
-
85083954383
-
High-dimensional continuous control using generalized advantage estimation
-
Schulman, J.; Moritz, P.; Levine, S.; Jordan, M.; and Abbeel, P. 2016. High-dimensional continuous control using generalized advantage estimation. In Proceedings of the International Conference on Learning Representations (ICLR).
-
(2016)
Proceedings of the International Conference on Learning Representations (ICLR)
-
-
Schulman, J.1
Moritz, P.2
Levine, S.3
Jordan, M.4
Abbeel, P.5
-
24
-
-
85041194636
-
-
arXiv preprint
-
Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; and Klimov, O. 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
-
(2017)
Proximal Policy Optimization Algorithms
-
-
Schulman, J.1
Wolski, F.2
Dhariwal, P.3
Radford, A.4
Klimov, O.5
-
26
-
-
85088226307
-
-
arXiv preprint
-
Shazeer, N.; Mirhoseini, A.; Maziarz, K.; Davis, A.; Le, Q.; Hinton, G.; and Dean, J. 2017. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538.
-
(2017)
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
-
-
Shazeer, N.1
Mirhoseini, A.2
Maziarz, K.3
Davis, A.4
Le, Q.5
Hinton, G.6
Dean, J.7
-
27
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
Sutton, R. S.; McAllester, D. A.; Singh, S. P.; and Mansour, Y. 2000. Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems, 1057-1063.
-
(2000)
Advances in Neural Information Processing Systems
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.A.2
Singh, S.P.3
Mansour, Y.4
-
28
-
-
0033170372
-
Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning
-
Sutton, R. S.; Precup, D.; and Singh, S. 1999. Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112(1-2):181-211.
-
(1999)
Artificial Intelligence
, vol.112
, Issue.1-2
, pp. 181-211
-
-
Sutton, R.S.1
Precup, D.2
Singh, S.3
-
29
-
-
84872292044
-
MuJoCo: A physics engine for model-based control
-
Vilamoura, Algarve, Portugal, October 7-12, 2012
-
Todorov, E.; Erez, T.; and Tassa, Y. 2012. MuJoCo: A physics engine for model-based control. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012, Vilamoura, Algarve, Portugal, October 7-12, 2012, 5026-5033.
-
(2012)
2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2012
, pp. 5026-5033
-
-
Todorov, E.1
Erez, T.2
Tassa, Y.3
-
30
-
-
85033441620
-
-
arXiv preprint
-
van Seijen, H.; Fatemi, M.; Romoff, J.; Laroche, R.; Barnes, T.; and Tsang, J. 2017. Hybrid reward architecture for reinforcement learning. arXiv preprint arXiv:1706.04208.
-
(2017)
Hybrid Reward Architecture for Reinforcement Learning
-
-
Van Seijen, H.1
Fatemi, M.2
Romoff, J.3
Laroche, R.4
Barnes, T.5
Tsang, J.6
-
31
-
-
85049556018
-
-
arXiv preprint
-
Wang, Z.; Merel, J.; Reed, S.; Wayne, G.; de Freitas, N.; and Heess, N. 2017. Robust imitation of diverse behaviors. arXiv preprint arXiv:1707.02747.
-
(2017)
Robust Imitation of Diverse Behaviors
-
-
Wang, Z.1
Merel, J.2
Reed, S.3
Wayne, G.4
De Freitas, N.5
Heess, N.6
-
32
-
-
57749097473
-
Maximum entropy inverse reinforcement learning
-
AAAI Press
-
Ziebart, B. D.; Maas, A.; Bagnell, J. A.; and Dey, A. K. 2008. Maximum entropy inverse reinforcement learning. In Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 3, 1433-1438. AAAI Press.
-
(2008)
Proceedings of the 23rd National Conference on Artificial Intelligence
, vol.3
, pp. 1433-1438
-
-
Ziebart, B.D.1
Maas, A.2
Bagnell, J.A.3
Dey, A.K.4