-
1
-
-
85054761730
-
-
arXiv preprint
-
Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. arXiv preprint arXiv:1707.01495, 2017.
-
(2017)
Hindsight Experience Replay
-
-
Andrychowicz, M.1
Wolski, F.2
Ray, A.3
Schneider, J.4
Fong, R.5
Welinder, P.6
McGrew, B.7
Tobin, J.8
Abbeel, P.9
Zaremba, W.10
-
2
-
-
40949147745
-
A comprehensive survey of multiagent reinforcement learning
-
2008
-
Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, 38(2), 2008.
-
(2008)
IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews
, vol.38
, Issue.2
-
-
Busoniu, L.1
Babuska, R.2
De Schutter, B.3
-
3
-
-
1942470793
-
Multitask learning
-
Springer
-
Rich Caruana. Multitask learning. In Learning to Learn, pp. 95–133. Springer, 1998.
-
(1998)
Learning to Learn
, pp. 95-133
-
-
Caruana, R.1
-
4
-
-
85019241632
-
Benchmarking deep reinforcement learning for continuous control
-
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pp. 1329–1338, 2016.
-
(2016)
International Conference on Machine Learning
, pp. 1329-1338
-
-
Duan, Y.1
Chen, X.2
Houthooft, R.3
Schulman, J.4
Abbeel, P.5
-
5
-
-
85054801920
-
-
arXiv preprint
-
Jakob Foerster, Richard Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. arXiv preprint arXiv:1709.04326, 2017a.
-
(2017)
Learning with Opponent-Learning Awareness
-
-
Foerster, J.1
Chen, R.2
Al-Shedivat, M.3
Whiteson, S.4
Abbeel, P.5
Mordatch, I.6
-
6
-
-
85046125163
-
-
arXiv preprint
-
Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. arXiv preprint arXiv:1705.08926, 2017b.
-
(2017)
Counterfactual Multi-Agent Policy Gradients
-
-
Foerster, J.1
Farquhar, G.2
Afouras, T.3
Nardelli, N.4
Whiteson, S.5
-
7
-
-
84937849144
-
Generative adversarial nets
-
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680, 2014.
-
(2014)
Advances in Neural Information Processing Systems
, pp. 2672-2680
-
-
Goodfellow, I.1
Pouget-Abadie, J.2
Mirza, M.3
Xu, B.4
Warde-Farley, D.5
Ozair, S.6
Courville, A.7
Bengio, Y.8
-
8
-
-
85040911431
-
Opponent modeling in deep reinforcement learning
-
He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daumé III. Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pp. 1804–1813, 2016.
-
(2016)
International Conference on Machine Learning
, pp. 1804-1813
-
-
He, H.1
Boyd-Graber, J.2
Kwok, K.3
Daumé, H.4
-
9
-
-
85044446086
-
-
arXiv preprint
-
Nicolas Heess, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, Ali Eslami, Martin Riedmiller, et al. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.
-
(2017)
Emergence of Locomotion Behaviours in Rich Environments
-
-
Heess, N.1
Sriram, S.2
Lemmon, J.3
Merel, J.4
Wayne, G.5
Tassa, Y.6
Erez, T.7
Wang, Z.8
Eslami, A.9
Riedmiller, M.10
-
12
-
-
84965135289
-
-
arXiv preprint
-
Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
-
(2015)
Continuous Control with Deep Reinforcement Learning
-
-
Lillicrap, T.P.1
Hunt, J.J.2
Pritzel, A.3
Heess, N.4
Erez, T.5
Tassa, Y.6
Silver, D.7
Wierstra, D.8
-
14
-
-
85018878907
-
Stein variational gradient descent: A general purpose Bayesian inference algorithm
-
Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, pp. 2378–2386, 2016.
-
(2016)
Advances in Neural Information Processing Systems
, pp. 2378-2386
-
-
Liu, Q.1
Wang, D.2
-
15
-
-
85041351193
-
-
arXiv preprint
-
Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275, 2017.
-
(2017)
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
-
-
Lowe, R.1
Wu, Y.2
Tamar, A.3
Harb, J.4
Abbeel, P.5
Mordatch, I.6
-
16
-
-
84857861863
-
Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems
-
Laetitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review, 27(1):1–31, 2012.
-
(2012)
The Knowledge Engineering Review
, vol.27
, Issue.1
, pp. 1-31
-
-
Matignon, L.1
Laurent, G.J.2
Le Fort-Piat, N.3
-
17
-
-
84924051598
-
Human-level control through deep reinforcement learning
-
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
-
(2015)
Nature
, vol.518
, Issue.7540
, pp. 529-533
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Rusu, A.A.4
Veness, J.5
Bellemare, M.G.6
Graves, A.7
Riedmiller, M.8
Fidjeland, A.K.9
Ostrovski, G.10
-
18
-
-
84971448181
-
Asynchronous methods for deep reinforcement learning
-
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
-
(2016)
International Conference on Machine Learning
, pp. 1928-1937
-
-
Mnih, V.1
Badia, A.P.2
Mirza, M.3
Graves, A.4
Lillicrap, T.5
Harley, T.6
Silver, D.7
Kavukcuoglu, K.8
-
19
-
-
85062216559
-
-
OpenAI. OpenAI Dota 2 1v1 bot, 2017. URL https://openai.com/the-international/.
-
(2017)
OpenAI Dota 2 1v1 Bot
-
-
-
20
-
-
26444601262
-
Cooperative multi-agent learning: The state of the art
-
Liviu Panait and Sean Luke. Cooperative multi-agent learning: The state of the art. Autonomous Agents and Multi-Agent Systems, 11(3):387–434, 2005.
-
(2005)
Autonomous Agents and Multi-Agent Systems
, vol.11
, Issue.3
, pp. 387-434
-
-
Panait, L.1
Luke, S.2
-
21
-
-
85028023087
-
Supervision via competition: Robot adversaries for learning tasks
-
Lerrel Pinto, James Davidson, and Abhinav Gupta. Supervision via competition: Robot adversaries for learning tasks. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pp. 1601–1608. IEEE, 2017.
-
(2017)
Robotics and Automation (ICRA), 2017 IEEE International Conference on
, pp. 1601-1608
-
-
Pinto, L.1
Davidson, J.2
Gupta, A.3
-
22
-
-
84969963490
-
Trust region policy optimization
-
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 1889–1897, 2015a.
-
(2015)
Proceedings of the 32nd International Conference on Machine Learning (ICML-15)
, pp. 1889-1897
-
-
Schulman, J.1
Levine, S.2
Abbeel, P.3
Jordan, M.4
Moritz, P.5
-
23
-
-
84993963574
-
-
arXiv preprint
-
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
-
(2015)
High-Dimensional Continuous Control Using Generalized Advantage Estimation
-
-
Schulman, J.1
Moritz, P.2
Levine, S.3
Jordan, M.4
Abbeel, P.5
-
24
-
-
85041194636
-
-
arXiv preprint
-
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
-
(2017)
Proximal Policy Optimization Algorithms
-
-
Schulman, J.1
Wolski, F.2
Dhariwal, P.3
Radford, A.4
Klimov, O.5
-
25
-
-
84963949906
-
Mastering the game of Go with deep neural networks and tree search
-
David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
-
(2016)
Nature
, vol.529
, Issue.7587
, pp. 484-489
-
-
Silver, D.1
Huang, A.2
Maddison, C.J.3
Guez, A.4
Sifre, L.5
Van Den Driessche, G.6
Schrittwieser, J.7
Antonoglou, I.8
Panneershelvam, V.9
Lanctot, M.10
-
29
-
-
85017018413
-
Multiagent cooperation and competition with deep reinforcement learning
-
Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE, 12(4):e0172395, 2017.
-
(2017)
PLoS ONE
, vol.12
, Issue.4
-
-
Tampuu, A.1
Matiisen, T.2
Kodelja, D.3
Kuzovkin, I.4
Korjus, K.5
Aru, J.6
Aru, J.7
Vicente, R.8
-
31
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.
-
(1995)
Communications of the ACM
, vol.38
, Issue.3
, pp. 58-68
-
-
Tesauro, G.1
-
32
-
-
84872292044
-
MuJoCo: A physics engine for model-based control
-
Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
-
(2012)
Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on
, pp. 5026-5033
-
-
Todorov, E.1
Erez, T.2
Tassa, Y.3
-
33
-
-
77956285052
-
Character animation in two-player adversarial games
-
Kevin Wampler, Erik Andersen, Evan Herbst, Yongjoon Lee, and Zoran Popović. Character animation in two-player adversarial games. ACM Transactions on Graphics (TOG), 29(3):26, 2010.
-
(2010)
ACM Transactions on Graphics (TOG)
, vol.29
, Issue.3
, pp. 26
-
-
Wampler, K.1
Andersen, E.2
Herbst, E.3
Lee, Y.4
Popović, Z.5
-
34
-
-
0000337576
-
Simple statistical gradient-following algorithms for connectionist reinforcement learning
-
Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256, 1992.
-
(1992)
Machine Learning
, vol.8
, Issue.3-4
, pp. 229-256
-
-
Williams, R.J.1