-
2
-
-
85019172761
-
Learning to learn by gradient descent by gradient descent
-
Marcin Andrychowicz, Misha Denil, Sergio Gomez, Matthew W Hoffman, David Pfau, Tom Schaul, and Nando de Freitas. Learning to learn by gradient descent by gradient descent. In Advances in Neural Information Processing Systems, pp. 3981–3989, 2016.
-
(2016)
Advances in Neural Information Processing Systems
, pp. 3981-3989
-
-
Andrychowicz, M.1
Denil, M.2
Gomez, S.3
Hoffman, M.W.4
Pfau, D.5
Schaul, T.6
De Freitas, N.7
-
3
-
-
85083954226
-
Emergent complexity via multi-agent competition
-
Trapit Bansal, Jakub Pachocki, Szymon Sidor, Ilya Sutskever, and Igor Mordatch. Emergent complexity via multi-agent competition. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=Sy0GnUxCb.
-
(2018)
International Conference on Learning Representations
-
-
Bansal, T.1
Pachocki, J.2
Sidor, S.3
Sutskever, I.4
Mordatch, I.5
-
4
-
-
85047008902
-
On the optimization of a synaptic learning rule
-
Univ. of Texas
-
Samy Bengio, Yoshua Bengio, Jocelyn Cloutier, and Jan Gecsei. On the optimization of a synaptic learning rule. In Preprints Conf. Optimality in Artificial and Biological Neural Networks, pp. 6–8. Univ. of Texas, 1992.
-
(1992)
Preprints Conf. Optimality in Artificial and Biological Neural Networks
, pp. 6-8
-
-
Bengio, S.1
Bengio, Y.2
Cloutier, J.3
Gecsei, J.4
-
5
-
-
84921824478
-
-
Université de Montréal, Département d’informatique et de recherche opérationnelle
-
Yoshua Bengio, Samy Bengio, and Jocelyn Cloutier. Learning a synaptic learning rule. Université de Montréal, Département d’informatique et de recherche opérationnelle, 1990.
-
(1990)
Learning A Synaptic Learning Rule
-
-
Bengio, Y.1
Bengio, S.2
Cloutier, J.3
-
7
-
-
84871781883
-
An overview of recent progress in the study of distributed multi-agent coordination
-
Yongcan Cao, Wenwu Yu, Wei Ren, and Guanrong Chen. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Transactions on Industrial Informatics, 9(1):427–438, 2013.
-
(2013)
IEEE Transactions on Industrial Informatics
, vol.9
, Issue.1
, pp. 427-438
-
-
Cao, Y.1
Yu, W.2
Ren, W.3
Chen, G.4
-
8
-
-
1942470793
-
Multitask learning
-
Springer
-
Rich Caruana. Multitask learning. In Learning to learn, pp. 95–133. Springer, 1998.
-
(1998)
Learning to Learn
, pp. 95-133
-
-
Caruana, R.1
-
9
-
-
34147159616
-
AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents
-
Vincent Conitzer and Tuomas Sandholm. AWESOME: A general multiagent learning algorithm that converges in self-play and learns a best response against stationary opponents. Machine Learning, 67(1-2):23–43, 2007.
-
(2007)
Machine Learning
, vol.67
, Issue.1-2
, pp. 23-43
-
-
Conitzer, V.1
Sandholm, T.2
-
10
-
-
84930637712
-
Robots that can adapt like animals
-
Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret. Robots that can adapt like animals. Nature, 521(7553):503–507, 2015.
-
(2015)
Nature
, vol.521
, Issue.7553
, pp. 503-507
-
-
Cully, A.1
Clune, J.2
Tarapore, D.3
Mouret, J.-B.4
-
11
-
-
34250692605
-
Dealing with nonstationary environments using context detection
-
Bruno C Da Silva, Eduardo W Basso, Ana LC Bazzan, and Paulo M Engel. Dealing with nonstationary environments using context detection. In Proceedings of the 23rd international conference on Machine learning, pp. 217–224. ACM, 2006.
-
(2006)
Proceedings of the 23rd International Conference on Machine Learning
, pp. 217-224
-
-
Da Silva, B.C.1
Basso, E.W.2
Bazzan, A.L.C.3
Engel, P.M.4
-
16
-
-
85046125163
-
-
arXiv preprint
-
Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. arXiv preprint arXiv:1705.08926, 2017a.
-
(2017)
Counterfactual Multi-Agent Policy Gradients
-
-
Foerster, J.1
Farquhar, G.2
Afouras, T.3
Nardelli, N.4
Whiteson, S.5
-
17
-
-
85054801920
-
-
arXiv preprint
-
Jakob N Foerster, Richard Y Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. arXiv preprint arXiv:1709.04326, 2017b.
-
(2017)
Learning with Opponent-Learning Awareness
-
-
Foerster, J.N.1
Chen, R.Y.2
Al-Shedivat, M.3
Whiteson, S.4
Abbeel, P.5
Mordatch, I.6
-
18
-
-
85083953531
-
Recasting gradient-based meta-learning as hierarchical bayes
-
Erin Grant, Chelsea Finn, Sergey Levine, Trevor Darrell, and Thomas Griffiths. Recasting gradient-based meta-learning as hierarchical bayes. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=BJ_UL-k0b.
-
(2018)
International Conference on Learning Representations
-
-
Grant, E.1
Finn, C.2
Levine, S.3
Darrell, T.4
Griffiths, T.5
-
22
-
-
84979924150
-
End-to-end training of deep visuomotor policies
-
Sergey Levine, Chelsea Finn, Trevor Darrell, and Pieter Abbeel. End-to-end training of deep visuomotor policies. Journal of Machine Learning Research, 17(39):1–40, 2016.
-
(2016)
Journal of Machine Learning Research
, vol.17
, Issue.39
, pp. 1-40
-
-
Levine, S.1
Finn, C.2
Darrell, T.3
Abbeel, P.4
-
23
-
-
84997050331
-
-
arXiv preprint
-
Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. Deep reinforcement learning for dialogue generation. arXiv preprint arXiv:1606.01541, 2016.
-
(2016)
Deep Reinforcement Learning for Dialogue Generation
-
-
Li, J.1
Monroe, W.2
Ritter, A.3
Galley, M.4
Gao, J.5
Jurafsky, D.6
-
25
-
-
85041351193
-
-
arXiv preprint
-
Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275, 2017.
-
(2017)
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
-
-
Lowe, R.1
Wu, Y.2
Tamar, A.3
Harb, J.4
Abbeel, P.5
Mordatch, I.6
-
26
-
-
77957064197
-
Catastrophic interference in connectionist networks: The sequential learning problem
-
Michael McCloskey and Neal J Cohen. Catastrophic interference in connectionist networks: The sequential learning problem. Psychology of learning and motivation, 24:109–165, 1989.
-
(1989)
Psychology of Learning and Motivation
, vol.24
, pp. 109-165
-
-
McCloskey, M.1
Cohen, N.J.2
-
28
-
-
84954123466
-
Never ending learning
-
Tom M Mitchell, William W Cohen, Estevam R Hruschka Jr, Partha Pratim Talukdar, Justin Betteridge, Andrew Carlson, Bhavana Dalvi Mishra, Matthew Gardner, Bryan Kisiel, Jayant Krishnamurthy, et al. Never ending learning. In AAAI, pp. 2302–2310, 2015.
-
(2015)
AAAI
, pp. 2302-2310
-
-
Mitchell, T.M.1
Cohen, W.W.2
Hruschka, E.R.3
Talukdar, P.P.4
Betteridge, J.5
Carlson, A.6
Mishra, B.D.7
Gardner, M.8
Kisiel, B.9
Krishnamurthy, J.10
-
29
-
-
84924051598
-
Human-level control through deep reinforcement learning
-
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
-
(2015)
Nature
, vol.518
, Issue.7540
, pp. 529-533
-
-
Mnih, V.1
Kavukcuoglu, K.2
Silver, D.3
Rusu, A.A.4
Veness, J.5
Bellemare, M.G.6
Graves, A.7
Riedmiller, M.8
Fidjeland, A.K.9
Ostrovski, G.10
-
30
-
-
85029221663
-
-
arXiv preprint
-
Peng Peng, Quan Yuan, Ying Wen, Yaodong Yang, Zhenkun Tang, Haitao Long, and Jun Wang. Multiagent bidirectionally-coordinated nets for learning to play starcraft combat games. arXiv preprint arXiv:1703.10069, 2017.
-
(2017)
Multiagent Bidirectionally-Coordinated Nets for Learning to Play Starcraft Combat Games
-
-
Peng, P.1
Yuan, Q.2
Wen, Y.3
Yang, Y.4
Tang, Z.5
Long, H.6
Wang, J.7
-
33
-
-
0031189347
-
CHILD: A first step towards continual learning
-
Mark B Ring. CHILD: A first step towards continual learning. Machine Learning, 28(1):77–104, 1997.
-
(1997)
Machine Learning
, vol.28
, Issue.1
, pp. 77-104
-
-
Ring, M.B.1
-
34
-
-
85040308896
-
Meta-learning with memory-augmented neural networks
-
Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, and Timothy Lillicrap. Meta-learning with memory-augmented neural networks. In International conference on machine learning, pp. 1842–1850, 2016.
-
(2016)
International Conference on Machine Learning
, pp. 1842-1850
-
-
Santoro, A.1
Bartunov, S.2
Botvinick, M.3
Wierstra, D.4
Lillicrap, T.5
-
35
-
-
0008006333
-
Evolutionary principles in self-referential learning
-
Diploma thesis, Institut f. Informatik, Tech. Univ. Munich
-
Jürgen Schmidhuber. Evolutionary principles in self-referential learning. (On learning how to learn: The meta-meta-... hook.) Diploma thesis, Institut f. Informatik, Tech. Univ. Munich, 1987.
-
(1987)
On Learning How to Learn: The Meta-Meta-... Hook
-
-
Schmidhuber, J.1
-
36
-
-
0346377064
-
Learning to control fast-weight memories: An alternative to dynamic recurrent networks
-
Jürgen Schmidhuber. Learning to control fast-weight memories: An alternative to dynamic recurrent networks. Learning, 4(1), 1992.
-
(1992)
Learning
, vol.4
, Issue.1
-
-
Schmidhuber, J.1
-
37
-
-
84969963490
-
Trust region policy optimization
-
John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 1889–1897, 2015a.
-
(2015)
Proceedings of the 32nd International Conference on Machine Learning (ICML-15)
, pp. 1889-1897
-
-
Schulman, J.1
Levine, S.2
Abbeel, P.3
Jordan, M.4
Moritz, P.5
-
38
-
-
84993963574
-
-
arXiv preprint
-
John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
-
(2015)
High-Dimensional Continuous Control Using Generalized Advantage Estimation
-
-
Schulman, J.1
Moritz, P.2
Levine, S.3
Jordan, M.4
Abbeel, P.5
-
39
-
-
85041194636
-
-
arXiv preprint
-
John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
-
(2017)
Proximal Policy Optimization Algorithms
-
-
Schulman, J.1
Wolski, F.2
Dhariwal, P.3
Radford, A.4
Klimov, O.5
-
40
-
-
84883265722
-
Lifelong machine learning systems: Beyond learning algorithms
-
Daniel L Silver, Qiang Yang, and Lianghao Li. Lifelong machine learning systems: Beyond learning algorithms. In AAAI Spring Symposium: Lifelong Machine Learning, volume 13, pp. 05, 2013.
-
(2013)
AAAI Spring Symposium: Lifelong Machine Learning
, vol.13
, pp. 05
-
-
Silver, D.L.1
Yang, Q.2
Li, L.3
-
41
-
-
84963949906
-
Mastering the game of Go with deep neural networks and tree search
-
David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
-
(2016)
Nature
, vol.529
, Issue.7587
, pp. 484-489
-
-
Silver, D.1
Huang, A.2
Maddison, C.J.3
Guez, A.4
Sifre, L.5
Van Den Driessche, G.6
Schrittwieser, J.7
Antonoglou, I.8
Panneershelvam, V.9
Lanctot, M.10
-
44
-
-
84898939480
-
Policy gradient methods for reinforcement learning with function approximation
-
Richard S Sutton, David A McAllester, Satinder P Singh, and Yishay Mansour. Policy gradient methods for reinforcement learning with function approximation. In Advances in neural information processing systems, pp. 1057–1063, 2000.
-
(2000)
Advances in Neural Information Processing Systems
, pp. 1057-1063
-
-
Sutton, R.S.1
McAllester, D.A.2
Singh, S.P.3
Mansour, Y.4
-
46
-
-
0010687621
-
Lifelong learning algorithms
-
Sebastian Thrun. Lifelong learning algorithms. Learning to learn, 8:181–209, 1998.
-
(1998)
Learning to Learn
, vol.8
, pp. 181-209
-
-
Thrun, S.1
-
48
-
-
84872292044
-
MuJoCo: A physics engine for model-based control
-
Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
-
(2012)
Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on
, pp. 5026-5033
-
-
Todorov, E.1
Erez, T.2
Tassa, Y.3
-
49
-
-
85018863845
-
Matching networks for one shot learning
-
Oriol Vinyals, Charles Blundell, Tim Lillicrap, Daan Wierstra, et al. Matching networks for one shot learning. In Advances in Neural Information Processing Systems, pp. 3630–3638, 2016.
-
(2016)
Advances in Neural Information Processing Systems
, pp. 3630-3638
-
-
Vinyals, O.1
Blundell, C.2
Lillicrap, T.3
Wierstra, D.4
-
50
-
-
85028474927
-
-
arXiv preprint
-
Jane X Wang, Zeb Kurth-Nelson, Dhruva Tirumala, Hubert Soyer, Joel Z Leibo, Remi Munos, Charles Blundell, Dharshan Kumaran, and Matt Botvinick. Learning to reinforcement learn. arXiv preprint arXiv:1611.05763, 2016.
-
(2016)
Learning to Reinforcement Learn
-
-
Wang, J.X.1
Kurth-Nelson, Z.2
Tirumala, D.3
Soyer, H.4
Leibo, J.Z.5
Munos, R.6
Blundell, C.7
Kumaran, D.8
Botvinick, M.9
-
51
-
-
0000337576
-
Simple statistical gradient-following algorithms for connectionist reinforcement learning
-
Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
-
(1992)
Machine Learning
, vol.8
, Issue.3-4
, pp. 229-256
-
-
Williams, R.J.1
-
52
-
-
85099723578
-
Multi-agent learning with policy prediction
-
Chongjie Zhang and Victor R Lesser. Multi-agent learning with policy prediction. In AAAI, 2010.
-
(2010)
AAAI
-
-
Zhang, C.1
Lesser, V.R.2