SCOPUS Information Retrieval Platform

6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings

Volume , Issue , 2018, Pages

Emergent complexity via multi-agent competition

(5) Bansal, Trapit (a); Pachocki, Jakub (b); Sidor, Szymon (b); Sutskever, Ilya (b); Mordatch, Igor (b)

a UNIVERSITY OF MASSACHUSETTS (United States)

b OpenAI LLC (United States)

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; REINFORCEMENT LEARNING;

COMPLEX ENVIRONMENTS; EMERGENT COMPLEXITY; LEVEL OF DIFFICULTIES; MULTI AGENT; MULTI-AGENT ENVIRONMENT; SELF-PLAY; SKILL LEVELS;

MULTI AGENT SYSTEMS;

EID: 85083954226
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited : (183)

References (34)

1
- 85054761730
- arXiv preprint
- Marcin Andrychowicz, Filip Wolski, Alex Ray, Jonas Schneider, Rachel Fong, Peter Welinder, Bob McGrew, Josh Tobin, Pieter Abbeel, and Wojciech Zaremba. Hindsight experience replay. arXiv preprint arXiv:1707.01495, 2017.
- (2017) Hindsight Experience Replay
- Andrychowicz, M.¹ Wolski, F.² Ray, A.³ Schneider, J.⁴ Fong, R.⁵ Welinder, P.⁶ McGrew, B.⁷ Tobin, J.⁸ Abbeel, P.⁹ Zaremba, W.¹⁰

2
- 40949147745
- A comprehensive survey of multiagent reinforcement learning
- 2008
- Lucian Busoniu, Robert Babuska, and Bart De Schutter. A comprehensive survey of multiagent reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, 38(2), 2008.
- (2008) IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews , vol.38 , Issue.2
- Busoniu, L.¹ Babuska, R.² De Schutter, B.³

3
- 1942470793
- Multitask learning
- Springer
- Rich Caruana. Multitask learning. In Learning to learn, pp. 95–133. Springer, 1998.
- (1998) Learning to Learn , pp. 95-133
- Caruana, R.¹

4
- 85019241632
- Benchmarking deep reinforcement learning for continuous control
- Yan Duan, Xi Chen, Rein Houthooft, John Schulman, and Pieter Abbeel. Benchmarking deep reinforcement learning for continuous control. In International Conference on Machine Learning, pp. 1329–1338, 2016.
- (2016) International Conference on Machine Learning , pp. 1329-1338
- Duan, Y.¹ Chen, X.² Houthooft, R.³ Schulman, J.⁴ Abbeel, P.⁵

5
- 85054801920
- arXiv preprint
- Jakob Foerster, Richard Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, and Igor Mordatch. Learning with opponent-learning awareness. arXiv preprint arXiv:1709.04326, 2017a.
- (2017) Learning with Opponent-Learning Awareness
- Foerster, J.¹ Chen, R.² Al-Shedivat, M.³ Whiteson, S.⁴ Abbeel, P.⁵ Mordatch, I.⁶

6
- 85046125163
- arXiv preprint
- Jakob Foerster, Gregory Farquhar, Triantafyllos Afouras, Nantas Nardelli, and Shimon Whiteson. Counterfactual multi-agent policy gradients. arXiv preprint arXiv:1705.08926, 2017b.
- (2017) Counterfactual Multi-Agent Policy Gradients
- Foerster, J.¹ Farquhar, G.² Afouras, T.³ Nardelli, N.⁴ Whiteson, S.⁵

7
- 84937849144
- Generative adversarial nets
- Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680, 2014.
- (2014) Advances in Neural Information Processing Systems , pp. 2672-2680
- Goodfellow, I.¹ Pouget-Abadie, J.² Mirza, M.³ Xu, B.⁴ Warde-Farley, D.⁵ Ozair, S.⁶ Courville, A.⁷ Bengio, Y.⁸

8
- 85040911431
- Opponent modeling in deep reinforcement learning
- He He, Jordan Boyd-Graber, Kevin Kwok, and Hal Daumé III. Opponent modeling in deep reinforcement learning. In International Conference on Machine Learning, pp. 1804–1813, 2016.
- (2016) International Conference on Machine Learning , pp. 1804-1813
- He, H.¹ Boyd-Graber, J.² Kwok, K.³ Daumé, H.⁴

9
- 85044446086
- arXiv preprint
- Nicolas Heess, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, Ali Eslami, Martin Riedmiller, et al. Emergence of locomotion behaviours in rich environments. arXiv preprint arXiv:1707.02286, 2017.
- (2017) Emergence of Locomotion Behaviours in Rich Environments
- Heess, N.¹ Sriram, S.² Lemmon, J.³ Merel, J.⁴ Wayne, G.⁵ Tassa, Y.⁶ Erez, T.⁷ Wang, Z.⁸ Eslami, A.⁹ Riedmiller, M.¹⁰

10
- 85015264671
- arXiv preprint
- Johannes Heinrich and David Silver. Deep reinforcement learning from self-play in imperfect-information games. arXiv preprint arXiv:1603.01121, 2016.
- (2016) Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
- Heinrich, J.¹ Silver, D.²

11
- 84941620184
- arXiv preprint
- Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

12
- 84965135289
- arXiv preprint
- Timothy P Lillicrap, Jonathan J Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971, 2015.
- (2015) Continuous Control with Deep Reinforcement Learning
- Lillicrap, T.P.¹ Hunt, J.J.² Pritzel, A.³ Heess, N.⁴ Erez, T.⁵ Tassa, Y.⁶ Silver, D.⁷ Wierstra, D.⁸

13
- 85149834820
- Markov games as a framework for multi-agent reinforcement learning
- Michael L Littman. Markov games as a framework for multi-agent reinforcement learning. In Proceedings of the eleventh international conference on machine learning, volume 157, pp. 157–163, 1994.
- (1994) Proceedings of the Eleventh International Conference on Machine Learning , vol.157 , pp. 157-163
- Littman, M.L.¹

14
- 85018878907
- Stein variational gradient descent: A general purpose Bayesian inference algorithm
- Qiang Liu and Dilin Wang. Stein variational gradient descent: A general purpose Bayesian inference algorithm. In Advances in Neural Information Processing Systems, pp. 2378–2386, 2016.
- (2016) Advances in Neural Information Processing Systems , pp. 2378-2386
- Liu, Q.¹ Wang, D.²

15
- 85041351193
- arXiv preprint
- Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, and Igor Mordatch. Multi-agent actor-critic for mixed cooperative-competitive environments. arXiv preprint arXiv:1706.02275, 2017.
- (2017) Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments
- Lowe, R.¹ Wu, Y.² Tamar, A.³ Harb, J.⁴ Abbeel, P.⁵ Mordatch, I.⁶

16
- 84857861863
- Independent reinforcement learners in cooperative Markov games: A survey regarding coordination problems
- Laetitia Matignon, Guillaume J Laurent, and Nadine Le Fort-Piat. Independent reinforcement learners in cooperative Markov games: a survey regarding coordination problems. The Knowledge Engineering Review, 27(1):1–31, 2012.
- (2012) The Knowledge Engineering Review , vol.27 , Issue.1 , pp. 1-31
- Matignon, L.¹ Laurent, G.J.² Le Fort-Piat, N.³

17
- 84924051598
- Human-level control through deep reinforcement learning
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰

18
- 84971448181
- Asynchronous methods for deep reinforcement learning
- Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In International Conference on Machine Learning, pp. 1928–1937, 2016.
- (2016) International Conference on Machine Learning , pp. 1928-1937
- Mnih, V.¹ Badia, A.P.² Mirza, M.³ Graves, A.⁴ Lillicrap, T.⁵ Harley, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

19
- 85062216559
- OpenAI. OpenAI Dota 2 1v1 bot, 2017. URL https://openai.com/the-international/.
- (2017) OpenAI Dota 2 1v1 Bot

20
- 26444601262
- Cooperative multi-agent learning: The state of the art
- Liviu Panait and Sean Luke. Cooperative multi-agent learning: The state of the art. Autonomous agents and multi-agent systems, 11(3):387–434, 2005.
- (2005) Autonomous Agents and Multi-Agent Systems , vol.11 , Issue.3 , pp. 387-434
- Panait, L.¹ Luke, S.²

21
- 85028023087
- Supervision via competition: Robot adversaries for learning tasks
- Lerrel Pinto, James Davidson, and Abhinav Gupta. Supervision via competition: Robot adversaries for learning tasks. In Robotics and Automation (ICRA), 2017 IEEE International Conference on, pp. 1601–1608. IEEE, 2017.
- (2017) Robotics and Automation (ICRA), 2017 IEEE International Conference on , pp. 1601-1608
- Pinto, L.¹ Davidson, J.² Gupta, A.³

22
- 84969963490
- Trust region policy optimization
- John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 1889–1897, 2015a.
- (2015) Proceedings of the 32nd International Conference on Machine Learning (ICML-15) , pp. 1889-1897
- Schulman, J.¹ Levine, S.² Abbeel, P.³ Jordan, M.⁴ Moritz, P.⁵

23
- 84993963574
- arXiv preprint
- John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015b.
- (2015) High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Schulman, J.¹ Moritz, P.² Levine, S.³ Jordan, M.⁴ Abbeel, P.⁵

24
- 85041194636
- arXiv preprint
- John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.
- (2017) Proximal Policy Optimization Algorithms
- Schulman, J.¹ Wolski, F.² Dhariwal, P.³ Radford, A.⁴ Klimov, O.⁵

25
- 84963949906
- Mastering the game of Go with deep neural networks and tree search
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
- (2016) Nature , vol.529 , Issue.7587 , pp. 484-489
- Silver, D.¹ Huang, A.² Maddison, C.J.³ Guez, A.⁴ Sifre, L.⁵ Van Den Driessche, G.⁶ Schrittwieser, J.⁷ Antonoglou, I.⁸ Panneershelvam, V.⁹ Lanctot, M.¹⁰

26
- 85028095895
- Evolving virtual creatures
- Karl Sims. Evolving virtual creatures. In Proceedings of the 21st annual conference on Computer graphics and interactive techniques, pp. 15–22. ACM, 1994.
- (1994) Proceedings of the 21st Annual Conference on Computer Graphics and Interactive Techniques , pp. 15-22
- Sims, K.¹

27
- 4344679259
- Competitive coevolution through evolutionary complexification
- Kenneth O Stanley and Risto Miikkulainen. Competitive coevolution through evolutionary complexification. Journal of Artificial Intelligence Research, 21:63–100, 2004.
- (2004) Journal of Artificial Intelligence Research , vol.21 , pp. 63-100
- Stanley, K.O.¹ Miikkulainen, R.²

28
- 85046937806
- arXiv preprint
- Sainbayar Sukhbaatar, Ilya Kostrikov, Arthur Szlam, and Rob Fergus. Intrinsic motivation and automatic curricula via asymmetric self-play. arXiv preprint arXiv:1703.05407, 2017.
- (2017) Intrinsic Motivation and Automatic Curricula Via Asymmetric Self-Play
- Sukhbaatar, S.¹ Kostrikov, I.² Szlam, A.³ Fergus, R.⁴

29
- 85017018413
- Multiagent cooperation and competition with deep reinforcement learning
- Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, and Raul Vicente. Multiagent cooperation and competition with deep reinforcement learning. PloS one, 12(4):e0172395, 2017.
- (2017) PloS One , vol.12 , Issue.4
- Tampuu, A.¹ Matiisen, T.² Kodelja, D.³ Kuzovkin, I.⁴ Korjus, K.⁵ Aru, J.⁶ Aru, J.⁷ Vicente, R.⁸

30
- 85152198941
- Multi-agent reinforcement learning: Independent vs. Cooperative agents
- Ming Tan. Multi-agent reinforcement learning: Independent vs. cooperative agents. In Proceedings of the tenth international conference on machine learning, pp. 330–337, 1993.
- (1993) Proceedings of the Tenth International Conference on Machine Learning , pp. 330-337
- Tan, M.¹

31
- 0029276036
- Temporal difference learning and TD-Gammon
- Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58–68, 1995.
- (1995) Communications of the ACM , vol.38 , Issue.3 , pp. 58-68
- Tesauro, G.¹

32
- 84872292044
- MuJoCo: A physics engine for model-based control
- Emanuel Todorov, Tom Erez, and Yuval Tassa. MuJoCo: A physics engine for model-based control. In Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on, pp. 5026–5033. IEEE, 2012.
- (2012) Intelligent Robots and Systems (IROS), 2012 IEEE/RSJ International Conference on , pp. 5026-5033
- Todorov, E.¹ Erez, T.² Tassa, Y.³

33
- 77956285052
- Character animation in two-player adversarial games
- Kevin Wampler, Erik Andersen, Evan Herbst, Yongjoon Lee, and Zoran Popović. Character animation in two-player adversarial games. ACM Transactions on Graphics (TOG), 29(3):26, 2010.
- (2010) ACM Transactions on Graphics (TOG) , vol.29 , Issue.3 , pp. 26
- Wampler, K.¹ Andersen, E.² Herbst, E.³ Lee, Y.⁴ Popović, Z.⁵

34
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Ronald J Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229–256, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 229-256
- Williams, R.J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.