SCOPUS Information Search Platform

5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings

Volume , Issue , 2017, Pages

Reinforcement learning with unsupervised auxiliary tasks

(7) Jaderberg, Max a Mnih, Volodymyr a Czarnecki, Wojciech Marian a Schaul, Tom a Leibo, Joel Z a Silver, David a Kavukcuoglu, Koray a

a DEEPMIND (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

DEEP LEARNING; INTELLIGENT AGENTS; MACHINE LEARNING;

EXTRINSIC REWARDS; FIRST PERSON; HUMAN PERFORMANCE; REINFORCEMENT LEARNING AGENT; REWARD FUNCTION; STATE OF THE ART; TRAINING SIGNAL;

REINFORCEMENT LEARNING;

EID: 85088229768 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (530)

References (34)

1
- 85030460808
- arXiv preprint
- André Barreto, Rémi Munos, Tom Schaul, and David Silver. Successor features for transfer in reinforcement learning. arXiv preprint arXiv:1606.05312, 2016.
- (2016) Successor Features for Transfer in Reinforcement Learning
- Barreto, A.¹ Munos, R.² Schaul, T.³ Silver, D.⁴

2
- 84998969754
- The arcade learning environment: An evaluation platform for general agents
- Marc G Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research, 2012.
- (2012) Journal of Artificial Intelligence Research
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

3
- 85070993309
- contributors
- OpenArena contributors. The openarena manual. 2005. URL http://openarena.wikia.com/wiki/Manual.
- (2005) The Openarena Manual

4
- 0001158047
- Improving generalization for temporal difference learning: The successor representation
- Peter Dayan. Improving generalization for temporal difference learning: The successor representation. Neural Computation, 5(4):613-624, 1993.
- (1993) Neural Computation , vol.5 , Issue.4 , pp. 613-624
- Dayan, P.¹

5
- 0034293152
- Learning to forget: Continual prediction with lstm
- Felix A Gers, Jürgen Schmidhuber, and Fred Cummins. Learning to forget: Continual prediction with lstm. Neural computation, 12(10):2451-2471, 2000.
- (2000) Neural Computation , vol.12 , Issue.10 , pp. 2451-2471
- Gers, F.A.¹ Schmidhuber, J.² Cummins, F.³

6
- 85071026926
- Id software
- id software. Quake3. 1999. URL https://github.com/id-Software/Quake-III-Arena.
- (1999) Quake3

7
- 85015426298
- arXiv preprint
- Michal Kempka, Marek Wydmuch, Grzegorz Runc, Jakub Toczek, and Wojciech Jaśkowski. Viz-doom: A doom-based ai research platform for visual reinforcement learning. arXiv preprint arXiv:1605.02097, 2016.
- (2016) Viz-Doom: A Doom-Based Ai Research Platform for Visual Reinforcement Learning
- Kempka, M.¹ Wydmuch, M.² Runc, G.³ Toczek, J.⁴ Jaskowski, W.⁵

8
- 80055032021
- Skill discovery in continuous reinforcement learning domains using skill chaining
- George Konidaris and Andre S Barreto. Skill discovery in continuous reinforcement learning domains using skill chaining. In Advances in Neural Information Processing Systems, pp. 1015-1023, 2009.
- (2009) Advances in Neural Information Processing Systems , pp. 1015-1023
- Konidaris, G.¹ Barreto, A.S.²

9
- 85041964859
- arXiv preprint
- Tejas D Kulkarni, Ardavan Saeedi, Simanta Gautam, and Samuel J Gershman. Deep successor reinforcement learning. arXiv preprint arXiv:1606.02396, 2016.
- (2016) Deep Successor Reinforcement Learning
- Kulkarni, T.D.¹ Saeedi, A.² Gautam, S.³ Gershman, S.J.⁴

10
- 85039903894
- CoRR, abs/1609.05521
- Guillaume Lample and Devendra Singh Chaplot. Playing FPS games with deep reinforcement learning. CoRR, abs/1609.05521, 2016.
- (2016) Playing FPS Games with Deep Reinforcement Learning
- Lample, G.¹ Chaplot, D.S.²

11
- 84998780737
- arXiv preprint
- Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, and Ji He. Recurrent reinforcement learning: A hybrid approach. arXiv preprint arXiv:1509.03044, 2015.
- (2015) Recurrent Reinforcement Learning: A Hybrid Approach
- Li, X.¹ Li, L.² Gao, J.³ He, X.⁴ Chen, J.⁵ Deng, L.⁶ He, J.⁷

12
- 0012331016
- Technical report, Carnegie Mellon University, School of Computer Science
- Long-Ji Lin and Tom M Mitchell. Memory approaches to reinforcement learning in non-markovian domains. Technical report, Carnegie Mellon University, School of Computer Science, 1992.
- (1992) Memory Approaches to Reinforcement Learning in Non-Markovian Domains
- Lin, L.-J.¹ Mitchell, T.M.²

13
- 85031110463
- Piotr Mirowski, Razvan Pascanu, Fabio Viola, Andrea Banino, Hubert Soyer, Andy Ballard, Misha Denil, Ross Goroshin, Laurent Sifre, Koray Kavukcuoglu, Dharshan Kumaran, and Raia Hadsell. Learning to navigate in complex environments. 2016.
- (2016) Learning to Navigate in Complex Environments
- Mirowski, P.¹ Pascanu, R.² Viola, F.³ Banino, A.⁴ Soyer, H.⁵ Ballard, A.⁶ Denil, M.⁷ Goroshin, R.⁸ Sifre, L.⁹ Kavukcuoglu, K.¹⁰ Kumaran, D.¹¹ Hadsell, R.¹²

14
- 84904867557
- Playing atari with deep reinforcement learning
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. In NIPS Deep Learning Workshop. 2013.
- (2013) NIPS Deep Learning Workshop
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Graves, A.⁴ Antonoglou, I.⁵ Wierstra, D.⁶ Riedmiller, M.⁷

15
- 84924051598
- Human-level control through deep reinforcement learning
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 02 2015. URL http://dx.doi.org/10.1038/nature14236.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰ Petersen, S.¹¹ Beattie, C.¹² Sadik, A.¹³ Antonoglou, I.¹⁴ King, H.¹⁵ Kumaran, D.¹⁶ Wierstra, D.¹⁷ Legg, S.¹⁸ Hassabis, D.¹⁹

16
- 84999036937
- Asynchronous methods for deep reinforcement learning
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), pp. 1928-1937, 2016.
- (2016) Proceedings of the 33rd International Conference on Machine Learning (ICML) , pp. 1928-1937
- Mnih, V.¹ Badia, A.P.² Mirza, M.³ Graves, A.⁴ Lillicrap, T.P.⁵ Harley, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

17
- 84965178314
- Action-conditional video prediction using deep networks in atari games
- Junhyuk Oh, Xiaoxiao Guo, Honglak Lee, Richard L Lewis, and Satinder Singh. Action-conditional video prediction using deep networks in atari games. In Advances in Neural Information Processing Systems, pp. 2863-2871, 2015.
- (2015) Advances in Neural Information Processing Systems , pp. 2863-2871
- Oh, J.¹ Guo, X.² Lee, H.³ Lewis, R.L.⁴ Singh, S.⁵

18
- 84999048282
- arXiv preprint
- Junhyuk Oh, Valliappa Chockalingam, Satinder Singh, and Honglak Lee. Control of memory, active perception, and action in minecraft. arXiv preprint arXiv:1605.09128, 2016.
- (2016) Control of Memory, Active Perception, and Action in Minecraft
- Oh, J.¹ Chockalingam, V.² Singh, S.³ Lee, H.⁴

19
- 84937060789
- Hippocampal place cells construct reward related sequences through unexplored space
- H Freyja Olafsdottir, Caswell Barry, Aman B Saleem, Demis Hassabis, and Hugo J Spiers. Hippocampal place cells construct reward related sequences through unexplored space. Elife, 4: e06063, 2015.
- (2015) Elife , vol.4
- Freyja Olafsdottir, H.¹ Barry, C.² Saleem, A.B.³ Hassabis, D.⁴ Spiers, H.J.⁵

20
- 0000955979
- Incremental multi-step q-learning
- Jing Peng and Ronald J Williams. Incremental multi-step q-learning. Machine Learning, 22(1-3): 283-290, 1996.
- (1996) Machine Learning , vol.22 , Issue.1-3 , pp. 283-290
- Peng, J.¹ Williams, R.J.²

21
- 84869780901
- The future of memory: Remembering, imagining, and the brain
- Daniel L Schacter, Donna Rose Addis, Demis Hassabis, Victoria C Martin, R Nathan Spreng, and Karl K Szpunar. The future of memory: remembering, imagining, and the brain. Neuron, 76(4): 677-694, 2012.
- (2012) Neuron , vol.76 , Issue.4 , pp. 677-694
- Schacter, D.L.¹ Addis, D.R.² Hassabis, D.³ Martin, V.C.⁴ Nathan Spreng, R.⁵ Szpunar, K.K.⁶

22
- 84969760283
- Universal value function approximators
- Tom Schaul, Daniel Horgan, Karol Gregor, and David Silver. Universal value function approximators. In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 1312-1320, 2015a.
- (2015) Proceedings of the 32nd International Conference on Machine Learning (ICML-15) , pp. 1312-1320
- Schaul, T.¹ Horgan, D.² Gregor, K.³ Silver, D.⁴

23
- 84980041049
- arXiv preprint
- Tom Schaul, John Quan, Ioannis Antonoglou, and David Silver. Prioritized experience replay. arXiv preprint arXiv:1511.05952, 2015b.
- (2015) Prioritized Experience Replay
- Schaul, T.¹ Quan, J.² Antonoglou, I.³ Silver, D.⁴

24
- 77956578648
- Formal theory of creativity, fun, and intrinsic motivation (1990-2010)
- Jürgen Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990-2010). IEEE Transactions on Autonomous Mental Development, 2(3):230-247, 2010.
- (2010) IEEE Transactions on Autonomous Mental Development , vol.2 , Issue.3 , pp. 230-247
- Schmidhuber, J.¹

25
- 84907555023
- arXiv preprint
- David Silver and Kamil Ciosek. Compositional planning using optimal option models. arXiv preprint arXiv:1206.6473, 2012.
- (2012) Compositional Planning Using Optimal Option Models
- Silver, D.¹ Ciosek, K.²

26
- 84963949906
- Mastering the game of go with deep neural networks and tree search
- David Silver, Aja Huang, Chris J Maddison, Arthur Guez, Laurent Sifre, George Van Den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, et al. Mastering the game of go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
- (2016) Nature , vol.529 , Issue.7587 , pp. 484-489
- Silver, D.¹ Huang, A.² Maddison, C.J.³ Guez, A.⁴ Sifre, L.⁵ Van Den Driessche, G.⁶ Schrittwieser, J.⁷ Antonoglou, I.⁸ Panneershelvam, V.⁹ Lanctot, M.¹⁰

27
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- Richard S Sutton, David A McAllester, Satinder P Singh, Yishay Mansour, et al. Policy gradient methods for reinforcement learning with function approximation. In NIPS, volume 99, pp. 1057-1063, 1999a.
- (1999) NIPS , vol.99 , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.A.² Singh, S.P.³ Mansour, Y.⁴

28
- 0033170372
- Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning
- Richard S Sutton, Doina Precup, and Satinder Singh. Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learning. Artificial intelligence, 1999b.
- (1999) Artificial Intelligence
- Sutton, R.S.¹ Precup, D.² Singh, S.³

29
- 84899464022
- Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
- International Foundation for Autonomous Agents and Multiagent Systems
- Richard S Sutton, Joseph Modayil, Michael Delp, Thomas Degris, Patrick M Pilarski, Adam White, and Doina Precup. Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In The 10th International Conference on Autonomous Agents and Multiagent Systems-Volume 2, pp. 761-768. International Foundation for Autonomous Agents and Multiagent Systems, 2011.
- (2011) The 10th International Conference on Autonomous Agents and Multiagent Systems , vol.2 , pp. 761-768
- Sutton, R.S.¹ Modayil, J.² Delp, M.³ Degris, T.⁴ Pilarski, P.M.⁵ White, A.⁶ Precup, D.⁷

30
- 85019201204
- arXiv preprint
- Chen Tessler, Shahar Givony, Tom Zahavy, Daniel J Mankowitz, and Shie Mannor. A deep hierarchical approach to lifelong learning in minecraft. arXiv preprint arXiv:1604.07255, 2016.
- (2016) A Deep Hierarchical Approach to Lifelong Learning in Minecraft
- Tessler, C.¹ Givony, S.² Zahavy, T.³ Mankowitz, D.J.⁴ Mannor, S.⁵

31
- 84998996757
- Dueling network architectures for deep reinforcement learning
- Z. Wang, N. de Freitas, and M. Lanctot. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML), 2016.
- (2016) Proceedings of the 33rd International Conference on Machine Learning (ICML)
- Wang, Z.¹ De Freitas, N.² Lanctot, M.³

32
- 0004049893
- PhD thesis, University of Cambridge England
- Christopher John Cornish Hellaby Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge England, 1989.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

33
- 85013998007
- CoRR, abs/1509.06824
- Christopher Xie, Sachin Patil, Teodor Mihai Moldovan, Sergey Levine, and Pieter Abbeel. Model-based reinforcement learning with parametrized physical models and optimism-driven exploration. CoRR, abs/1509.06824, 2015.
- (2015) Model-Based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration
- Xie, C.¹ Patil, S.² Moldovan, T.M.³ Levine, S.⁴ Abbeel, P.⁵

34
- 84998679057
- Graying the black box: Understanding dqns
- Tom Zahavy, Nir Ben Zrihem, and Shie Mannor. Graying the black box: Understanding dqns. In Proceedings of the 33rd International Conference on Machine Learning, 2016.
- (2016) Proceedings of the 33rd International Conference on Machine Learning
- Zahavy, T.¹ Zrihem, N.B.² Mannor, S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.