SCOPUS Information Search Platform

Volume 518, Issue 7540, 2015, Pages 529-533

Human-level control through deep reinforcement learning

(19) Mnih, Volodymyr ᵃ; Kavukcuoglu, Koray ᵃ; Silver, David ᵃ; Rusu, Andrei A. ᵃ; Veness, Joel ᵃ; Bellemare, Marc G. ᵃ; Graves, Alex ᵃ; Riedmiller, Martin ᵃ; Fidjeland, Andreas K. ᵃ; Ostrovski, Georg ᵃ; Petersen, Stig ᵃ; Beattie, Charles ᵃ; Sadik, Amir ᵃ; Antonoglou, Ioannis ᵃ; King, Helen ᵃ; Kumaran, Dharshan ᵃ; Wierstra, Daan ᵃ; Legg, Shane ᵃ; Hassabis, Demis ᵃ

ᵃ DEEPMIND (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ARTIFICIAL INTELLIGENCE; BEHAVIORAL ECOLOGY; GAME THEORY; HARMONIC ANALYSIS; HIERARCHICAL SYSTEM; NEUROLOGY; PARAMETERIZATION; SENSORY SYSTEM;

ANIMAL BEHAVIOR; ARTICLE; HUMAN; LEARNING ALGORITHM; NONHUMAN; PRIORITY JOURNAL; REINFORCEMENT; SENSORY STIMULATION; ALGORITHM; ARTIFICIAL INTELLIGENCE; ARTIFICIAL NEURAL NETWORK; PSYCHOLOGICAL MODEL; RECREATION; REWARD;

ANIMALIA;

ALGORITHMS; ARTIFICIAL INTELLIGENCE; HUMANS; MODELS, PSYCHOLOGICAL; NEURAL NETWORKS (COMPUTER); REINFORCEMENT (PSYCHOLOGY); REWARD; VIDEO GAMES;

EID: 84924051598
PISSN: 00280836
EISSN: 14764687
Source Type: Journal
DOI: 10.1038/nature14236
Document Type: Article

Times cited : (27782)

References (33)

1
- 0004102479
- MIT Press
- Sutton, R. & Barto, A. Reinforcement Learning: An Introduction (MIT Press, 1998).
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.¹ Barto, A.²

2
- 0003809346
- Macmillan
- Thorndike, E. L. Animal Intelligence: Experimental studies (Macmillan, 1911).
- (1911) Animal Intelligence: Experimental Studies
- Thorndike, E.L.¹

3
- 0030896968
- A neural substrate of prediction and reward
- Schultz, W., Dayan, P. & Montague, P. R. A neural substrate of prediction and reward. Science 275, 1593-1599 (1997).
- (1997) Science , vol.275 , pp. 1593-1599
- Schultz, W.¹ Dayan, P.² Montague, P.R.³

4
- 24644511277
- Object recognition with features inspired by visual cortex
- Serre, T., Wolf, L. & Poggio, T. Object recognition with features inspired by visual cortex. Proc. IEEE. Comput. Soc. Conf. Comput. Vis. Pattern. Recognit. 994-1000 (2005).
- (2005) Proc. IEEE. Comput. Soc. Conf. Comput. Vis. Pattern. Recognit. , pp. 994-1000
- Serre, T.¹ Wolf, L.² Poggio, T.³

5
- 0019152630
- Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position
- Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193-202 (1980).
- (1980) Biol. Cybern. , vol.36 , pp. 193-202
- Fukushima, K.¹

6
- 0029276036
- Temporal difference learning and TD-Gammon
- Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 38, 58-68 (1995).
- (1995) Commun. ACM , vol.38 , pp. 58-68
- Tesauro, G.¹

7
- 67650996818
- Reinforcement learning for robot soccer
- Riedmiller, M., Gabel, T., Hafner, R. & Lange, S. Reinforcement learning for robot soccer. Auton. Robots 27, 55-73 (2009).
- (2009) Auton. Robots , vol.27 , pp. 55-73
- Riedmiller, M.¹ Gabel, T.² Hafner, R.³ Lange, S.⁴

8
- 56449093331
- An object-oriented representation for efficient reinforcement learning
- Diuk, C., Cohen, A. & Littman, M. L. An object-oriented representation for efficient reinforcement learning. Proc. Int. Conf. Mach. Learn. 240-247 (2008).
- (2008) Proc. Int. Conf. Mach. Learn. , pp. 240-247
- Diuk, C.¹ Cohen, A.² Littman, M.L.³

9
- 69349090197
- Learning deep architectures for AI
- Bengio, Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning 2, 1-127 (2009).
- (2009) Foundations and Trends in Machine Learning , vol.2 , pp. 1-127
- Bengio, Y.¹

10
- 84878919540
- ImageNet classification with deep convolutional neural networks
- Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 25, 1106-1114 (2012).
- (2012) Adv. Neural Inf. Process. Syst. , vol.25 , pp. 1106-1114
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.³

11
- 33746600649
- Reducing the dimensionality of data with neural networks
- Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 313, 504-507 (2006).
- (2006) Science , vol.313 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

12
- 84879976780
- The arcade learning environment: An evaluation platform for general agents
- Bellemare, M. G., Naddaf, Y., Veness, J. & Bowling, M. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res. 47, 253-279 (2013).
- (2013) J. Artif. Intell. Res. , vol.47 , pp. 253-279
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

13
- 36749026673
- Universal Intelligence: A definition of machine intelligence
- Legg, S. & Hutter, M. Universal Intelligence: a definition of machine intelligence. Minds Mach. 17, 391-444 (2007).
- (2007) Minds Mach. , vol.17 , pp. 391-444
- Legg, S.¹ Hutter, M.²

14
- 21344445698
- General game playing: Overview of the AAAI competition
- Genesereth, M., Love, N. & Pell, B. General game playing: overview of the AAAI competition. AI Mag. 26, 62-72 (2005).
- (2005) AI Mag. , vol.26 , pp. 62-72
- Genesereth, M.¹ Love, N.² Pell, B.³

15
- 84868289914
- Investigating contingency awareness using Atari 2600 games
- Bellemare, M. G., Veness, J. & Bowling, M. Investigating contingency awareness using Atari 2600 games. Proc. Conf. AAAI. Artif. Intell. 864-871 (2012).
- (2012) Proc. Conf. AAAI. Artif. Intell. , pp. 864-871
- Bellemare, M.G.¹ Veness, J.² Bowling, M.³

16
- 0003444646
- MIT Press
- McClelland, J. L., Rumelhart, D. E. & Group, T. P. R. Parallel Distributed Processing: Explorations in the Microstructure of Cognition (MIT Press, 1986).
- (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition
- McClelland, J.L.¹ Rumelhart, D.E.²

17
- 0032203257
- Gradient-based learning applied to document recognition
- LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278-2324 (1998).
- (1998) Proc. IEEE , vol.86 , pp. 2278-2324
- Lecun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

18
- 0345285364
- Shape and arrangement of columns in cat's striate cortex
- Hubel, D. H. & Wiesel, T. N. Shape and arrangement of columns in cat's striate cortex. J. Physiol. 165, 559-568 (1963).
- (1963) J. Physiol. , vol.165 , pp. 559-568
- Hubel, D.H.¹ Wiesel, T.N.²

19
- 34249833101
- Q-learning
- Watkins, C. J. & Dayan, P. Q-learning. Mach. Learn. 8, 279-292 (1992).
- (1992) Mach. Learn. , vol.8 , pp. 279-292
- Watkins, C.J.¹ Dayan, P.²

20
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, J. & Roy, B. V. An analysis of temporal-difference learning with function approximation. IEEE Trans. Automat. Contr. 42, 674-690 (1997).
- (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 674-690
- Tsitsiklis, J.¹ Roy, B.V.²

21
- 0029340352
- Why there are complementary learning systems in the hippocampus and neocortex: Insights from the successes and failures of connectionist models of learning and memory
- McClelland, J. L., McNaughton, B. L. & O'Reilly, R. C. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. Psychol. Rev. 102, 419-457 (1995).
- (1995) Psychol. Rev. , vol.102 , pp. 419-457
- McClelland, J.L.¹ McNaughton, B.L.² O'Reilly, R.C.³

22
- 77952892063
- Play it again: Reactivation of waking experience and memory
- O'Neill, J., Pleydell-Bouverie, B., Dupret, D. & Csicsvari, J. Play it again: reactivation of waking experience and memory. Trends Neurosci. 33, 220-229 (2010).
- (2010) Trends Neurosci. , vol.33 , pp. 220-229
- O'Neill, J.¹ Pleydell-Bouverie, B.² Dupret, D.³ Csicsvari, J.⁴

23
- 0003673017
- Technical Report, DTIC Document
- Lin, L.-J. Reinforcement learning for robots using neural networks. Technical Report, DTIC Document (1993).
- (1993) Reinforcement Learning for Robots Using Neural Networks
- Lin, L.-J.¹

24
- 33646398129
- Neural fitted Q iteration - First experiences with a data efficient neural reinforcement learning method
- Springer
- Riedmiller, M. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. Mach. Learn.: ECML, 3720, 317-328 (Springer, 2005).
- (2005) Mach. Learn.: ECML , vol.3720 , pp. 317-328
- Riedmiller, M.¹

25
- 57249084011
- Visualizing high-dimensional data using t-SNE
- Van der Maaten, L. J. P. & Hinton, G. E. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579-2605 (2008).
- (2008) J. Mach. Learn. Res. , vol.9 , pp. 2579-2605
- Van Der Maaten, L.J.P.¹ Hinton, G.E.²

26
- 79959451979
- Deep auto-encoder neural networks in reinforcement learning
- Lange, S. & Riedmiller, M. Deep auto-encoder neural networks in reinforcement learning. Proc. Int. Jt. Conf. Neural. Netw. 1-8 (2010).
- (2010) Proc. Int. Jt. Conf. Neural. Netw. , pp. 1-8
- Lange, S.¹ Riedmiller, M.²

27
- 67349117811
- Reinforcement learning can account for associative and perceptual learning on a visual decision task
- Law, C.-T. & Gold, J. I. Reinforcement learning can account for associative and perceptual learning on a visual decision task. Nature Neurosci. 12, 655 (2009).
- (2009) Nature Neurosci. , vol.12 , pp. 655
- Law, C.-T.¹ Gold, J.I.²

28
- 0037122807
- Visual categorization shapes feature selectivity in the primate temporal cortex
- Sigala, N. & Logothetis, N. K. Visual categorization shapes feature selectivity in the primate temporal cortex. Nature 415, 318-320 (2002).
- (2002) Nature , vol.415 , pp. 318-320
- Sigala, N.¹ Logothetis, N.K.²

29
- 84866690401
- Biasing the content of hippocampal replay during sleep
- Bendor, D. & Wilson, M. A. Biasing the content of hippocampal replay during sleep. Nature Neurosci. 15, 1439-1444 (2012).
- (2012) Nature Neurosci. , vol.15 , pp. 1439-1444
- Bendor, D.¹ Wilson, M.A.²

30
- 0027684215
- Prioritized sweeping: Reinforcement learning with less data and less real time
- Moore, A. & Atkeson, C. Prioritized sweeping: reinforcement learning with less data and less real time. Mach. Learn. 13, 103-130 (1993).
- (1993) Mach. Learn. , vol.13 , pp. 103-130
- Moore, A.¹ Atkeson, C.²

31
- 77953183471
- What is the best multi-stage architecture for object recognition?
- Jarrett, K., Kavukcuoglu, K., Ranzato, M. A. & LeCun, Y. What is the best multi-stage architecture for object recognition? Proc. IEEE. Int. Conf. Comput. Vis. 2146-2153 (2009).
- (2009) Proc. IEEE Int. Conf. Comput. Vis. , pp. 2146-2153
- Jarrett, K.¹ Kavukcuoglu, K.² Ranzato, M.A.³ Lecun, Y.⁴

32
- 77956509090
- Rectified linear units improve restricted Boltzmann machines
- Nair, V. & Hinton, G. E. Rectified linear units improve restricted Boltzmann machines. Proc. Int. Conf. Mach. Learn. 807-814 (2010).
- (2010) Proc. Int. Conf. Mach. Learn. , pp. 807-814
- Nair, V.¹ Hinton, G.E.²

33
- 0032073263
- Planning and acting in partially observable stochastic domains
- Kaelbling, L. P., Littman, M. L. & Cassandra, A. R. Planning and acting in partially observable stochastic domains. Artificial Intelligence 101, 99-134 (1994).
- (1994) Artificial Intelligence , vol.101 , pp. 99-134
- Kaelbling, L.P.¹ Littman, M.L.² Cassandra, A.R.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.