Volume 550, Issue 7676, 2017, Pages 354-359

Mastering the game of Go without human knowledge

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHM; ARTIFICIAL INTELLIGENCE; ARTIFICIAL NEURAL NETWORK; KNOWLEDGE; SUPERVISED LEARNING

EID: 85031918331     PISSN: 0028-0836     EISSN: 1476-4687     Source Type: Journal
DOI: 10.1038/nature24270     Document Type: Article
Times cited: 9897

References (69)
  • 3
    • Krizhevsky, A., Sutskever, I. & Hinton, G. ImageNet classification with deep convolutional neural networks. In Adv. Neural Inf. Process. Syst. Vol. 25 (eds Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q.) 1097-1105 (2012).
  • 6
    • Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015).
  • 7
    • Guo, X., Singh, S. P., Lee, H., Lewis, R. L. & Wang, X. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning. In Adv. Neural Inf. Process. Syst. Vol. 27 (eds Ghahramani, Z., Welling, M., Cortes, C., Lawrence, N. D. & Weinberger, K. Q.) 3338-3346 (2014).
  • 8
    • Mnih, V. et al. Asynchronous methods for deep reinforcement learning. In Proc. 33rd Int. Conf. Mach. Learn. Vol. 48 (eds Balcan, M. F. & Weinberger, K. Q.) 1928-1937 (2016).
  • 9
  • 12
    • Silver, D. et al. Mastering the game of Go with deep neural networks and tree search. Nature 529, 484-489 (2016).
  • 13
    • Coulom, R. Efficient selectivity and backup operators in Monte-Carlo tree search. In 5th Int. Conf. Computers and Games (eds Ciancarini, P. & van den Herik, H. J.) 72-83 (2006).
  • 15
    • Browne, C. et al. A survey of Monte Carlo tree search methods. IEEE Trans. Comput. Intell. AI Games 4, 1-49 (2012).
  • 16
    • Fukushima, K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol. Cybern. 36, 193-202 (1980).
  • 18
    • Ioffe, S. & Szegedy, C. Batch normalization: accelerating deep network training by reducing internal covariate shift. In Proc. 32nd Int. Conf. Mach. Learn. Vol. 37 448-456 (2015).
  • 19
    • Hahnloser, R. H. R., Sarpeshkar, R., Mahowald, M. A., Douglas, R. J. & Seung, H. S. Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405, 947-951 (2000).
  • 22
    • Bertsekas, D. P. Approximate policy iteration: a survey and some new methods. J. Control Theory Appl. 9, 310-335 (2011).
  • 23
    • Scherrer, B. Approximate policy iteration schemes: a comparison. In Proc. 31st Int. Conf. Mach. Learn. Vol. 32 1314-1322 (2014).
  • 24
    • Rosin, C. D. Multi-armed bandits with episode context. Ann. Math. Artif. Intell. 61, 203-230 (2011).
  • 25
    • Coulom, R. Whole-history rating: a Bayesian rating system for players of time-varying strength. In Int. Conf. Comput. Games (eds van den Herik, H. J., Xu, X., Ma, Z. & Winands, M. H. M.) Vol. 5131 113-124 (Springer, 2008).
  • 27
    • Foerster, J. N. et al. Stabilising experience replay for deep multi-agent reinforcement learning. In Proc. 34th Int. Conf. Mach. Learn. Vol. 70 1146-1155 (2017).
  • 29
    • Jouppi, N. P. et al. In-datacenter performance analysis of a Tensor Processing Unit. In Proc. 44th Annu. Int. Symp. Comp. Architecture 1-12 (2017).
  • 31
    • Clark, C. & Storkey, A. J. Training deep convolutional neural networks to play Go. In Proc. 32nd Int. Conf. Mach. Learn. Vol. 37 1766-1774 (2015).
  • 32
    • Tian, Y. & Zhu, Y. Better computer Go player with neural network and long-term prediction. In 4th Int. Conf. Learn. Representations (2016).
  • 35
    • Barto, A. G. & Duff, M. Monte Carlo matrix inversion and reinforcement learning. Adv. Neural Inf. Process. Syst. 6, 687-694 (1994).
  • 36
    • Singh, S. P. & Sutton, R. S. Reinforcement learning with replacing eligibility traces. Mach. Learn. 22, 123-158 (1996).
  • 37
    • Lagoudakis, M. G. & Parr, R. Reinforcement learning as classification: leveraging modern classifiers. In Proc. 20th Int. Conf. Mach. Learn. 424-431 (2003).
  • 38
    • Scherrer, B., Ghavamzadeh, M., Gabillon, V., Lesner, B. & Geist, M. Approximate modified policy iteration and its application to the game of Tetris. J. Mach. Learn. Res. 16, 1629-1676 (2015).
  • 39
    • Littman, M. L. Markov games as a framework for multi-agent reinforcement learning. In Proc. 11th Int. Conf. Mach. Learn. 157-163 (1994).
  • 41
    • Enzenberger, M. in Advances in Computer Games (eds van den Herik, H. J., Iida, H. & Heinz, E. A.) 97-108 (2003).
  • 42
    • Sutton, R. Learning to predict by the method of temporal differences. Mach. Learn. 3, 9-44 (1988).
  • 44
    • Silver, D., Sutton, R. & Müller, M. Temporal-difference search in computer Go. Mach. Learn. 87, 183-219 (2012).
  • 46
    • Gelly, S. & Silver, D. Monte-Carlo tree search and rapid action value estimation in computer Go. Artif. Intell. 175, 1856-1875 (2011).
  • 47
    • Coulom, R. Computing Elo ratings of move patterns in the game of Go. Int. Comput. Games Assoc. J. 30, 198-208 (2007).
  • 49
    • Baxter, J., Tridgell, A. & Weaver, L. Learning to play chess using temporal differences. Mach. Learn. 40, 243-263 (2000).
  • 52
    • Schaeffer, J., Hlynka, M. & Jussila, V. Temporal difference learning applied to a high-performance game-playing program. In Proc. 17th Int. Jt Conf. Artif. Intell. Vol. 1 529-534 (2001).
  • 53
    • Tesauro, G. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6, 215-219 (1994).
  • 54
    • Buro, M. From simple features to sophisticated evaluation functions. In Proc. 1st Int. Conf. Comput. Games 126-145 (1999).
  • 55
    • Sheppard, B. World-championship-caliber Scrabble. Artif. Intell. 134, 241-275 (2002).
  • 56
    • Moravčík, M. et al. DeepStack: expert-level artificial intelligence in heads-up no-limit poker. Science 356, 508-513 (2017).
  • 57
    • Tesauro, G. & Galperin, G. On-line policy improvement using Monte-Carlo search. In Adv. Neural Inf. Process. Syst. 1068-1074 (1996).
  • 58
    • Tesauro, G. Neurogammon: a neural-network backgammon program. In Proc. Int. Jt Conf. Neural Netw. Vol. 3, 33-39 (1990).
  • 59
    • Samuel, A. L. Some studies in machine learning using the game of checkers. II - Recent progress. IBM J. Res. Develop. 11, 601-617 (1967).
  • 60
    • Kober, J., Bagnell, J. A. & Peters, J. Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32, 1238-1274 (2013).
  • 64
    • Abe, N. et al. Empirical comparison of various reinforcement learning strategies for sequential targeted marketing. In IEEE Int. Conf. Data Mining 3-10 (2002).
  • 67
    • Müller, M. Computer Go. Artif. Intell. 134, 145-179 (2002).
  • 68
    • Shahriari, B., Swersky, K., Wang, Z., Adams, R. P. & de Freitas, N. Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104, 148-175 (2016).
  • 69
    • Segal, R. B. On the scalability of parallel UCT. Comput. Games 6515, 36-47 (2011).


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.