Volume 33, Issue 6, 2016, Pages 701-717

Review of deep reinforcement learning and discussions on the development of computer Go

Author keywords

AlphaGo; Artificial intelligence; Deep learning; Deep reinforcement learning; Reinforcement learning

Indexed keywords

ARTIFICIAL INTELLIGENCE; CURRICULA; DECISION MAKING

EID: 84979285126     PISSN: 1000-8152     EISSN: None     Source Type: Journal
DOI: 10.7641/CTA.2016.60173     Document Type: Review
Times cited : (109)

References (120)
  • 1. MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
  • 2. SILVER D, HUANG A, MADDISON C, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
  • 3. AREL I. Deep reinforcement learning as foundation for artificial general intelligence[M]// Theoretical Foundations of Artificial General Intelligence. Amsterdam: Atlantis Press, 2012: 89-102.
  • 4. TESAURO G. TD-Gammon, a self-teaching backgammon program, achieves master-level play[J]. Neural Computation, 1994, 6(2): 215-219.
  • 6. KEARNS M, SINGH S. Near-optimal reinforcement learning in polynomial time[J]. Machine Learning, 2002, 49(2/3): 209-232.
  • 8. LITTMAN M L. Reinforcement learning improves behaviour from evaluative feedback[J]. Nature, 2015, 521(7553): 445-451.
  • 10. WERBOS P J. Advanced forecasting methods for global crisis warning and models of intelligence[J]. General Systems Yearbook, 1977, 22(12): 25-38.
  • 11. WATKINS C J C H. Learning from delayed rewards[D]. Cambridge: University of Cambridge, 1989.
  • 15. LEWIS F L, VRABIE D. Reinforcement learning and adaptive dynamic programming for feedback control[J]. IEEE Circuits and Systems Magazine, 2009, 9(3): 32-50.
  • 17. CAFLISCH R E. Monte Carlo and quasi-Monte Carlo methods[J]. Acta Numerica, 1998, 7: 1-49.
  • 18. MICHIE D, CHAMBERS R A. BOXES: An experiment in adaptive control[J]. Machine Intelligence, 1968, 2(2): 137-152.
  • 21. CHASLOT G. Monte-Carlo tree search[D]. Maastricht: Maastricht University, 2010.
  • 22. COULOM R. Efficient selectivity and backup operators in Monte-Carlo tree search[M]// Computers and Games. Berlin Heidelberg: Springer, 2006: 72-83.
  • 23. WEI Q L, LIU D R. A new discrete-time iterative adaptive dynamic programming algorithm based on Q-learning[M]// International Symposium on Neural Networks. New York: Springer, 2015: 43-52.
  • 24. WEI Q L, LIU D R, SHI G. A novel dual iterative-learning method for optimal battery management in smart residential environments[J]. IEEE Transactions on Industrial Electronics, 2015, 62(4): 2509-2518.
  • 25. JAAKKOLA T, JORDAN M I, SINGH S P. On the convergence of stochastic iterative dynamic programming algorithms[J]. Neural Computation, 1994, 6(6): 1185-1201.
  • 26. TSITSIKLIS J N. Asynchronous stochastic approximation and Q-learning[J]. Machine Learning, 1994, 16(3): 185-202.
  • 28. SINGH S, JAAKKOLA T, LITTMAN M L, et al. Convergence results for single-step on-policy reinforcement-learning algorithms[J]. Machine Learning, 2000, 38(3): 287-308.
  • 29. SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.
  • 31. WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256.
  • 32. SUTTON R S, MCALLESTER D A, SINGH S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]// Advances in Neural Information Processing Systems. Denver: MIT Press, 1999: 1057-1063.
  • 33. LIU D R, WEI Q L. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems[J]. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(3): 621-634.
  • 35. ZHAO D B, XIA Z P, WANG D. Model-free optimal control for affine nonlinear systems with convergence analysis[J]. IEEE Transactions on Automation Science and Engineering, 2015, 12(4): 1461-1468.
  • 36. ZHAO D B, ZHU Y H. MEC-a near-optimal online reinforcement learning algorithm for continuous deterministic systems[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(2): 346-356.
  • 37. ZHU Y H, ZHAO D B, LI X J. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics[J]. IET Control Theory & Applications, 2016, DOI: 10.1049/iet-cta.2015.0769.
  • 38. JIANG Y, JIANG Z P. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems[J]. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(5): 882-893.
  • 39. WU H N, LUO B. Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear control[J]. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(12): 1884-1895.
  • 40. ZHAO D B, ZHANG Q C, WANG D, et al. Experience replay for optimal control of nonzero-sum game systems with unknown dynamics[J]. IEEE Transactions on Cybernetics, 2016, 46(3): 854-865.
  • 41. WU Jun, XU Xin, WANG Jian, et al. Recent advances of reinforcement learning in multi-robot systems: a survey[J]. Control and Decision, 2011, 26(11): 1601-1610.
  • 42. MATARIC M J. Reinforcement learning in the multi-robot domain[M]// Robot Colonies. New York: Springer, 1997: 73-83.
  • 43. ZHAO D B, ZHANG Z, DAI Y J. Self-teaching adaptive dynamic programming for Gomoku[J]. Neurocomputing, 2012, 78(1): 23-29.
  • 44. ZHAO D B, WANG B, LIU D R. A supervised actor-critic approach for adaptive cruise control[J]. Soft Computing, 2013, 17(11): 2089-2099.
  • 46. TSITSIKLIS J N, VAN ROY B. An analysis of temporal-difference learning with function approximation[J]. IEEE Transactions on Automatic Control, 1997, 42(5): 674-690.
  • 47. TSITSIKLIS J N, VAN ROY B. Average cost temporal-difference learning[J]. Automatica, 1999, 35(11): 1799-1808.
  • 48. BHATNAGAR S, PRECUP D, SILVER D, et al. Convergent temporal-difference learning with arbitrary smooth function approximation[C]// Advances in Neural Information Processing Systems. Vancouver: MIT Press, 2009: 1204-1212.
  • 50. MELO F S, LOPES M. Fitted natural actor-critic: a new algorithm for continuous state-action MDPs[M]// Machine Learning and Knowledge Discovery in Databases. Berlin Heidelberg: Springer, 2008: 66-81.
  • 51. BRAFMAN R I, TENNENHOLTZ M. R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning[J]. The Journal of Machine Learning Research, 2003, 3(10): 213-231.
  • 52. GAO Yang, CHEN Shifu, LU Xin. Research on reinforcement learning technology: a review[J]. Acta Automatica Sinica, 2004, 30(1): 86-100.
  • 53. BERNSTEIN A, SHIMKIN N. Adaptive-resolution reinforcement learning with efficient exploration in deterministic domains[J]. Machine Learning, 2010, 81(3): 359-397.
  • 57. THOMAZ A L, BREAZEAL C. Teachable robots: understanding human teaching behavior to build more effective robot learners[J]. Artificial Intelligence, 2008, 172(6): 716-737.
  • 58. NIV Y. Neuroscience: Dopamine ramps up[J]. Nature, 2013, 500(7464): 533-535.
  • 59. CUSHMAN F. Action, outcome, and value: a dual-system framework for morality[J]. Personality and Social Psychology Review, 2013, 17(3): 273-292.
  • 60. HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
  • 63. OUYANG W, ZENG X, WANG X. Learning mutual visibility relationship for pedestrian detection with a deep model[J]. International Journal of Computer Vision, 2016, DOI: 10.1007/s11263-016-0890-9.
  • 64. DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.
  • 71. LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
  • 74. LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
  • 75. SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. 2015, arXiv:1409.1556 [cs.CV].
  • 76. TIELEMAN T. Training restricted Boltzmann machines using approximations to the likelihood gradient[C]// Proceedings of the 25th International Conference on Machine Learning. Helsinki: ACM, 2008: 1064-1071.
  • 80. VINCENT P, LAROCHELLE H, LAJOIE I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion[J]. The Journal of Machine Learning Research, 2010, 11(11): 3371-3408.
  • 85. GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. 2015, arXiv:1412.6572v3 [stat.ML].
  • 87. SHIBATA K, IIDA M. Acquisition of box pushing by direct-vision-based reinforcement learning[C]// Proceedings of the SICE Annual Conference. Nagoya: IEEE, 2003, 3: 2322-2327.
  • 88. SHIBATA K, OKABE Y. Reinforcement learning when visual sensory signals are directly given as inputs[C]// Proceedings of the International Conference on Neural Networks. Houston: IEEE, 1997, 3: 1716-1720.
  • 92. WYMANN B, ESPIE E, GUIONNEAU C, et al. TORCS, The open racing car simulator[EB/OL]. 2014, http://torcs.sourceforge.net.
  • 93. KOUTNIK J, SCHMIDHUBER J, GOMEZ F. Online evolution of deep convolutional network for vision-based reinforcement learning[M]// From Animals to Animats 13. New York: Springer, 2014: 260-269.
  • 94. LIN L J. Reinforcement learning for robots using neural networks[D]. Pittsburgh: Carnegie Mellon University, 1993.
  • 97. GUO X, SINGH S, LEE H, et al. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning[C]// Advances in Neural Information Processing Systems. Montreal: MIT Press, 2014: 3338-3346.
  • 100. OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN[EB/OL]. 2016, arXiv:1602.04621.
  • 101. MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[EB/OL]. 2016, arXiv:1602.01783 [cs.LG].
  • 106. CAI X, WUNSCH II D C. Computer Go: a grand challenge to AI[M]// Challenges for Computational Intelligence. Berlin Heidelberg: Springer, 2007: 443-465.
  • 107. TIAN Y D, ZHU Y. Better computer Go player with neural network and long-term prediction[EB/OL]. 2016, arXiv:1511.06410v3 [cs.LG].
  • 108. TIAN Yuandong. A simple analysis of AlphaGo[J]. Acta Automatica Sinica, 2016, 42(5): 671-675.
  • 109. HUANG Shijie. The strategies for Ko fight of computer Go[D]. Taiwan: National Taiwan Normal University, 2002: 1-57.
  • 110. GUO Xiaoxiao, LI Cheng, MEI Qiaozhu. Deep learning applied to games[J]. Acta Automatica Sinica, 2016, 42(5): 676-684.
  • 111. KOLLER D, MILCH B. Multi-agent influence diagrams for representing and solving games[J]. Games and Economic Behavior, 2003, 45(1): 181-221.
  • 113. FOERSTER J N, ASSAEL Y M, FREITAS N, et al. Learning to communicate to solve riddles with deep distributed recurrent Q-networks[EB/OL]. 2016, arXiv:1602.02672.
  • 114. GU S, LILLICRAP T, SUTSKEVER I, et al. Continuous deep Q-learning with model-based acceleration[EB/OL]. 2016, arXiv:1603.00748.
  • 115. LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. 2016, arXiv:1509.02971v5 [cs.LG].
  • 120. LEVINE S, PASTOR P, KRIZHEVSKY A, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection[EB/OL]. 2016, arXiv:1603.02199.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.