1. MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
2. SILVER D, HUANG A, MADDISON C, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
3. AREL I. Deep reinforcement learning as foundation for artificial general intelligence[M]// Theoretical Foundations of Artificial General Intelligence. Amsterdam: Atlantis Press, 2012: 89-102.
4. TESAURO G. TD-Gammon, a self-teaching backgammon program, achieves master-level play[J]. Neural Computation, 1994, 6(2): 215-219.
6. KEARNS M, SINGH S. Near-optimal reinforcement learning in polynomial time[J]. Machine Learning, 2002, 49(2/3): 209-232.
8. LITTMAN M L. Reinforcement learning improves behaviour from evaluative feedback[J]. Nature, 2015, 521(7553): 445-451.
10. WERBOS P J. Advanced forecasting methods for global crisis warning and models of intelligence[J]. General Systems Yearbook, 1977, 22(12): 25-38.
11. WATKINS C J C H. Learning from delayed rewards[D]. Cambridge: University of Cambridge, 1989.
15. LEWIS F L, VRABIE D. Reinforcement learning and adaptive dynamic programming for feedback control[J]. IEEE Circuits and Systems Magazine, 2009, 9(3): 32-50.
17. CAFLISCH R E. Monte Carlo and quasi-Monte Carlo methods[J]. Acta Numerica, 1998, 7: 1-49.
18. MICHIE D, CHAMBERS R A. BOXES: An experiment in adaptive control[J]. Machine Intelligence, 1968, 2(2): 137-152.
20. BROWNE C B, POWLEY E, WHITEHOUSE D, et al. A survey of Monte Carlo tree search methods[J]. IEEE Transactions on Computational Intelligence and AI in Games, 2012, 4(1): 1-43.
21. CHASLOT G. Monte-Carlo tree search[D]. Maastricht: Maastricht Universiteit, 2010.
22. COULOM R. Efficient selectivity and backup operators in Monte-Carlo tree search[M]// Computers and Games. Berlin Heidelberg: Springer, 2006: 72-83.
23. WEI Q L, LIU D R. A new discrete-time iterative adaptive dynamic programming algorithm based on Q-learning[M]// International Symposium on Neural Networks. New York: Springer, 2015: 43-52.
24. WEI Q L, LIU D R, SHI G. A novel dual iterative-learning method for optimal battery management in smart residential environments[J]. IEEE Transactions on Industrial Electronics, 2015, 62(4): 2509-2518.
25. JAAKKOLA T, JORDAN M I, SINGH S P. On the convergence of stochastic iterative dynamic programming algorithms[J]. Neural Computation, 1994, 6(6): 1185-1201.
26. TSITSIKLIS J N. Asynchronous stochastic approximation and Q-learning[J]. Machine Learning, 1994, 16(3): 185-202.
28. SINGH S, JAAKKOLA T, LITTMAN M L, et al. Convergence results for single-step on-policy reinforcement-learning algorithms[J]. Machine Learning, 2000, 38(3): 287-308.
29. SUTTON R S. Learning to predict by the methods of temporal differences[J]. Machine Learning, 1988, 3(1): 9-44.
31. WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine Learning, 1992, 8(3/4): 229-256.
32. SUTTON R S, MCALLESTER D A, SINGH S P, et al. Policy gradient methods for reinforcement learning with function approximation[C]// Advances in Neural Information Processing Systems. Denver: MIT Press, 1999, 99: 1057-1063.
33. LIU D R, WEI Q L. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems[J]. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(3): 621-634.
35. ZHAO D B, XIA Z P, WANG D. Model-free optimal control for affine nonlinear systems with convergence analysis[J]. IEEE Transactions on Automation Science and Engineering, 2015, 12(4): 1461-1468.
36. ZHAO D B, ZHU Y H. MEC-a near-optimal online reinforcement learning algorithm for continuous deterministic systems[J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(2): 346-356.
37. ZHU Y H, ZHAO D B, LI X J. Using reinforcement learning techniques to solve continuous-time non-linear optimal tracking problem without system dynamics[J]. IET Control Theory & Applications, 2016, DOI: 10.1049/iet-cta.2015.0769.
38. JIANG Y, JIANG Z P. Robust adaptive dynamic programming and feedback stabilization of nonlinear systems[J]. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(5): 882-893.
39. WU H N, LUO B. Neural network based online simultaneous policy update algorithm for solving the HJI equation in nonlinear control[J]. IEEE Transactions on Neural Networks and Learning Systems, 2012, 23(12): 1884-1895.
40. ZHAO D B, ZHANG Q C, WANG D, et al. Experience replay for optimal control of nonzero-sum game systems with unknown dynamics[J]. IEEE Transactions on Cybernetics, 2016, 46(3): 854-865.
41. WU Jun, XU Xin, WANG Jian, et al. Recent advances of reinforcement learning in multi-robot systems: a survey[J]. Control and Decision, 2011, 26(11): 1601-1610.
42. MATARIC M J. Reinforcement learning in the multi-robot domain[M]// Robot Colonies. New York: Springer, 1997: 73-83.
43. ZHAO D B, ZHANG Z, DAI Y J. Self-teaching adaptive dynamic programming for Gomoku[J]. Neurocomputing, 2012, 78(1): 23-29.
44. ZHAO D B, WANG B, LIU D R. A supervised actor-critic approach for adaptive cruise control[J]. Soft Computing, 2013, 17(11): 2089-2099.
46. TSITSIKLIS J N, VAN ROY B. An analysis of temporal-difference learning with function approximation[J]. IEEE Transactions on Automatic Control, 1997, 42(5): 674-690.
47. TSITSIKLIS J N, VAN ROY B. Average cost temporal-difference learning[J]. Automatica, 1999, 35(11): 1799-1808.
48. BHATNAGAR S, PRECUP D, SILVER D, et al. Convergent temporal-difference learning with arbitrary smooth function approximation[C]// Advances in Neural Information Processing Systems. Vancouver: MIT Press, 2009: 1204-1212.
50. MELO F S, LOPES M. Fitted natural actor-critic: a new algorithm for continuous state-action MDPs[M]// Machine Learning and Knowledge Discovery in Databases. Berlin Heidelberg: Springer, 2008: 66-81.
51. BRAFMAN R I, TENNENHOLTZ M. R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning[J]. The Journal of Machine Learning Research, 2003, 3(10): 213-231.
52. GAO Yang, CHEN Shifu, LU Xin. Research on reinforcement learning technology: a review[J]. Acta Automatica Sinica, 2004, 30(1): 86-100.
53. BERNSTEIN A, SHIMKIN N. Adaptive-resolution reinforcement learning with efficient exploration in deterministic domains[J]. Machine Learning, 2010, 81(3): 359-397.
57. THOMAZ A L, BREAZEAL C. Teachable robots: understanding human teaching behavior to build more effective robot learners[J]. Artificial Intelligence, 2008, 172(6): 716-737.
58. NIV Y. Neuroscience: Dopamine ramps up[J]. Nature, 2013, 500(7464): 533-535.
59. CUSHMAN F. Action, outcome, and value: a dual-system framework for morality[J]. Personality and Social Psychology Review, 2013, 17(3): 273-292.
60. HINTON G E, OSINDERO S, TEH Y W. A fast learning algorithm for deep belief nets[J]. Neural Computation, 2006, 18(7): 1527-1554.
61. ABDEL-HAMID O, MOHAMED A, JIANG H, et al. Convolutional neural networks for speech recognition[J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014, 22(10): 1533-1545.
63. OUYANG W, ZENG X, WANG X. Learning mutual visibility relationship for pedestrian detection with a deep model[J]. International Journal of Computer Vision, 2016, DOI: 10.1007/s11263-016-0890-9.
64. DAHL G E, YU D, DENG L, et al. Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 30-42.
68. XU K, BA J, KIROS R, et al. Show, attend and tell: neural image caption generation with visual attention[C]// Proceedings of the 32nd International Conference on Machine Learning. Lille: ACM, 2015: 2048-2057.
71. LECUN Y, BENGIO Y, HINTON G. Deep learning[J]. Nature, 2015, 521(7553): 436-444.
74. LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
75. SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. arXiv preprint, 2015, arXiv:1409.1556[cs.CV].
76. TIELEMAN T. Training restricted Boltzmann machines using approximations to the likelihood gradient[C]// Proceedings of the 25th International Conference on Machine Learning. Helsinki: ACM, 2008: 1064-1071.
78. MOHAMED A, DAHL G E, HINTON G. Acoustic modeling using deep belief networks[J]. IEEE Transactions on Audio, Speech, and Language Processing, 2012, 20(1): 14-22.
80. VINCENT P, LAROCHELLE H, LAJOIE I, et al. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion[J]. The Journal of Machine Learning Research, 2010, 11(11): 3371-3408.
85. GOODFELLOW I J, SHLENS J, SZEGEDY C. Explaining and harnessing adversarial examples[EB/OL]. arXiv preprint, 2015, arXiv:1412.6572v3[stat.ML].
87. SHIBATA K, IIDA M. Acquisition of box pushing by direct-vision-based reinforcement learning[C]// Proceedings of the SICE Annual Conference. Nagoya: IEEE, 2003, 3: 2322-2327.
88. SHIBATA K, OKABE Y. Reinforcement learning when visual sensory signals are directly given as inputs[C]// Proceedings of the International Conference on Neural Networks. Houston: IEEE, 1997, 3: 1716-1720.
92. WYMANN B, ESPI E, GUIONNEAU C, et al. TORCS, The open racing car simulator[EB/OL]. 2014, http://torcs.sourceforge.net.
93. KOUTNIK J, SCHMIDHUBER J, GOMEZ F. Online evolution of deep convolutional network for vision-based reinforcement learning[M]// From Animals to Animats 13. New York: Springer, 2014: 260-269.
94. LIN L J. Reinforcement learning for robots using neural networks[D]. Pittsburgh: Carnegie Mellon University, 1993.
97. GUO X, SINGH S, LEE H, et al. Deep learning for real-time Atari game play using offline Monte-Carlo tree search planning[C]// Advances in Neural Information Processing Systems. Montreal: MIT Press, 2014: 3338-3346.
100. OSBAND I, BLUNDELL C, PRITZEL A, et al. Deep exploration via bootstrapped DQN[EB/OL]. arXiv preprint, 2016, arXiv:1602.04621.
101. MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[EB/OL]. arXiv preprint, 2016, arXiv:1602.01783[cs.LG].
102. CUCCU G, LUCIW M, SCHMIDHUBER J, et al. Intrinsically motivated neuroevolution for vision-based reinforcement learning[C]// Proceedings of the IEEE International Conference on Development and Learning. Trondheim: IEEE, 2011, 2: 1-7.
106. CAI X, WUNSCH II D C. Computer Go: a grand challenge to AI[M]// Challenges for Computational Intelligence. Berlin Heidelberg: Springer, 2007: 443-465.
107. TIAN Y D, ZHU Y. Better computer Go player with neural network and long-term prediction[EB/OL]. arXiv preprint, 2016, arXiv:1511.06410v3[cs.LG].
108. TIAN Yuandong. A simple analysis of AlphaGo[J]. Acta Automatica Sinica, 2016, 42(5): 671-675.
109. HUANG Shijie. The strategies for Ko fight of computer Go[D]. Taiwan: National Taiwan Normal University, 2002: 1-57.
110. GUO Xiaoxiao, LI Cheng, MEI Qiaozhu. Deep learning applied to games[J]. Acta Automatica Sinica, 2016, 42(5): 676-684.
111. KOLLER D, MILCH B. Multi-agent influence diagrams for representing and solving games[J]. Games and Economic Behavior, 2003, 45(1): 181-221.
113. FOERSTER J N, ASSAEL Y M, FREITAS N, et al. Learning to communicate to solve riddles with deep distributed recurrent Q-networks[EB/OL]. arXiv preprint, 2016, arXiv:1602.02672.
114. GU S, LILLICRAP T, SUTSKEVER I, et al. Continuous deep Q-learning with model-based acceleration[EB/OL]. arXiv preprint, 2016, arXiv:1603.00748.
115. LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. arXiv preprint, 2016, arXiv:1509.02971v5[cs.LG].
120. LEVINE S, PASTOR P, KRIZHEVSKY A, et al. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection[EB/OL]. arXiv preprint, 2016, arXiv:1603.02199.