SCOPUS Information Search Platform

30th AAAI Conference on Artificial Intelligence, AAAI 2016

Volume , Issue , 2016, Pages 2094-2100

Deep reinforcement learning with double Q-Learning

(3) Van Hasselt, Hado ᵃ; Guez, Arthur ᵃ; Silver, David ᵃ

ᵃ DeepMind (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; ARTIFICIAL INTELLIGENCE; REINFORCEMENT LEARNING;

DEEP NEURAL NETWORKS; Q-LEARNING; Q-LEARNING ALGORITHMS; SCALE FUNCTIONS; SPECIFIC ADAPTATIONS;

LEARNING ALGORITHMS;

EID: 85007210890
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: (6676)

References (26)

1
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, pages 1054-1078, 1995.
- (1995) Advances in Applied Probability , pp. 1054-1078
- Agrawal, R.¹

2
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2-3):235-256, 2002.
- (2002) Machine Learning , vol.47 , Issue.2-3 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

3
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- L. Baird. Residual algorithms: Reinforcement learning with function approximation. In Machine Learning: Proceedings of the Twelfth International Conference, pages 30-37, 1995.
- (1995) Machine Learning: Proceedings of the Twelfth International Conference , pp. 30-37
- Baird, L.¹

4
- 84879976780
- The arcade learning environment: An evaluation platform for general agents
- M. G. Bellemare, Y. Naddaf, J. Veness, and M. Bowling. The arcade learning environment: An evaluation platform for general agents. J. Artif. Intell. Res. (JAIR), 47:253-279, 2013.
- (2013) J. Artif. Intell. Res. (JAIR) , vol.47 , pp. 253-279
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

5
- 0041965975
- R-max - A general polynomial time algorithm for near-optimal reinforcement learning
- R. I. Brafman and M. Tennenholtz. R-max - A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231, 2003.
- (2003) The Journal of Machine Learning Research , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

6
- 0023846591
- Neocognitron: A hierarchical neural network capable of visual pattern recognition
- K. Fukushima. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural networks, 1(2):119-130, 1988.
- (1988) Neural Networks , vol.1 , Issue.2 , pp. 119-130
- Fukushima, K.¹

7
- 0029679044
- Reinforcement learning: A survey
- L. P. Kaelbling, M. L. Littman, and A. W. Moore. Reinforcement learning: A survey. Journal of Artificial Intelligence Research, 4: 237-285, 1996.
- (1996) Journal of Artificial Intelligence Research , vol.4 , pp. 237-285
- Kaelbling, L.P.¹ Littman, M.L.² Moore, A.W.³

8
- 0032203257
- Gradient-based learning applied to document recognition
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
- (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

9
- 0000123778
- Self-improving reactive agents based on reinforcement learning, planning and teaching
- L. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine learning, 8(3):293-321, 1992.
- (1992) Machine Learning , vol.8 , Issue.3 , pp. 293-321
- Lin, L.¹

10
- 84864655352
- Gradient temporal-difference learning algorithms
- PhD thesis, University of Alberta
- H. R. Maei. Gradient temporal-difference learning algorithms. PhD thesis, University of Alberta, 2011.
- (2011) Gradient Temporal-difference Learning Algorithms
- Maei, H.R.¹

11
- 84924051598
- Human-level control through deep reinforcement learning
- V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540): 529-533, 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰ Petersen, S.¹¹ Beattie, C.¹² Sadik, A.¹³ Antonoglou, I.¹⁴ King, H.¹⁵ Kumaran, D.¹⁶ Wierstra, D.¹⁷ Legg, S.¹⁸ Hassabis, D.¹⁹

12
- 84980007683
- Massively parallel methods for deep reinforcement learning
- A. Nair, P. Srinivasan, S. Blackwell, C. Alcicek, R. Fearon, A. D. Maria, V. Panneershelvam, M. Suleyman, C. Beattie, S. Petersen, S. Legg, V. Mnih, K. Kavukcuoglu, and D. Silver. Massively parallel methods for deep reinforcement learning. In Deep Learning Workshop, ICML, 2015.
- (2015) Deep Learning Workshop, ICML
- Nair, A.¹ Srinivasan, P.² Blackwell, S.³ Alcicek, C.⁴ Fearon, R.⁵ Maria, A.D.⁶ Panneershelvam, V.⁷ Suleyman, M.⁸ Beattie, C.⁹ Petersen, S.¹⁰ Legg, S.¹¹ Mnih, V.¹² Kavukcuoglu, K.¹³ Silver, D.¹⁴

13
- 33646398129
- Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method
- Springer
- M. Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In Proceedings of the 16th European Conference on Machine Learning, pages 317-328. Springer, 2005.
- (2005) Proceedings of the 16th European Conference on Machine Learning , pp. 317-328
- Riedmiller, M.¹

14
- 32844474095
- Reinforcement learning with factored states and actions
- B. Sallans and G. E. Hinton. Reinforcement learning with factored states and actions. The Journal of Machine Learning Research, 5: 1063-1088, 2004.
- (2004) The Journal of Machine Learning Research , vol.5 , pp. 1063-1088
- Sallans, B.¹ Hinton, G.E.²

15
- 73549084301
- Reinforcement learning in finite MDPs: PAC analysis
- A. L. Strehl, L. Li, and M. L. Littman. Reinforcement learning in finite MDPs: PAC analysis. The Journal of Machine Learning Research, 10:2413-2444, 2009.
- (2009) The Journal of Machine Learning Research , vol.10 , pp. 2413-2444
- Strehl, A.L.¹ Li, L.² Littman, M.L.³

16
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton. Learning to predict by the methods of temporal differences. Machine learning, 3(1):9-44, 1988.
- (1988) Machine Learning , vol.3 , Issue.1 , pp. 9-44
- Sutton, R.S.¹

17
- 85132026293
- Integrated architectures for learning, planning, and reacting based on approximating dynamic programming
- R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the seventh international conference on machine learning, pages 216-224, 1990.
- (1990) Proceedings of the Seventh International Conference on Machine Learning , pp. 216-224
- Sutton, R.S.¹

18
- 0003420416
- Introduction to reinforcement learning
- MIT Press
- R. S. Sutton and A. G. Barto. Introduction to reinforcement learning. MIT Press, 1998.
- (1998) Introduction to Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

19
- 85007185217
- An emphatic approach to the problem of off-policy temporal-difference learning
- arXiv preprint arXiv:1503.04269
- R. S. Sutton, A. R. Mahmood, and M. White. An emphatic approach to the problem of off-policy temporal-difference learning. arXiv preprint arXiv:1503.04269, 2015.
- (2015) An Emphatic Approach to the Problem of Off-policy Temporal-difference Learning
- Sutton, R.S.¹ Mahmood, A.R.² White, M.³

20
- 56449092664
- The many faces of optimism: A unifying approach
- ACM
- I. Szita and A. Lorincz. The many faces of optimism: A unifying approach. In Proceedings of the 25th international conference on Machine learning, pages 1048-1055. ACM, 2008.
- (2008) Proceedings of the 25th International Conference on Machine Learning , pp. 1048-1055
- Szita, I.¹ Lorincz, A.²

21
- 0029276036
- Temporal difference learning and TD-Gammon
- G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.
- (1995) Communications of the ACM , vol.38 , Issue.3 , pp. 58-68
- Tesauro, G.¹

22
- 0003270924
- Issues in using function approximation for reinforcement learning
- In M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, editors, Hillsdale, NJ, Lawrence Erlbaum
- S. Thrun and A. Schwartz. Issues in using function approximation for reinforcement learning. In M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, editors, Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, NJ, 1993. Lawrence Erlbaum.
- (1993) Proceedings of the 1993 Connectionist Models Summer School
- Thrun, S.¹ Schwartz, A.²

23
- 0031143730
- An analysis of temporal-difference learning with function approximation
- J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.
- (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

24
- 85161998941
- Double Q-learning
- H. van Hasselt. Double Q-learning. Advances in Neural Information Processing Systems, 23:2613-2621, 2010.
- (2010) Advances in Neural Information Processing Systems , vol.23 , pp. 2613-2621
- Van Hasselt, H.¹

25
- 84893166876
- Insights in Reinforcement Learning
- PhD thesis, Utrecht University
- H. van Hasselt. Insights in Reinforcement Learning. PhD thesis, Utrecht University, 2011.
- (2011) Insights in Reinforcement Learning
- Van Hasselt, H.¹

26
- 0004049893
- Learning from delayed rewards
- PhD thesis, University of Cambridge, England
- C. J. C. H. Watkins. Learning from delayed rewards. PhD thesis, University of Cambridge, England, 1989.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.