2016, Pages 2094-2100

Deep reinforcement learning with double Q-learning

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; ARTIFICIAL INTELLIGENCE; REINFORCEMENT LEARNING;

EID: 85007210890     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 6676
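
For context on the paper this record describes: Double DQN changes the standard DQN bootstrap target by letting the online network select the greedy next action while the target network evaluates it, which reduces the overestimation bias of the max operator. The sketch below illustrates that target computation; it is a minimal NumPy illustration under assumed array shapes, not code from the paper, and the function name is hypothetical.

import numpy as np

def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    # reward, done: shape (batch,); done is 1.0 for terminal transitions.
    # next_q_online, next_q_target: shape (batch, n_actions), Q-values
    # from the online and target networks respectively.
    # The online network selects the greedy next action...
    greedy_actions = np.argmax(next_q_online, axis=1)
    # ...and the target network evaluates that action. This decoupling of
    # selection from evaluation is the core idea of Double DQN.
    next_values = next_q_target[np.arange(next_q_target.shape[0]), greedy_actions]
    return reward + gamma * (1.0 - done) * next_values

For comparison, the standard DQN target would use next_q_target.max(axis=1), letting the same network both select and evaluate the action; that coupling is the source of the upward bias the paper analyzes.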

References (26)
1. R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, pages 1054-1078, 1995.
2. P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
4.
5. R. I. Brafman and M. Tennenholtz. R-max - A general polynomial time algorithm for near-optimal reinforcement learning. The Journal of Machine Learning Research, 3:213-231, 2003.
6. K. Fukushima. Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1(2):119-130, 1988.
8. Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
9. L. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8(3):293-321, 1992.
13. M. Riedmiller. Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method. In Proceedings of the 16th European Conference on Machine Learning, pages 317-328. Springer, 2005.
16. R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.
17. R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the Seventh International Conference on Machine Learning, pages 216-224, 1990.
21. G. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-68, 1995.
22. S. Thrun and A. Schwartz. Issues in using function approximation for reinforcement learning. In M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, editors, Proceedings of the 1993 Connectionist Models Summer School, Hillsdale, NJ, 1993. Lawrence Erlbaum.
23. J. N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.