



Volume , Issue , 2010, Pages

Double Q-learning

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; REINFORCEMENT LEARNING;

EID: 85161998941     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (1644)

References (24)
  • 4
    • 0000439891
    • On the convergence of stochastic iterative dynamic programming algorithms
    • T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6:1185-1201, 1994.
    • (1994) Neural Computation , vol.6 , pp. 1185-1201
    • Jaakkola, T.1    Jordan, M.I.2    Singh, S.P.3
  • 5
    • 0028497630
    • Asynchronous stochastic approximation and Q-learning
    • J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185-202, 1994.
    • (1994) Machine Learning , vol.16 , pp. 185-202
    • Tsitsiklis, J.N.1
  • 7
    • 85156187730
    • Improving elevator performance using reinforcement learning
    • R. H. Crites and A. G. Barto. Improving elevator performance using reinforcement learning. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1017-1023, Cambridge MA, 1996. MIT Press.
    • (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1017-1023
    • Crites, R.H.1    Barto, A.G.2
  • 11
    • 84899026236
    • Finite-sample convergence rates for Q-learning and indirect algorithms
    • M. J. Kearns and S. P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Neural Information Processing Systems 12, pages 996-1002. MIT Press, 1999.
    • (1999) Neural Information Processing Systems , vol.12 , pp. 996-1002
    • Kearns, M.J.1    Singh, S.P.2
  • 15
    • 31344446857
    • Rational overoptimism (and other biases)
    • E. Van den Steen. Rational overoptimism (and other biases). American Economic Review, 94(4):1141-1151, September 2004.
    • (2004) American Economic Review , vol.94 , Issue.4 , pp. 1141-1151
    • Van Den Steen, E.1
  • 16
    • 33644898597
    • The optimizer's curse: Skepticism and postdecision surprise in decision analysis
    • J. E. Smith and R. L. Winkler. The optimizer's curse: Skepticism and postdecision surprise in decision analysis. Management Science, 52(3):311-322, 2006.
    • (2006) Management Science , vol.52 , Issue.3 , pp. 311-322
    • Smith, J.E.1    Winkler, R.L.2
  • 18
    • 0001520893
    • Anomalies: The winner's curse
    • R. H. Thaler. Anomalies: The winner's curse. Journal of Economic Perspectives, 2(1):191-202, Winter 1988.
    • (1988) Journal of Economic Perspectives , vol.2 , Issue.1 , pp. 191-202
    • Thaler, R.H.1
  • 19
    • 34250609333
    • Sur les fonctions convexes et les inégalités entre les valeurs moyennes
    • J. L. W. V. Jensen. Sur les fonctions convexes et les inégalités entre les valeurs moyennes. Acta Mathematica, 30(1):175-193, 1906.
    • (1906) Acta Mathematica , vol.30 , Issue.1 , pp. 175-193
    • Jensen, J.L.W.V.1
  • 20
    • 0033901602
    • Convergence results for single-step on-policy reinforcement-learning algorithms
    • S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
    • (2000) Machine Learning , vol.38 , Issue.3 , pp. 287-308
    • Singh, S.P.1    Jaakkola, T.2    Littman, M.L.3    Szepesvári, C.4
  • 24
    • 77956890234
    • Monte Carlo sampling methods using Markov chains and their applications
    • W. K. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, pages 97-109, 1970.
    • (1970) Biometrika , pp. 97-109
    • Hastings, W.K.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.