Volume , Issue , 2009, Pages 89-96

Near-optimal regret bounds for reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; REINFORCEMENT LEARNING;

EID: 73549103329     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (364)

References (13)
  • 2
    • Michael J. Kearns and Satinder P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 11. MIT Press, 1999.
  • 3
    • Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
  • 5
    • Peter Auer and Ronald Ortner. Logarithmic online regret bounds for reinforcement learning. In Advances in Neural Information Processing Systems 19, pages 49-56. MIT Press, 2007.
  • 6
    • Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
  • 7
    • Ambuj Tewari and Peter Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Advances in Neural Information Processing Systems 20, pages 1505-1512. MIT Press, 2008.
  • 10
    • Alexander L. Strehl and Michael L. Littman. An analysis of model-based interval estimation for Markov decision processes. J. Comput. System Sci., 74(8):1309-1331, 2008.
  • 11
    • Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
  • 13
    • Peter Auer, Thomas Jaksch, and Ronald Ortner. Near-optimal regret bounds for reinforcement learning. Technical Report CIT-2009-01, University of Leoben, Chair for Information Technology, 2009. http://institute.unileoben.ac.at/infotech/publications/TR/CIT-2009-01.pdf


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.