



Volume , Issue , 2007, Pages 49-56

Logarithmic online regret bounds for undiscounted reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

FINITE NUMBER; MULTI-ARMED BANDIT PROBLEM; ON-LINE PERFORMANCE; OPTIMAL POLICIES; UPPER CONFIDENCE BOUND

EID: 56449090814     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (183)

References (16)
  • 1
    • 0036832954
    • Near-optimal reinforcement learning in polynomial time
    • Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
    • (2002) Mach. Learn. , vol.49 , pp. 209-232
    • Kearns, M.J.1    Singh, S.P.2
  • 2
    • 0041965975
    • R-max - A general polynomial time algorithm for near-optimal reinforcement learning
    • Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
    • (2002) J. Mach. Learn. Res. , vol.3 , pp. 213-231
    • Brafman, R.I.1    Tennenholtz, M.2
  • 4
    • 31844432138
    • A theoretical analysis of model-based interval estimation
    • Alexander L. Strehl and Michael L. Littman. A theoretical analysis of model-based interval estimation. In Proc. 22nd ICML 2005, pages 857-864, 2005.
    • (2005) Proc. 22nd ICML 2005 , pp. 857-864
    • Strehl, A.L.1    Littman, M.L.2
  • 6
    • 78649499440
    • Efficient reinforcement learning
    • ACM
    • Claude-Nicolas Fiechter. Efficient reinforcement learning. In Proc. 7th COLT, pages 88-97. ACM, 1994.
    • (1994) Proc. 7th COLT , pp. 88-97
    • Fiechter, C.-N.1
  • 7
    • 77951961847
    • Online regret bounds for a new reinforcement learning algorithm
    • ÖCG
    • Peter Auer and Ronald Ortner. Online regret bounds for a new reinforcement learning algorithm. In Proc. 1st ACVW, pages 35-42. ÖCG, 2005.
    • (2005) Proc. 1st ACVW , pp. 35-42
    • Auer, P.1    Ortner, R.2
  • 8
    • 0041966002
    • Using confidence bounds for exploitation-exploration trade-offs
    • Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res., 3:397-422, 2002.
    • (2002) J. Mach. Learn. Res. , vol.3 , pp. 397-422
    • Auer, P.1
  • 9
    • 0036568025
    • Finite-time analysis of the multi-armed bandit problem
    • Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Mach. Learn., 47:235-256, 2002.
    • (2002) Mach. Learn. , vol.47 , pp. 235-256
    • Auer, P.1    Cesa-Bianchi, N.2    Fischer, P.3
  • 10
    • 16244391087
    • An empirical evaluation of interval estimation for Markov decision processes
    • IEEE Computer Society
    • Alexander L. Strehl and Michael L. Littman. An empirical evaluation of interval estimation for Markov decision processes. In Proc. 16th ICTAI, pages 128-135. IEEE Computer Society, 2004.
    • (2004) Proc. 16th ICTAI , pp. 128-135
    • Strehl, A.L.1    Littman, M.L.2
  • 12
    • 1942421149
    • Action elimination and stopping conditions for reinforcement learning
    • AAAI Press
    • Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for reinforcement learning. In Proc. 20th ICML, pages 162-169. AAAI Press, 2003.
    • (2003) Proc. 20th ICML , pp. 162-169
    • Even-Dar, E.1    Mannor, S.2    Mansour, Y.3
  • 13
    • 0031070051
    • Optimal adaptive policies for Markov decision processes
    • Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
    • (1997) Math. Oper. Res. , vol.22 , Issue.1 , pp. 222-255
    • Burnetas, A.N.1    Katehakis, M.N.2
  • 14
    • 84899000904
    • Experts in a Markov decision process
    • MIT Press
    • Eyal Even-Dar, Sham M. Kakade, and Yishay Mansour. Experts in a Markov decision process. In Proc. 17th NIPS, pages 401-408. MIT Press, 2004.
    • (2004) Proc. 17th NIPS , pp. 401-408
    • Even-Dar, E.1    Kakade, S.M.2    Mansour, Y.3
  • 16
    • 0034375401
    • Markov chain sensitivity measured by mean first passage times
    • Grace E. Cho and Carl D. Meyer. Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl., 316:21-28, 2000.
    • (2000) Linear Algebra Appl. , vol.316 , pp. 21-28
    • Cho, G.E.1    Meyer, C.D.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.