Volume , Issue , 2008, Pages

Optimistic Linear Programming gives logarithmic regret for irreducible MDPs

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; MARKOV PROCESSES;

EID: 85162041468     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (84)

References (9)
  • 1
    • Burnetas, A.N. & Katehakis, M.N. (1997) Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research 22(1):222-255.
  • 2
    • Auer, P. & Ortner, R. (2007) Logarithmic online regret bounds for undiscounted reinforcement learning. Advances in Neural Information Processing Systems 19. Cambridge, MA: MIT Press.
  • 3
    • Lai, T.L. & Robbins, H. (1985) Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1):4-22.
  • 4
    • Brafman, R.I. & Tennenholtz, M. (2002) R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3:213-231.
  • 5
    • Auer, P. (2002) Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3:397-422.
  • 6
    • Auer, P., Cesa-Bianchi, N. & Fischer, P. (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learning 47(2-3):235-256. DOI: 10.1023/A:1013689704352
  • 8
    • Tewari, A. (2007) Reinforcement Learning in Large or Unknown MDPs. PhD thesis, Department of Electrical Engineering and Computer Sciences, University of California at Berkeley.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.