



Volume, Issue, 2009, Pages 35-42

REGAL: A regularization based algorithm for reinforcement learning in weakly communicating MDPs

Author keywords

[No Author keywords available]

Indexed keywords

MARKOV PROCESSES;

EID: 80053161827     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 238

References (13)
  • 1. Peter Auer and Ronald Ortner. Logarithmic online regret bounds for undiscounted reinforcement learning. In Advances in Neural Information Processing Systems 19, pages 49-56. MIT Press, 2007.
  • 4. Ronen I. Brafman and Moshe Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
  • 5. A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
  • 7. Sham Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
  • 8. Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002. DOI: 10.1023/A:1017984413808
  • 13. Ambuj Tewari and Peter L. Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Advances in Neural Information Processing Systems 20, pages 1505-1512. MIT Press, 2008.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.