Volume 7568 LNAI, 2012, Pages 320-334

PAC bounds for discounted MDPs

Author keywords

Exploration-exploitation; Markov decision processes; PAC-MDP; Reinforcement learning; Sample complexity

Indexed keywords

FINITE-STATE; LOWER BOUNDS; MARKOV DECISION PROCESSES; PAC BOUNDS; PAC-MDP; SAMPLE-COMPLEXITY; TRANSITION MATRICES; TRANSITION PROBABILITIES; UPPER AND LOWER BOUNDS;

EID: 84867877076     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-34106-9_26     Document Type: Conference Paper
Times cited: 139

References (15)
  • 1
    • Auer, P., Jaksch, T., Ortner, R.: Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res. 99, 1563-1600 (2010)
  • 3
    • Auer, P., Ortner, R.: Logarithmic online regret bounds for undiscounted reinforcement learning. In: Advances in Neural Information Processing Systems 19, pp. 49-56. MIT Press (2007)
  • 5
    • Chung, F., Lu, L.: Concentration inequalities and martingale inequalities: a survey. Internet Mathematics 3, 1 (2006)
  • 8
    • Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1), 4-22 (1985)
  • 9
    • Mannor, S., Tsitsiklis, J.: The sample complexity of exploration in the multi-armed bandit problem. J. Mach. Learn. Res. 5, 623-648 (2004)
  • 11
    • Strehl, A., Littman, M.: An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences 74(8), 1309-1331 (2008)
  • 12
    • Strehl, A., Li, L., Littman, M.: Reinforcement learning in finite MDPs: PAC analysis. J. Mach. Learn. Res. 10, 2413-2444 (2009)
  • 14
    • Sobel, M.: The variance of discounted Markov decision processes. Journal of Applied Probability 19(4), 794-802 (1982)


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.