Volume 7568 LNAI, 2012, Pages 320-334

PAC bounds for discounted MDPs

Author keywords

Exploration-exploitation; Markov decision processes; PAC-MDP; Reinforcement learning; Sample complexity

Indexed keywords

FINITE-STATE; LOWER BOUNDS; MARKOV DECISION PROCESSES; PAC BOUNDS; PAC-MDP; SAMPLE-COMPLEXITY; TRANSITION MATRICES; TRANSITION PROBABILITIES; UPPER AND LOWER BOUNDS;

EID: 84867877076     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-642-34106-9_26     Document Type: Conference Paper
Times cited: 139

References (15)
  • 1
    • Auer, P., Jaksch, T., Ortner, R.: Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res. 99, 1563-1600 (2010)
  • 3
    • Auer, P., Ortner, R.: Logarithmic online regret bounds for undiscounted reinforcement learning. In: Advances in Neural Information Processing Systems 19, pp. 49-56. MIT Press (2007)
  • 5
    • Chung, F., Lu, L.: Concentration inequalities and martingale inequalities: a survey. Internet Mathematics 3, 1 (2006)
  • 8
    • Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1), 4-22 (1985)
  • 9
    • Mannor, S., Tsitsiklis, J.: The sample complexity of exploration in the multi-armed bandit problem. J. Mach. Learn. Res. 5, 623-648 (2004)
  • 11
    • Strehl, A., Littman, M.: An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences 74(8), 1309-1331 (2008)
  • 12
    • Strehl, A., Li, L., Littman, M.: Reinforcement learning in finite MDPs: PAC analysis. J. Mach. Learn. Res. 10, 2413-2444 (2009)
  • 14
    • Sobel, M.: The variance of discounted Markov decision processes. Journal of Applied Probability 19(4), 794-802 (1982)


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.