SCOPUS Information Search Platform

Volume , Issue , 2007, Pages 49-56

Logarithmic online regret bounds for undiscounted reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

FINITE NUMBER; MULTI-ARMED BANDIT PROBLEM; ON-LINE PERFORMANCE; OPTIMAL POLICIES; UPPER CONFIDENCE BOUND;

LEARNING ALGORITHMS;

REINFORCEMENT LEARNING;

EID: 56449090814 PISSN: 10495258 EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (183)

References (16)

1
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
- (2002) Mach. Learn. , vol.49 , pp. 209-232
- Kearns, M.J.¹ Singh, S.P.²

2
- 0041965975
- R-max - A general polynomial time algorithm for near-optimal reinforcement learning
- Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
- (2002) J. Mach. Learn. Res. , vol.3 , pp. 213-231
- Brafman, R.I.¹ Tennenholtz, M.²

3
- 23244466805
- On the Sample Complexity of Reinforcement Learning
- Sham M. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, 2003.
- (2003) On the Sample Complexity of Reinforcement Learning
- Kakade, S.M.¹

4
- 31844432138
- A theoretical analysis of model-based interval estimation
- Alexander L. Strehl and Michael L. Littman. A theoretical analysis of model-based interval estimation. In Proc. 22nd ICML 2005, pages 857-864, 2005.
- (2005) Proc. 22nd ICML 2005 , pp. 857-864
- Strehl, A.L.¹ Littman, M.L.²

5
- 33749255382
- PAC model-free reinforcement learning
- Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman. PAC model-free reinforcement learning. In Proc. 23rd ICML 2006, pages 881-888, 2006.
- (2006) Proc. 23rd ICML 2006 , pp. 881-888
- Strehl, A.L.¹ Li, L.² Wiewiora, E.³ Langford, J.⁴ Littman, M.L.⁵

8
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res., 3:397-422, 2002.
- (2002) J. Mach. Learn. Res. , vol.3 , pp. 397-422
- Auer, P.¹

9
- 0036568025
- Finite-time analysis of the multi-armed bandit problem
- Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Mach. Learn., 47:235-256, 2002.
- (2002) Mach. Learn. , vol.47 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

11
- 0004280606
- Learning in Embedded Systems
- Leslie P. Kaelbling. Learning in Embedded Systems. MIT Press, 1993.
- (1993) Learning in Embedded Systems
- Kaelbling, L.P.¹

13
- 0031070051
- Optimal adaptive policies for Markov decision processes
- Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
- (1997) Math. Oper. Res. , vol.22 , Issue.1 , pp. 222-255
- Burnetas, A.N.¹ Katehakis, M.N.²

15
- 0003998452
- Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

16
- 0034375401
- Markov chain sensitivity measured by mean first passage times
- Grace E. Cho and Carl D. Meyer. Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl., 316:21-28, 2000.
- (2000) Linear Algebra Appl. , vol.316 , pp. 21-28
- Cho, G.E.¹ Meyer, C.D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.