SCOPUS Information Search Platform

NIPS 2006: Proceedings of the 19th International Conference on Neural Information Processing Systems

2006, Pages 49-56

Logarithmic Online Regret Bounds for Undiscounted Reinforcement Learning

Authors (2): Auer, Peter (a); Ortner, Ronald (a)

(a) University of Leoben (Austria)

Author keywords

[No Author keywords available]

Indexed keywords

E-LEARNING; LEARNING ALGORITHMS;

EXPLORATION/EXPLOITATION; FINITE NUMBER; MULTIARMED BANDIT PROBLEMS (MABP); ON-LINE PERFORMANCE; OPTIMAL POLICIES; REGRET BOUNDS; REINFORCEMENT LEARNINGS; UPPER CONFIDENCE BOUND;

REINFORCEMENT LEARNING;

EID: 85151789426
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 73

References (16)

1
- 0036832954
- Near-optimal reinforcement learning in polynomial time
- Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
- (2002) Mach. Learn., vol. 49, pp. 209-232
- Kearns, Michael J.¹ Singh, Satinder P.²

2
- 0041965975
- R-max - a general polynomial time algorithm for near-optimal reinforcement learning
- Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
- (2002) J. Mach. Learn. Res., vol. 3, pp. 213-231
- Brafman, Ronen I.¹ Tennenholtz, Moshe²

3
- 23244466805
- PhD thesis, University College London
- Sham M. Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, University College London, 2003.
- (2003) On the Sample Complexity of Reinforcement Learning
- Kakade, Sham M.¹

4
- 31844432138
- A theoretical analysis of model-based interval estimation
- Alexander L. Strehl and Michael L. Littman. A theoretical analysis of model-based interval estimation. In Proc. 22nd ICML 2005, pages 857-864, 2005.
- (2005) Proc. 22nd ICML 2005, pp. 857-864
- Strehl, Alexander L.¹ Littman, Michael L.²

5
- 34250700033
- PAC model-free reinforcement learning
- Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman. PAC model-free reinforcement learning. In Proc. 23rd ICML 2006, pages 881-888, 2006.
- (2006) Proc. 23rd ICML 2006, pp. 881-888
- Strehl, Alexander L.¹ Li, Lihong² Wiewiora, Eric³ Langford, John⁴ Littman, Michael L.⁵

6
- 78649499440
- Efficient reinforcement learning
- ACM
- Claude-Nicolas Fiechter. Efficient reinforcement learning. In Proc. 7th COLT, pages 88-97. ACM, 1994.
- (1994) Proc. 7th COLT, pp. 88-97
- Fiechter, Claude-Nicolas¹

7
- 77951961847
- Online regret bounds for a new reinforcement learning algorithm
- ÖCG
- Peter Auer and Ronald Ortner. Online regret bounds for a new reinforcement learning algorithm. In Proc. 1st ACVW, pages 35-42. ÖCG, 2005.
- (2005) Proc. 1st ACVW, pp. 35-42
- Auer, Peter¹ Ortner, Ronald²

8
- 0041966002
- Using confidence bounds for exploitation-exploration trade-offs
- Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res., 3:397-422, 2002.
- (2002) J. Mach. Learn. Res., vol. 3, pp. 397-422
- Auer, Peter¹

9
- 0036568025
- Finite-time analysis of the multi-armed bandit problem
- Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Mach. Learn., 47:235-256, 2002.
- (2002) Mach. Learn., vol. 47, pp. 235-256
- Auer, Peter¹ Cesa-Bianchi, Nicolò² Fischer, Paul³

10
- 16244391087
- An empirical evaluation of interval estimation for Markov decision processes
- IEEE Computer Society
- Alexander L. Strehl and Michael L. Littman. An empirical evaluation of interval estimation for Markov decision processes. In Proc. 16th ICTAI, pages 128-135. IEEE Computer Society, 2004.
- (2004) Proc. 16th ICTAI, pp. 128-135
- Strehl, Alexander L.¹ Littman, Michael L.²

11
- 0004280606
- MIT Press
- Leslie P. Kaelbling. Learning in Embedded Systems. MIT Press, 1993.
- (1993) Learning in Embedded Systems
- Kaelbling, Leslie P.¹

12
- 1942421149
- Action elimination and stopping conditions for reinforcement learning
- AAAI Press
- Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for reinforcement learning. In Proc. 20th ICML, pages 162-169. AAAI Press, 2003.
- (2003) Proc. 20th ICML, pp. 162-169
- Even-Dar, Eyal¹ Mannor, Shie² Mansour, Yishay³

13
- 0031070051
- Optimal adaptive policies for Markov decision processes
- Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
- (1997) Math. Oper. Res., vol. 22, issue 1, pp. 222-255
- Burnetas, Apostolos N.¹ Katehakis, Michael N.²

14
- 41649111187
- Experts in a Markov decision process
- MIT Press
- Eyal Even-Dar, Sham M. Kakade, and Yishay Mansour. Experts in a Markov decision process. In Proc. 17th NIPS, pages 401-408. MIT Press, 2004.
- (2004) Proc. 17th NIPS, pp. 401-408
- Even-Dar, Eyal¹ Kakade, Sham M.² Mansour, Yishay³

15
- 0003998452
- Markov Decision Processes
- Wiley
- Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, Martin L.¹

16
- 0034375401
- Markov chain sensitivity measured by mean first passage times
- Grace E. Cho and Carl D. Meyer. Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl., 316:21-28, 2000.
- (2000) Linear Algebra Appl., vol. 316, pp. 21-28
- Cho, Grace E.¹ Meyer, Carl D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.