Volume , Issue , 2009, Pages 89-96

Near-optimal regret bounds for reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; REINFORCEMENT LEARNING;

EID: 73549103329     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (364)

References (13)
  • 2
    • Michael J. Kearns and Satinder P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 11. MIT Press, 1999.
  • 3
    • Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
  • 5
    • Peter Auer and Ronald Ortner. Logarithmic online regret bounds for reinforcement learning. In Advances in Neural Information Processing Systems 19, pages 49-56. MIT Press, 2007.
  • 6
    • Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
  • 7
    • Ambuj Tewari and Peter Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Advances in Neural Information Processing Systems 20, pages 1505-1512. MIT Press, 2008.
  • 10
    • Alexander L. Strehl and Michael L. Littman. An analysis of model-based interval estimation for Markov decision processes. J. Comput. System Sci., 74(8):1309-1331, 2008.
  • 11
    • Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
  • 13
    • Peter Auer, Thomas Jaksch, and Ronald Ortner. Near-optimal regret bounds for reinforcement learning. Technical Report CIT-2009-01, University of Leoben, Chair for Information Technology, 2009. http://institute.unileoben.ac.at/infotech/publications/TR/CIT-2009-01.pdf


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.