[2] Michael J. Kearns and Satinder P. Singh. Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 11. MIT Press, 1999.
[3] Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
[4] Martin L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY, USA, 1994.
[5] Peter Auer and Ronald Ortner. Logarithmic online regret bounds for reinforcement learning. In Advances in Neural Information Processing Systems 19, pages 49-56. MIT Press, 2007.
[6] Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
[7] Ambuj Tewari and Peter Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Advances in Neural Information Processing Systems 20, pages 1505-1512. MIT Press, 2008.
[10] Alexander L. Strehl and Michael L. Littman. An analysis of model-based interval estimation for Markov decision processes. J. Comput. System Sci., 74(8):1309-1331, 2008.
[11] Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
[13] Peter Auer, Thomas Jaksch, and Ronald Ortner. Near-optimal regret bounds for reinforcement learning. Technical Report CIT-2009-01, University of Leoben, Chair for Information Technology, 2009. http://institute.unileoben.ac.at/infotech/publications/TR/CIT-2009-01.pdf.