[1] Peter Auer and Ronald Ortner. Logarithmic online regret bounds for undiscounted reinforcement learning. In Advances in Neural Information Processing Systems 19, pages 49-56. MIT Press, 2007.
[4] Ronen I. Brafman and Moshe Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
[5] A. N. Burnetas and M. N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, 22(1):222-255, 1997.
[7] Sham Kakade. On the Sample Complexity of Reinforcement Learning. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003.
[8] Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002. DOI 10.1023/A:1017984413808.
[12] Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman. PAC model-free reinforcement learning. In Proceedings of the Twenty-Third International Conference on Machine Learning, 2006.
[13] Ambuj Tewari and Peter L. Bartlett. Optimistic linear programming gives logarithmic regret for irreducible MDPs. In Advances in Neural Information Processing Systems 20, pages 1505-1512. MIT Press, 2008.