-
1
-
-
0036832954
-
Near-optimal reinforcement learning in polynomial time
-
Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
-
(2002)
Mach. Learn.
, vol.49
, pp. 209-232
-
-
Kearns, M.J.1
Singh, S.P.2
-
2
-
-
0041965975
-
R-max - A general polynomial time algorithm for near-optimal reinforcement learning
-
Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
-
(2002)
J. Mach. Learn. Res.
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
4
-
-
31844432138
-
A theoretical analysis of model-based interval estimation
-
Alexander L. Strehl and Michael L. Littman. A theoretical analysis of model-based interval estimation. In Proc. 22nd ICML 2005, pages 857-864, 2005.
-
(2005)
Proc. 22nd ICML 2005
, pp. 857-864
-
-
Strehl, A.L.1
Littman, M.L.2
-
5
-
-
33749255382
-
PAC model-free reinforcement learning
-
Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman. PAC model-free reinforcement learning. In Proc. 23rd ICML 2006, pages 881-888, 2006.
-
(2006)
Proc. 23rd ICML 2006
, pp. 881-888
-
-
Strehl, A.L.1
Li, L.2
Wiewiora, E.3
Langford, J.4
Littman, M.L.5
-
6
-
-
78649499440
-
Efficient reinforcement learning
-
ACM
-
Claude-Nicolas Fiechter. Efficient reinforcement learning. In Proc. 7th COLT, pages 88-97. ACM, 1994.
-
(1994)
Proc. 7th COLT
, pp. 88-97
-
-
Fiechter, C.-N.1
-
7
-
-
77951961847
-
Online regret bounds for a new reinforcement learning algorithm
-
ÖCG
-
Peter Auer and Ronald Ortner. Online regret bounds for a new reinforcement learning algorithm. In Proc. 1st ACVW, pages 35-42. ÖCG, 2005.
-
(2005)
Proc. 1st ACVW
, pp. 35-42
-
-
Auer, P.1
Ortner, R.2
-
8
-
-
0041966002
-
Using confidence bounds for exploitation-exploration trade-offs
-
Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res., 3:397-422, 2002.
-
(2002)
J. Mach. Learn. Res.
, vol.3
, pp. 397-422
-
-
Auer, P.1
-
9
-
-
0036568025
-
Finite-time analysis of the multi-armed bandit problem
-
Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Mach. Learn., 47:235-256, 2002.
-
(2002)
Mach. Learn.
, vol.47
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
10
-
-
16244391087
-
An empirical evaluation of interval estimation for Markov decision processes
-
IEEE Computer Society
-
Alexander L. Strehl and Michael L. Littman. An empirical evaluation of interval estimation for Markov decision processes. In Proc. 16th ICTAI, pages 128-135. IEEE Computer Society, 2004.
-
(2004)
Proc. 16th ICTAI
, pp. 128-135
-
-
Strehl, A.L.1
Littman, M.L.2
-
12
-
-
1942421149
-
Action elimination and stopping conditions for reinforcement learning
-
AAAI Press
-
Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for reinforcement learning. In Proc. 20th ICML, pages 162-169. AAAI Press, 2003.
-
(2003)
Proc. 20th ICML
, pp. 162-169
-
-
Even-Dar, E.1
Mannor, S.2
Mansour, Y.3
-
13
-
-
0031070051
-
Optimal adaptive policies for Markov decision processes
-
Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
-
(1997)
Math. Oper. Res.
, vol.22
, Issue.1
, pp. 222-255
-
-
Burnetas, A.N.1
Katehakis, M.N.2
-
14
-
-
84899000904
-
Experts in a Markov decision process
-
MIT Press
-
Eyal Even-Dar, Sham M. Kakade, and Yishay Mansour. Experts in a Markov decision process. In Proc. 17th NIPS, pages 401-408. MIT Press, 2004.
-
(2004)
Proc. 17th NIPS
, pp. 401-408
-
-
Even-Dar, E.1
Kakade, S.M.2
Mansour, Y.3
-
16
-
-
0034375401
-
Markov chain sensitivity measured by mean first passage times
-
Grace E. Cho and Carl D. Meyer. Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl., 316:21-28, 2000.
-
(2000)
Linear Algebra Appl.
, vol.316
, pp. 21-28
-
-
Cho, G.E.1
Meyer, C.D.2