[1] Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Mach. Learn., 49:209-232, 2002.
[2] Ronen I. Brafman and Moshe Tennenholtz. R-max - a general polynomial time algorithm for near-optimal reinforcement learning. J. Mach. Learn. Res., 3:213-231, 2002.
[5] Alexander L. Strehl, Lihong Li, Eric Wiewiora, John Langford, and Michael L. Littman. PAC model-free reinforcement learning. In Proc. 23rd ICML, pages 881-888, 2006.
[6] Claude-Nicolas Fiechter. Efficient reinforcement learning. In Proc. 7th COLT, pages 88-97. ACM, 1994.
[7] Peter Auer and Ronald Ortner. Online regret bounds for a new reinforcement learning algorithm. In Proc. 1st ACVW, pages 35-42. ÖCG, 2005.
[8] Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res., 3:397-422, 2002.
[9] Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multi-armed bandit problem. Mach. Learn., 47:235-256, 2002.
[10] Alexander L. Strehl and Michael L. Littman. An empirical evaluation of interval estimation for Markov decision processes. In Proc. 16th ICTAI, pages 128-135. IEEE Computer Society, 2004.
[12] Eyal Even-Dar, Shie Mannor, and Yishay Mansour. Action elimination and stopping conditions for reinforcement learning. In Proc. 20th ICML, pages 162-169. AAAI Press, 2003.
[13] Apostolos N. Burnetas and Michael N. Katehakis. Optimal adaptive policies for Markov decision processes. Math. Oper. Res., 22(1):222-255, 1997.
[16] Grace E. Cho and Carl D. Meyer. Markov chain sensitivity measured by mean first passage times. Linear Algebra Appl., 316:21-28, 2000.