[1] P. Auer, Using confidence bounds for exploitation-exploration trade-offs, J. Mach. Learn. Res. 3 (2002) 397-422
[2] P. Auer, R. Ortner, Online regret bounds for a new reinforcement learning algorithm, in: First Austrian Cognitive Vision Workshop, 2005, pp. 35-42
[3] R.I. Brafman, M. Tennenholtz, R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning, J. Mach. Learn. Res. 3 (2002) 213-231
[4] E. Even-Dar, S. Mannor, Y. Mansour, PAC bounds for multi-armed bandit and Markov decision processes, in: 15th Annual Conference on Computational Learning Theory (COLT), 2002, pp. 255-270
[5] E. Even-Dar, S. Mannor, Y. Mansour, Action elimination and stopping conditions for reinforcement learning, in: The Twentieth International Conference on Machine Learning (ICML 2003), 2003, pp. 162-169
[6] C.-N. Fiechter, Expected mistake bound model for on-line reinforcement learning, in: Proceedings of the Fourteenth International Conference on Machine Learning, 1997, pp. 116-124
[7] P.W.L. Fong, A quantitative study of hypothesis selection, in: Proceedings of the Twelfth International Conference on Machine Learning (ICML-95), 1995, pp. 226-234
[8] R. Givan, S. Leach, T. Dean, Bounded-parameter Markov decision processes, Artificial Intelligence 122 (1-2) (2000) 71-109
[10] S.M. Kakade, On the sample complexity of reinforcement learning, PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2003
[11] M.J. Kearns, S.P. Singh, Near-optimal reinforcement learning in polynomial time, Machine Learning 49 (2-3) (2002) 209-232
[13] T.L. Lai, Adaptive treatment allocation and the multi-armed bandit problem, Ann. Statist. 15 (3) (1987) 1091-1114
[14] A. Nilim, L. El Ghaoui, Robustness in Markov decision problems with uncertain transition matrices, in: Advances in Neural Information Processing Systems 16 (NIPS-03), 2004
[16] M.J. Streeter, S.F. Smith, A simple distribution-free approach to the max k-armed bandit problem, in: CP 2006: Proceedings of the Twelfth International Conference on Principles and Practice of Constraint Programming, 2006
[17] A.L. Strehl, L. Li, M.L. Littman, Incremental model-based learners with formal learning-time guarantees, in: UAI-06: Proceedings of the 22nd Conference on Uncertainty in Artificial Intelligence, 2006, pp. 485-493
[18] A.L. Strehl, L. Li, E. Wiewiora, J. Langford, M.L. Littman, PAC model-free reinforcement learning, in: ICML-06: Proceedings of the 23rd International Conference on Machine Learning, 2006, pp. 881-888
[19] A.L. Strehl, M.L. Littman, An empirical evaluation of interval estimation for Markov decision processes, in: The 16th IEEE International Conference on Tools with Artificial Intelligence (ICTAI-2004), 2004, pp. 128-135
[20] A.L. Strehl, M.L. Littman, A theoretical analysis of model-based interval estimation, in: Proceedings of the Twenty-Second International Conference on Machine Learning (ICML-05), 2005, pp. 857-864
[22] L.G. Valiant, A theory of the learnable, Comm. ACM 27 (11) (1984) 1134-1142
[23] T. Weissman, E. Ordentlich, G. Seroussi, S. Verdú, M.J. Weinberger, Inequalities for the L1 deviation of the empirical distribution, Tech. Rep. HPL-2003-97R1, Hewlett-Packard Labs, 2003
[24] M. Wiering, J. Schmidhuber, Efficient model-based exploration, in: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior (SAB'98), 1998, pp. 223-228
[25] J.L. Wyatt, Exploration control in reinforcement learning using optimistic model selection, in: Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), 2001, pp. 593-600