1. Auer, P., Jaksch, T., Ortner, R.: Near-optimal regret bounds for reinforcement learning. J. Mach. Learn. Res. 99, 1563-1600 (2010)
2. Azar, M., Munos, R., Kappen, B.: On the sample complexity of reinforcement learning with a generative model. In: Proceedings of the 29th International Conference on Machine Learning. ACM, New York (2012)
3. Auer, P., Ortner, R.: Logarithmic online regret bounds for undiscounted reinforcement learning. In: Advances in Neural Information Processing Systems 19, pp. 49-56. MIT Press (2007)
5. Chung, F., Lu, L.: Concentration inequalities and martingale inequalities: a survey. Internet Mathematics 3(1) (2006)
8. Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics 6(1), 4-22 (1985)
9. Mannor, S., Tsitsiklis, J.: The sample complexity of exploration in the multi-armed bandit problem. J. Mach. Learn. Res. 5, 623-648 (2004)
11. Strehl, A., Littman, M.: An analysis of model-based interval estimation for Markov decision processes. Journal of Computer and System Sciences 74(8), 1309-1331 (2008)
12. Strehl, A., Li, L., Littman, M.: Reinforcement learning in finite MDPs: PAC analysis. J. Mach. Learn. Res. 10, 2413-2444 (2009)
13. Strehl, A., Li, L., Wiewiora, E., Langford, J., Littman, M.: PAC model-free reinforcement learning. In: Proceedings of the 23rd International Conference on Machine Learning, ICML 2006, pp. 881-888. ACM, New York (2006)
14. Sobel, M.: The variance of discounted Markov decision processes. Journal of Applied Probability 19(4), 794-802 (1982)