-
1
-
-
0000616723
-
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
-
R. Agrawal. Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Advances in Applied Probability, 27(4):1054-1078, 1995.
-
(1995)
Advances in Applied Probability
, vol.27
, Issue.4
, pp. 1054-1078
-
-
Agrawal, R.1
-
2
-
-
78649420293
-
Regret bounds and minimax policies under partial monitoring
-
J-Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11:2785-2836, 2010.
-
(2010)
Journal of Machine Learning Research
, vol.11
, pp. 2785-2836
-
-
Audibert, J.-Y.1
Bubeck, S.2
-
3
-
-
62949181077
-
Exploration-exploitation trade-off using variance estimates in multi-armed bandits
-
J-Y. Audibert, R. Munos, and Cs. Szepesvári. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19), 2009.
-
(2009)
Theoretical Computer Science
, vol.410
, Issue.19
-
-
Audibert, J.-Y.1
Munos, R.2
Szepesvári, Cs.3
-
4
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
DOI 10.1023/A:1013689704352, Computational Learning Theory
-
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2):235-256, 2002. (Pubitemid 34126111)
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
5
-
-
0031070051
-
Optimal adaptive policies for Markov decision processes
-
A.N. Burnetas and M.N. Katehakis. Optimal adaptive policies for Markov decision processes. Mathematics of Operations Research, pages 222-255, 1997. (Pubitemid 127621321)
-
(1997)
Mathematics of Operations Research
, vol.22
, Issue.1
, pp. 222-255
-
-
Burnetas, A.N.1
Katehakis, M.N.2
-
6
-
-
0001072895
-
The use of confidence or fiducial limits illustrated in the case of the binomial
-
C.J. Clopper and E.S. Pearson. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika, 26:404-413, 1934.
-
(1934)
Biometrika
, vol.26
, pp. 404-413
-
-
Clopper, C.J.1
Pearson, E.S.2
-
7
-
-
84937398609
-
PAC bounds for multi-armed bandit and Markov decision processes
-
Lecture Notes in Comput. Sci., Springer, Berlin
-
E. Even-Dar, S. Mannor, and Y. Mansour. PAC bounds for multi-armed bandit and Markov decision processes. In Conf. Comput. Learning Theory (Sydney, Australia, 2002), volume 2375 of Lecture Notes in Comput. Sci., pages 255-270. Springer, Berlin, 2002.
-
(2002)
Conf. Comput. Learning Theory (Sydney, Australia, 2002)
, vol.2375
, pp. 255-270
-
-
Even-Dar, E.1
Mannor, S.2
Mansour, Y.3
-
9
-
-
79952428999
-
Optimism in reinforcement learning and Kullback-Leibler divergence
-
Monticello, US
-
S. Filippi, O. Cappé, and A. Garivier. Optimism in reinforcement learning and Kullback-Leibler divergence. In Allerton Conf. on Communication, Control, and Computing, Monticello, US, 2010.
-
(2010)
Allerton Conf. on Communication, Control, and Computing
-
-
Filippi, S.1
Cappé, O.2
Garivier, A.3
-
11
-
-
84898077171
-
An asymptotically optimal bandit algorithm for bounded support models
-
T. Kalai and M. Mohri, editors, Haifa, Israel
-
J. Honda and A. Takemura. An asymptotically optimal bandit algorithm for bounded support models. In T. Kalai and M. Mohri, editors, Conf. Comput. Learning Theory, Haifa, Israel, 2010.
-
(2010)
Conf. Comput. Learning Theory
-
-
Honda, J.1
Takemura, A.2
-
12
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1):4, 1985.
-
(1985)
Advances in Applied Mathematics
, vol.6
, Issue.1
, pp. 4
-
-
Lai, T.L.1
Robbins, H.2
-
13
-
-
84874038864
-
A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences
-
Budapest, Hungary
-
O-A. Maillard, R. Munos, and G. Stoltz. A finite-time analysis of multi-armed bandits problems with Kullback-Leibler divergences. In Conf. Comput. Learning Theory, Budapest, Hungary, 2011.
-
(2011)
Conf. Comput. Learning Theory
-
-
Maillard, O.-A.1
Munos, R.2
Stoltz, G.3
-
14
-
-
84898473682
-
-
Springer, Berlin, Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003
-
P. Massart. Concentration inequalities and model selection, volume 1896 of Lecture Notes in Mathematics. Springer, Berlin, 2007. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6-23, 2003.
-
(2007)
Concentration Inequalities and Model Selection, Volume 1896 of Lecture Notes in Mathematics
-
-
Massart, P.1