-
1
-
-
62949181077
-
Exploration-exploitation trade-off using variance estimates in multi-armed bandits
-
J.-Y. Audibert, R. Munos, and C. Szepesvari. Exploration-exploitation trade-off using variance estimates in multi-armed bandits. Theoretical Computer Science, 410:1876-1902, 2009.
-
(2009)
Theoretical Computer Science
, vol.410
, pp. 1876-1902
-
-
Audibert, J.-Y.1
Munos, R.2
Szepesvari, C.3
-
2
-
-
78649420293
-
Regret bounds and minimax policies under partial monitoring
-
J.Y. Audibert and S. Bubeck. Regret bounds and minimax policies under partial monitoring. Journal of Machine Learning Research, 11:2635-2686, 2010.
-
(2010)
Journal of Machine Learning Research
, vol.11
, pp. 2635-2686
-
-
Audibert, J.Y.1
Bubeck, S.2
-
3
-
-
77957337199
-
UCB revisited: Improved regret bounds for the stochastic multiarmed bandit problem
-
P. Auer and R. Ortner. UCB revisited: Improved regret bounds for the stochastic multiarmed bandit problem. Periodica Mathematica Hungarica, 61(1-2):55-65, 2010.
-
(2010)
Periodica Mathematica Hungarica
, vol.61
, Issue.1-2
, pp. 55-65
-
-
Auer, P.1
Ortner, R.2
-
4
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
DOI 10.1023/A:1013689704352, Computational Learning Theory
-
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
5
-
-
0030159874
-
Optimal adaptive policies for sequential allocation problems
-
DOI 10.1006/aama.1996.0007
-
A.N. Burnetas and M.N. Katehakis. Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2):122-142, 1996.
-
(1996)
Advances in Applied Mathematics
, vol.17
, Issue.2
, pp. 122-142
-
-
Burnetas, A.N.1
Katehakis, M.N.2
-
9
-
-
84863920694
-
The KL-UCB algorithm for bounded stochastic bandits and beyond
-
A. Garivier and O. Cappé. The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of COLT, 2011.
-
(2011)
Proceedings of COLT
-
-
Garivier, A.1
Cappé, O.2
-
11
-
-
84898077171
-
An asymptotically optimal bandit algorithm for bounded support models
-
J. Honda and A. Takemura. An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of COLT, pages 67-79, 2010.
-
(2010)
Proceedings of COLT
, pp. 67-79
-
-
Honda, J.1
Takemura, A.2
-
13
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
-
(1985)
Advances in Applied Mathematics
, vol.6
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
15
-
-
84966203785
-
Some aspects of the sequential design of experiments
-
H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58:527-535, 1952.
-
(1952)
Bulletin of the American Mathematical Society
, vol.58
, pp. 527-535
-
-
Robbins, H.1