-
2
-
-
84886540275
-
Analysis of Thompson sampling for the multi-armed bandit problem
-
S. Agrawal and N. Goyal. Analysis of Thompson Sampling for the Multi-armed Bandit Problem. In COLT, 2012a.
-
(2012)
COLT
-
-
Agrawal, S.1
Goyal, N.2
-
4
-
-
84898079018
-
Minimax policies for adversarial and stochastic bandits
-
J.-Y. Audibert and S. Bubeck. Minimax Policies for Adversarial and Stochastic Bandits. In COLT, 2009.
-
(2009)
COLT
-
-
Audibert, J.-Y.1
Bubeck, S.2
-
5
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3): 235-256, 2002.
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
6
-
-
84943560912
-
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
-
S. Bubeck and N. Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems. CoRR, 2012.
-
(2012)
CoRR
-
-
Bubeck, S.1
Cesa-Bianchi, N.2
-
7
-
-
85162416700
-
An empirical evaluation of Thompson sampling
-
O. Chapelle and L. Li. An Empirical Evaluation of Thompson Sampling. In NIPS, pages 2249-2257, 2011.
-
(2011)
NIPS
, pp. 2249-2257
-
-
Chapelle, O.1
Li, L.2
-
8
-
-
84897516898
-
Open problem: Regret bounds for Thompson sampling
-
O. Chapelle and L. Li. Open Problem: Regret Bounds for Thompson Sampling. In COLT, 2012.
-
(2012)
COLT
-
-
Chapelle, O.1
Li, L.2
-
11
-
-
77956543367
-
Web-scale Bayesian click-through rate prediction for sponsored search advertising in Microsoft's Bing search engine
-
T. Graepel, J. Q. Candela, T. Borchert, and R. Herbrich. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine. In ICML, pages 13-20, 2010.
-
(2010)
ICML
, pp. 13-20
-
-
Graepel, T.1
Candela, J.Q.2
Borchert, T.3
Herbrich, R.4
-
13
-
-
3543140670
-
Dual weak pigeonhole principle, Boolean complexity, and derandomization
-
October
-
E. Jeřábek. Dual weak pigeonhole principle, Boolean complexity, and derandomization. Annals of Pure and Applied Logic, 129(1-3): 1-37, October 2004.
-
(2004)
Annals of Pure and Applied Logic
, vol.129
, Issue.1-3
, pp. 1-37
-
-
Jeřábek, E.1
-
16
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6: 4-22, 1985.
-
(1985)
Advances in Applied Mathematics
, vol.6
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
19
-
-
84860620509
-
-
Technical Report 11:01, Statistics Group, Department of Mathematics, University of Bristol
-
B. C. May, N. Korda, A. Lee, and D. S. Leslie. Optimistic Bayesian sampling in contextual-bandit problems. Technical Report 11:01, Statistics Group, Department of Mathematics, University of Bristol, 2011.
-
(2011)
Optimistic Bayesian Sampling in Contextual-Bandit Problems
-
-
May, B.C.1
Korda, N.2
Lee, A.3
Leslie, D.S.4
-
22
-
-
14344258433
-
A Bayesian framework for reinforcement learning
-
M. J. A. Strens. A Bayesian Framework for Reinforcement Learning. In ICML, pages 943-950, 2000.
-
(2000)
ICML
, pp. 943-950
-
-
Strens, M.J.A.1
-
23
-
-
0001395850
-
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
-
W. R. Thompson. On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3-4): 285-294, 1933.
-
(1933)
Biometrika
, vol.25
, Issue.3-4
, pp. 285-294
-
-
Thompson, W.R.1