-
1
-
-
0000616723
-
Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
-
MR1358906
-
AGRAWAL, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. in Appl. Probab. 27 1054–1078. MR1358906
-
(1995)
Adv. in Appl. Probab.
, vol.27
, pp. 1054-1078
-
-
Agrawal, R.1
-
2
-
-
78649420293
-
Regret bounds and minimax policies under partial monitoring
-
MR2738783
-
AUDIBERT, J.-Y. and BUBECK, S. (2010). Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11 2785–2836. MR2738783
-
(2010)
J. Mach. Learn. Res.
, vol.11
, pp. 2785-2836
-
-
Audibert, J.-Y.1
Bubeck, S.2
-
3
-
-
62949181077
-
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
-
MR2514714
-
AUDIBERT, J.-Y., MUNOS, R. and SZEPESVÁRI, C. (2009). Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoret. Comput. Sci. 410 1876–1902. MR2514714
-
(2009)
Theoret. Comput. Sci.
, vol.410
, pp. 1876-1902
-
-
Audibert, J.-Y.1
Munos, R.2
Szepesvári, C.3
-
4
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
AUER, P., CESA-BIANCHI, N. and FISCHER, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 235–256.
-
(2002)
Machine Learning
, vol.47
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
5
-
-
84874045238
-
Regret analysis of stochastic and nonstochastic multiarmed bandit problems
-
BUBECK, S. and CESA-BIANCHI, N. (2012). Regret analysis of stochastic and nonstochastic multiarmed bandit problems. Foundations and Trends in Machine Learning 5 1–122.
-
(2012)
Foundations and Trends in Machine Learning
, vol.5
, pp. 1-122
-
-
Bubeck, S.1
Cesa-Bianchi, N.2
-
6
-
-
0030159874
-
Optimal adaptive policies for sequential allocation problems
-
MR1390571
-
BURNETAS, A. N. and KATEHAKIS, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Adv. in Appl. Math. 17 122–142. MR1390571
-
(1996)
Adv. in Appl. Math.
, vol.17
, pp. 122-142
-
-
Burnetas, A.N.1
Katehakis, M.N.2
-
7
-
-
0031070051
-
Optimal adaptive policies for Markov decision processes
-
MR1436581
-
BURNETAS, A. N. and KATEHAKIS, M. N. (1997). Optimal adaptive policies for Markov decision processes. Math. Oper. Res. 22 222–255. MR1436581
-
(1997)
Math. Oper. Res.
, vol.22
, pp. 222-255
-
-
Burnetas, A.N.1
Katehakis, M.N.2
-
8
-
-
0037249547
-
Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem
-
MR1959385
-
BURNETAS, A. N. and KATEHAKIS, M. N. (2003). Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem. Probab. Engrg. Inform. Sci. 17 53–82. MR1959385
-
(2003)
Probab. Engrg. Inform. Sci.
, vol.17
, pp. 53-82
-
-
Burnetas, A.N.1
Katehakis, M.N.2
-
10
-
-
84898949562
-
-
CAPPÉ, O., GARIVIER, A., MAILLARD, O.-A., MUNOS, R. and STOLTZ, G. (2013). Supplement to “Kullback–Leibler upper confidence bounds for optimal sequential allocation.” DOI:10.1214/13-AOS1119SUPP.
-
(2013)
Supplement to “Kullback–Leibler Upper Confidence Bounds for Optimal Sequential Allocation”
-
-
Cappé, O.1
Garivier, A.2
Maillard, O.-A.3
Munos, R.4
Stoltz, G.5
-
11
-
-
0009953451
-
Optimal stopping and dynamic allocation
-
MR0914595
-
CHANG, F. and LAI, T. L. (1987). Optimal stopping and dynamic allocation. Adv. in Appl. Probab. 19 829–853. MR0914595
-
(1987)
Adv. in Appl. Probab.
, vol.19
, pp. 829-853
-
-
Chang, F.1
Lai, T.L.2
-
12
-
-
0003549661
-
-
2nd ed. Springer, New York. MR0953964
-
CHOW, Y. S. and TEICHER, H. (1988). Probability Theory: Independence, Interchangeability, Martingales, 2nd ed. Springer, New York. MR0953964
-
(1988)
Probability Theory: Independence, Interchangeability, Martingales
-
-
Chow, Y.S.1
Teicher, H.2
-
14
-
-
79952428999
-
Optimism in reinforcement learning and Kullback–Leibler divergence
-
IEEE Press, Piscataway, NJ
-
FILIPPI, S., CAPPÉ, O. and GARIVIER, A. (2010). Optimism in reinforcement learning and Kullback–Leibler divergence. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing. IEEE Press, Piscataway, NJ.
-
(2010)
Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing
-
-
Filippi, S.1
Cappé, O.2
Garivier, A.3
-
16
-
-
0000169010
-
Bandit processes and dynamic allocation indices (with discussion)
-
MR0547241
-
GITTINS, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 41 148–177. MR0547241
-
(1979)
J. R. Stat. Soc. Ser. B Stat. Methodol.
, vol.41
, pp. 148-177
-
-
Gittins, J.C.1
-
18
-
-
84947403595
-
Probability inequalities for sums of bounded random variables
-
MR0144363
-
HOEFFDING, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30. MR0144363
-
(1963)
J. Amer. Statist. Assoc.
, vol.58
, pp. 13-30
-
-
Hoeffding, W.1
-
20
-
-
83155180573
-
An asymptotically optimal policy for finite support models in the multiarmed bandit problem
-
HONDA, J. and TAKEMURA, A. (2011). An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning 85 361–391.
-
(2011)
Machine Learning
, vol.85
, pp. 361-391
-
-
Honda, J.1
Takemura, A.2
-
23
-
-
84867888479
-
Thompson sampling: An asymptotically optimal finite time analysis
-
Springer, New York
-
KAUFMANN, E., KORDA, N. and MUNOS, R. (2012). Thompson sampling: An asymptotically optimal finite time analysis. In Proceedings of the 23rd International Conference on Algorithmic Learning Theory 199–213. Springer, New York.
-
(2012)
Proceedings of the 23rd International Conference on Algorithmic Learning Theory
, pp. 199-213
-
-
Kaufmann, E.1
Korda, N.2
Munos, R.3
-
24
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
MR0776826
-
LAI, T. L. and ROBBINS, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22. MR0776826
-
(1985)
Adv. in Appl. Math.
, vol.6
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
27
-
-
34247553430
-
Concentration Inequalities and Model Selection
-
Springer, Berlin. MR2319879
-
MASSART, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin. MR2319879
-
(2007)
Lecture Notes in Math
, vol.1896
-
-
Massart, P.1
-
28
-
-
85057370061
-
-
Chapman & Hall/CRC, Boca Raton, FL
-
OWEN, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC, Boca Raton, FL.
-
(2001)
Empirical Likelihood
-
-
Owen, A.B.1
-
29
-
-
84966203785
-
Some aspects of the sequential design of experiments
-
MR0050246
-
ROBBINS, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58 527–535. MR0050246
-
(1952)
Bull. Amer. Math. Soc. (N.S.)
, vol.58
, pp. 527-535
-
-
Robbins, H.1
-
30
-
-
0001395850
-
On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
-
THOMPSON, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 285–294.
-
(1933)
Biometrika
, vol.25
, pp. 285-294
-
-
Thompson, W.R.1
-
31
-
-
78650475463
-
On the Theory of Apportionment
-
MR1507085
-
THOMPSON, W. R. (1935). On the Theory of Apportionment. Amer. J. Math. 57 450–456. MR1507085
-
(1935)
Amer. J. Math.
, vol.57
, pp. 450-456
-
-
Thompson, W.R.1
-
33
-
-
65749118363
-
Graphical models, exponential families, and variational inference
-
WAINWRIGHT, M. J. and JORDAN, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1 1–305.
-
(2008)
Foundations and Trends in Machine Learning
, vol.1
, pp. 1-305
-
-
Wainwright, M.J.1
Jordan, M.I.2
-
34
-
-
0000090155
-
Sequential tests of statistical hypotheses
-
MR0013275
-
WALD, A. (1945). Sequential tests of statistical hypotheses. Ann. Math. Statist. 16 117–186. MR0013275
-
(1945)
Ann. Math. Statist.
, vol.16
, pp. 117-186
-
-
Wald, A.1
-
35
-
-
0001072450
-
On the Gittins index for multiarmed bandits
-
MR1189430
-
WEBER, R. (1992). On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2 1024–1033. MR1189430
-
(1992)
Ann. Appl. Probab.
, vol.2
, pp. 1024-1033
-
-
Weber, R.1
-
36
-
-
0000248624
-
Multi-armed bandits and the Gittins index
-
MR0583348
-
WHITTLE, P. (1980). Multi-armed bandits and the Gittins index. J. R. Stat. Soc. Ser. B Stat. Methodol. 42 143–149. MR0583348
-
(1980)
J. R. Stat. Soc. Ser. B Stat. Methodol.
, vol.42
, pp. 143-149
-
-
Whittle, P.1