SCOPUS Information Retrieval Platform

Annals of Statistics

Volume 41, Issue 3, 2013, Pages 1516-1541

Kullback–Leibler upper confidence bounds for optimal sequential allocation

Authors (5): Cappé, Olivier a,b; Garivier, Aurélien a,c; Maillard, Odalric Ambrym a,d; Munos, Rémi a,e; Stoltz, Gilles a,f,g

a UNIVERSITÉ PAUL SABATIER (France)

b CNRS (France)

c INSTITUT DE MATHÉMATIQUES DE TOULOUSE (France)

d UNIVERSITY OF LEOBEN (Austria)

e INRIA (France)

f ECOLE NORMALE SUPÉRIEURE (France)

g HEC PARIS (France)

Author keywords

Kullback–Leibler divergence; Multi-armed bandit problems; Sequential testing; Upper confidence bound

EID: 84898949562 PISSN: 00905364 EISSN: None Source Type: Journal
DOI: 10.1214/13-AOS1119 Document Type: Article

Times cited: 356

References (36)

1
- 0000616723
- Sample mean based index policies with O(log n) regret for the multi-armed bandit problem
- MR1358906
- AGRAWAL, R. (1995). Sample mean based index policies with O(log n) regret for the multi-armed bandit problem. Adv. in Appl. Probab. 27 1054–1078. MR1358906
- (1995) Adv. in Appl. Probab. , vol.27 , pp. 1054-1078
- Agrawal, R.¹

2
- 78649420293
- Regret bounds and minimax policies under partial monitoring
- MR2738783
- AUDIBERT, J.-Y. and BUBECK, S. (2010). Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11 2785–2836. MR2738783
- (2010) J. Mach. Learn. Res. , vol.11 , pp. 2785-2836
- Audibert, J.-Y.¹ Bubeck, S.²

3
- 62949181077
- Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
- MR2514714
- AUDIBERT, J.-Y., MUNOS, R. and SZEPESVÁRI, C. (2009). Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoret. Comput. Sci. 410 1876–1902. MR2514714
- (2009) Theoret. Comput. Sci. , vol.410 , pp. 1876-1902
- Audibert, J.-Y.¹ Munos, R.² Szepesvári, C.³

4
- 0036568025
- Finite-time analysis of the multiarmed bandit problem
- AUER, P., CESA-BIANCHI, N. and FISCHER, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine Learning 47 235–256.
- (2002) Machine Learning , vol.47 , pp. 235-256
- Auer, P.¹ Cesa-Bianchi, N.² Fischer, P.³

5
- 84874045238
- Regret analysis of stochastic and nonstochastic multiarmed bandit problems
- BUBECK, S. and CESA-BIANCHI, N. (2012). Regret analysis of stochastic and nonstochastic multiarmed bandit problems. Foundations and Trends in Machine Learning 5 1–122.
- (2012) Foundations and Trends in Machine Learning , vol.5 , pp. 1-122
- Bubeck, S.¹ Cesa-Bianchi, N.²

6
- 0030159874
- Optimal adaptive policies for sequential allocation problems
- MR1390571
- BURNETAS, A. N. and KATEHAKIS, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Adv. in Appl. Math. 17 122–142. MR1390571
- (1996) Adv. in Appl. Math. , vol.17 , pp. 122-142
- Burnetas, A.N.¹ Katehakis, M.N.²

7
- 0031070051
- Optimal adaptive policies for Markov decision processes
- MR1436581
- BURNETAS, A. N. and KATEHAKIS, M. N. (1997). Optimal adaptive policies for Markov decision processes. Math. Oper. Res. 22 222–255. MR1436581
- (1997) Math. Oper. Res. , vol.22 , pp. 222-255
- Burnetas, A.N.¹ Katehakis, M.N.²

8
- 0037249547
- Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem
- MR1959385
- BURNETAS, A. N. and KATEHAKIS, M. N. (2003). Asymptotic Bayes analysis for the finite-horizon one-armed-bandit problem. Probab. Engrg. Inform. Sci. 17 53–82. MR1959385
- (2003) Probab. Engrg. Inform. Sci. , vol.17 , pp. 53-82
- Burnetas, A.N.¹ Katehakis, M.N.²

9
- 85055722154
- CAPPÉ, O., GARIVIER, A. and KAUFMANN, E. (2012). py/maBandits: Matlab and Python packages for multi-armed bandits. Available at http://mloss.org/software/view/415/.
- (2012) Py/maBandits: Matlab and Python Packages for Multi-Armed Bandits
- Cappé, O.¹ Garivier, A.² Kaufmann, E.³

10
- 84898949562
- CAPPÉ, O., GARIVIER, A., MAILLARD, O.-A., MUNOS, R. and STOLTZ, G. (2013). Supplement to “Kullback–Leibler upper confidence bounds for optimal sequential allocation.” DOI:10.1214/13-AOS1119SUPP.
- (2013) Supplement to “Kullback–Leibler Upper Confidence Bounds for Optimal Sequential Allocation”
- Cappé, O.¹ Garivier, A.² Maillard, O.-A.³ Munos, R.⁴ Stoltz, G.⁵

11
- 0009953451
- Optimal stopping and dynamic allocation
- MR0914595
- CHANG, F. and LAI, T. L. (1987). Optimal stopping and dynamic allocation. Adv. in Appl. Probab. 19 829–853. MR0914595
- (1987) Adv. in Appl. Probab. , vol.19 , pp. 829-853
- Chang, F.¹ Lai, T.L.²

12
- 0003549661
- 2nd ed. Springer, New York. MR0953964
- CHOW, Y. S. and TEICHER, H. (1988). Probability Theory: Independence, Interchangeability, Martingales, 2nd ed. Springer, New York. MR0953964
- (1988) Probability Theory: Independence, Interchangeability, Martingales
- Chow, Y.S.¹ Teicher, H.²

13
- 0003836047
- 2nd ed. Applications of Mathematics (New York) 38. Springer, New York. MR1619036
- DEMBO, A. and ZEITOUNI, O. (1998). Large Deviations Techniques and Applications, 2nd ed. Applications of Mathematics (New York) 38. Springer, New York. MR1619036
- (1998) Large Deviations Techniques and Applications , vol.38
- Dembo, A.¹ Zeitouni, O.²

14
- 79952428999
- Optimism in reinforcement learning and Kullback–Leibler divergence
- IEEE Press, Piscataway, NJ
- FILIPPI, S., CAPPÉ, O. and GARIVIER, A. (2010). Optimism in reinforcement learning and Kullback–Leibler divergence. In Proceedings of the 48th Annual Allerton Conference on Communication, Control, and Computing. IEEE Press, Piscataway, NJ.
- (2010) Proceedings of The 48th Annual Allerton Conference on Communication, Control, and Computing
- Filippi, S.¹ Cappé, O.² Garivier, A.³

15
- 84863920694
- The KL-UCB algorithm for bounded stochastic bandits and beyond
- JMLR W&CP
- GARIVIER, A. and CAPPÉ, O. (2011). The KL-UCB algorithm for bounded stochastic bandits and beyond. In Proceedings of the 24th Annual Conference on Learning Theory. JMLR W&CP.
- (2011) Proceedings of The 24th Annual Conference on Learning Theory
- Garivier, A.¹ Cappé, O.²

16
- 0000169010
- Bandit processes and dynamic allocation indices (with discussion)
- MR0547241
- GITTINS, J. C. (1979). Bandit processes and dynamic allocation indices (with discussion). J. R. Stat. Soc. Ser. B Stat. Methodol. 41 148–177. MR0547241
- (1979) J. R. Stat. Soc. Ser. B Stat. Methodol. , vol.41 , pp. 148-177
- Gittins, J.C.¹

17
- 84891584370
- Wiley, New York
- GITTINS, J., GLAZEBROOK, K. and WEBER, R. (2011). Multi-Armed Bandit Allocation Indices. Wiley, New York.
- (2011) Multi-Armed Bandit Allocation Indices
- Gittins, J.¹ Glazebrook, K.² Weber, R.³

18
- 84947403595
- Probability inequalities for sums of bounded random variables
- MR0144363
- HOEFFDING, W. (1963). Probability inequalities for sums of bounded random variables. J. Amer. Statist. Assoc. 58 13–30. MR0144363
- (1963) J. Amer. Statist. Assoc. , vol.58 , pp. 13-30
- Hoeffding, W.¹

19
- 84898077171
- An asymptotically optimal bandit algorithm for bounded support models
- Omnipress, Madison, WI
- HONDA, J. and TAKEMURA, A. (2010). An asymptotically optimal bandit algorithm for bounded support models. In Proceedings of the 23rd Annual Conference on Learning Theory. Omnipress, Madison, WI.
- (2010) Proceedings of The 23rd Annual Conference on Learning Theory
- Honda, J.¹ Takemura, A.²

20
- 83155180573
- An asymptotically optimal policy for finite support models in the multiarmed bandit problem
- HONDA, J. and TAKEMURA, A. (2011). An asymptotically optimal policy for finite support models in the multiarmed bandit problem. Machine Learning 85 361–391.
- (2011) Machine Learning , vol.85 , pp. 361-391
- Honda, J.¹ Takemura, A.²

21
- 84887423697
- HONDA, J. and TAKEMURA, A. (2012). Finite-time regret bound of a bandit algorithm for the semi-bounded support model. Available at arXiv:1202.2277.
- (2012) Finite-Time Regret Bound of A Bandit Algorithm for The Semi-Bounded Support Model
- Honda, J.¹ Takemura, A.²

22
- 84954519509
- On Bayesian upper confidence bounds for bandit problems
- JMLR W&CP
- KAUFMANN, E., CAPPÉ, O. and GARIVIER, A. (2012). On Bayesian upper confidence bounds for bandit problems. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics 22 592–600. JMLR W&CP.
- (2012) Proceedings of The 15th International Conference on Artificial Intelligence and Statistics , vol.22 , pp. 592-600
- Kaufmann, E.¹ Cappé, O.² Garivier, A.³

23
- 84867888479
- Thompson sampling: An asymptotically optimal finite time analysis
- Springer, New York
- KAUFMANN, E., KORDA, N. and MUNOS, R. (2012). Thompson sampling: An asymptotically optimal finite time analysis. In Proceedings of the 23rd International Conference on Algorithmic Learning Theory 199–213. Springer, New York.
- (2012) Proceedings of The 23rd International Conference on Algorithmic Learning Theory , pp. 199-213
- Kaufmann, E.¹ Korda, N.² Munos, R.³

24
- 0002899547
- Asymptotically efficient adaptive allocation rules
- MR0776826
- LAI, T. L. and ROBBINS, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22. MR0776826
- (1985) Adv. in Appl. Math. , vol.6 , pp. 4-22
- Lai, T.L.¹ Robbins, H.²

25
- 0004029130
- 2nd ed. Springer, New York. MR1639875
- LEHMANN, E. L. and CASELLA, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York. MR1639875
- (1998) Theory of Point Estimation
- Lehmann, E.L.¹ Casella, G.²

26
- 84892903032
- A finite-time analysis of multi-armed bandits problems with Kullback–Leibler divergences
- JMLR W&CP
- MAILLARD, O.-A., MUNOS, R. and STOLTZ, G. (2011). A finite-time analysis of multi-armed bandits problems with Kullback–Leibler divergences. In Proceedings of the 24th Annual Conference on Learning Theory. JMLR W&CP.
- (2011) Proceedings of The 24th Annual Conference on Learning Theory
- Maillard, O.-A.¹ Munos, R.² Stoltz, G.³

27
- 34247553430
- Concentration Inequalities and Model Selection
- Springer, Berlin. MR2319879
- MASSART, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin. MR2319879
- (2007) Lecture Notes in Math , vol.1896
- Massart, P.¹

28
- 85057370061
- Chapman & Hall/CRC, Boca Raton, FL
- OWEN, A. B. (2001). Empirical Likelihood. Chapman & Hall/CRC, Boca Raton, FL.
- (2001) Empirical Likelihood
- Owen, A.B.¹

29
- 84966203785
- Some aspects of the sequential design of experiments
- MR0050246
- ROBBINS, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. (N.S.) 58 527–535. MR0050246
- (1952) Bull. Amer. Math. Soc. (N.S.) , vol.58 , pp. 527-535
- Robbins, H.¹

30
- 0001395850
- On the likelihood that one unknown probability exceeds another in view of the evidence of two samples
- THOMPSON, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 285–294.
- (1933) Biometrika , vol.25 , pp. 285-294
- Thompson, W.R.¹

31
- 78650475463
- On the Theory of Apportionment
- MR1507085
- THOMPSON, W. R. (1935). On the Theory of Apportionment. Amer. J. Math. 57 450–456. MR1507085
- (1935) Amer. J. Math. , vol.57 , pp. 450-456
- Thompson, W.R.¹

32
- 0004272666
- Cambridge Univ. Press, Cambridge
- VAN DER VAART, A. W. (2000). Asymptotic Statistics. Cambridge Univ. Press, Cambridge.
- (2000) Asymptotic Statistics
- Van Der Vaart, A.W.¹

33
- 65749118363
- Graphical models, exponential families, and variational inference
- WAINWRIGHT, M. J. and JORDAN, M. I. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning 1 1–305.
- (2008) Foundations and Trends in Machine Learning , vol.1 , pp. 1-305
- Wainwright, M.J.¹ Jordan, M.I.²

34
- 0000090155
- Sequential tests of statistical hypotheses
- MR0013275
- WALD, A. (1945). Sequential tests of statistical hypotheses. Ann. Math. Statist. 16 117–186. MR0013275
- (1945) Ann. Math. Statist. , vol.16 , pp. 117-186
- Wald, A.¹

35
- 0001072450
- On the Gittins index for multiarmed bandits
- MR1189430
- WEBER, R. (1992). On the Gittins index for multiarmed bandits. Ann. Appl. Probab. 2 1024–1033. MR1189430
- (1992) Ann. Appl. Probab. , vol.2 , pp. 1024-1033
- Weber, R.¹

36
- 0000248624
- Multi-armed bandits and the Gittins index
- MR0583348
- WHITTLE, P. (1980). Multi-armed bandits and the Gittins index. J. R. Stat. Soc. Ser. B Stat. Methodol. 42 143–149. MR0583348
- (1980) J. R. Stat. Soc. Ser. B Stat. Methodol. , vol.42 , pp. 143-149
- Whittle, P.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.