-
2
-
-
85162561761
-
Improved algorithms for linear stochastic bandits
-
Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds.
-
Abbasi-Yadkori Y, Pál D, Szepesvári C (2011) Improved algorithms for linear stochastic bandits. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS), (Curran Associates, Red Hook, NY), 2312-2320.
-
(2011)
Adv. Neural Inform. Processing Systems (NIPS), (Curran Associates, Red Hook, NY)
, pp. 2312-2320
-
-
Abbasi-Yadkori, Y.1
Pál, D.2
Szepesvári, C.3
-
3
-
-
84908661477
-
Online-to-confidence-set conversions and application to sparse stochastic bandits
-
Lawrence ND, Girolami MA, eds. Proc. 15th Internat. Conf. Artificial Intelligence Statist. (AISTATS)
-
Abbasi-Yadkori Y, Pál D, Szepesvári C (2012) Online-to-confidence-set conversions and application to sparse stochastic bandits. Lawrence ND, Girolami MA, eds. Proc. 15th Internat. Conf. Artificial Intelligence Statist. (AISTATS), JMLR Workshop and Conference Proceedings, Vol. 22, 1-9.
-
(2012)
JMLR Workshop and Conference Proceedings
, vol.22
, pp. 1-9
-
-
Abbasi-Yadkori, Y.1
Pál, D.2
Szepesvári, C.3
-
4
-
-
84874084136
-
Analysis of Thompson sampling for the multi-armed bandit problem
-
Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Ann. Conf. Learn. Theory
-
Agrawal S, Goyal N (2012) Analysis of Thompson sampling for the multi-armed bandit problem. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Ann. Conf. Learn. Theory, JMLR Workshop and Conference Proceedings, Vol. 23, 39.1-39.26
-
(2012)
JMLR Workshop and Conference Proceedings
, vol.23
, pp. 39.1-39.26
-
-
Agrawal, S.1
Goyal, N.2
-
7
-
-
84874098326
-
Bandits, query learning, and the haystack dimension
-
Kakade SM, von Luxburg U, eds. Proc. 24th Annual Conf. Learn. Theory (COLT)
-
Amin K, Kearns M, Syed U (2011) Bandits, query learning, and the haystack dimension. Kakade SM, von Luxburg U, eds. Proc. 24th Annual Conf. Learn. Theory (COLT), JMLR Workshop and Conference Proceedings, Vol. 19, 87-106.
-
(2011)
JMLR Workshop and Conference Proceedings
, vol.19
, pp. 87-106
-
-
Amin, K.1
Kearns, M.2
Syed, U.3
-
8
-
-
84898079018
-
Minimax policies for adversarial and stochastic bandits
-
Audibert J-Y, Bubeck S (2009) Minimax policies for adversarial and stochastic bandits. Proc. 22nd Annual Conf. Learn. Theory (COLT), (Omnipress, Madison, WI), 773-818.
-
(2009)
Proc. 22nd Annual Conf. Learn. Theory (COLT), (Omnipress, Madison, WI)
, pp. 773-818
-
-
Audibert, J.-Y.1
Bubeck, S.2
-
9
-
-
0036568025
-
Finite-time analysis of the multiarmed bandit problem
-
Auer P, Cesa-Bianchi N, Fischer P (2002) Finite-time analysis of the multiarmed bandit problem. Machine Learn. 47(2):235-256.
-
(2002)
Machine Learn.
, vol.47
, Issue.2
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
10
-
-
84862295780
-
Contextual bandit algorithms with supervised learning guarantees
-
Proc. 14th Internat. Conf. Artificial Intelligence Statist. (AISTATS)
-
Beygelzimer A, Langford J, Li L, Reyzin L, Schapire RE (2011) Contextual bandit algorithms with supervised learning guarantees. Proc. 14th Internat. Conf. Artificial Intelligence Statist. (AISTATS), JMLR Workshop and Conference Proceedings, Vol. 15, 19-26.
-
(2011)
JMLR Workshop and Conference Proceedings
, vol.15
, pp. 19-26
-
-
Beygelzimer, A.1
Langford, J.2
Li, L.3
Reyzin, L.4
Schapire, R.E.5
-
14
-
-
84898949562
-
Kullback-Leibler upper confidence bounds for optimal sequential allocation
-
Cappé O, Garivier A, Maillard O-A, Munos R, Stoltz G (2013) Kullback-Leibler upper confidence bounds for optimal sequential allocation. Ann. Statist. 41(3):1516-1541.
-
(2013)
Ann. Statist.
, vol.41
, Issue.3
, pp. 1516-1541
-
-
Cappé, O.1
Garivier, A.2
Maillard, O.-A.3
Munos, R.4
Stoltz, G.5
-
15
-
-
85162416700
-
An empirical evaluation of Thompson sampling
-
Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds.
-
Chapelle O, Li L (2011) An empirical evaluation of Thompson sampling. Shawe-Taylor J, Zemel RS, Bartlett PL, Pereira FCN, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems (NIPS) (Curran Associates, Red Hook, NY), 2249-2257.
-
(2011)
Adv. Neural Inform. Processing Systems (NIPS) (Curran Associates, Red Hook, NY)
, pp. 2249-2257
-
-
Chapelle, O.1
Li, L.2
-
16
-
-
84898072179
-
Stochastic linear optimization under bandit feedback
-
Dani V, Hayes TP, Kakade SM (2008) Stochastic linear optimization under bandit feedback. Proc. 21st Annual Conf. Learn. Theory (COLT) (Omnipress, Madison, WI), 355-366.
-
(2008)
Proc. 21st Annual Conf. Learn. Theory (COLT) (Omnipress, Madison, WI)
, pp. 355-366
-
-
Dani, V.1
Hayes, T.P.2
Kakade, S.M.3
-
17
-
-
84875747480
-
Linear bandits in high dimension and recommendation systems
-
Deshpande Y, Montanari A (2012) Linear bandits in high dimension and recommendation systems. 50th Annual Allerton Conf. Communication, Control, Comput. (IEEE, Piscataway, NJ), 1750-1754.
-
(2012)
50th Annual Allerton Conf. Communication, Control, Comput. (IEEE, Piscataway, NJ)
, pp. 1750-1754
-
-
Deshpande, Y.1
Montanari, A.2
-
18
-
-
85162071043
-
Parametric bandits: The generalized linear case
-
Lafferty J, Williams C, Shawe-Taylor J, Zemel RS, Culotta A, eds.
-
Filippi S, Cappé O, Garivier A, Szepesvári C (2010) Parametric bandits: The generalized linear case. Lafferty J, Williams C, Shawe-Taylor J, Zemel RS, Culotta A, eds. Adv. Neural Inform. Processing Systems (NIPS), (Curran Associates, Red Hook, NY), 586-594.
-
(2010)
Adv. Neural Inform. Processing Systems (NIPS), (Curran Associates, Red Hook, NY)
, pp. 586-594
-
-
Filippi, S.1
Cappé, O.2
Garivier, A.3
Szepesvári, C.4
-
19
-
-
0018709825
-
A dynamic allocation index for the discounted multiarmed bandit problem
-
Gittins JC, Jones DM (1979) A dynamic allocation index for the discounted multiarmed bandit problem. Biometrika 66(3):561-565.
-
(1979)
Biometrika
, vol.66
, Issue.3
, pp. 561-565
-
-
Gittins, J.C.1
Jones, D.M.2
-
22
-
-
84867888479
-
Thompson sampling: An asymptotically optimal finite time analysis
-
Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds., Algorithmic Learn. Theory (Springer-Verlag, Berlin, Heidelberg)
-
Kaufmann E, Korda N, Munos R (2012) Thompson sampling: An asymptotically optimal finite time analysis. Bshouty NH, Stoltz G, Vayatis N, Zeugmann T, eds., Algorithmic Learn. Theory, Lecture Notes in Computer Science, Vol. 7568 (Springer-Verlag, Berlin, Heidelberg), 586-594.
-
(2012)
Algorithmic Learn. Theory, Lecture Notes in Computer Science
, vol.7568
, pp. 586-594
-
-
Kaufmann, E.1
Korda, N.2
Munos, R.3
-
24
-
-
84898959192
-
Thompson sampling for one-dimensional exponential family bandits
-
Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds.
-
Korda N, Kaufmann E, Munos R (2013) Thompson sampling for one-dimensional exponential family bandits. Burges CJC, Bottou L, Ghahramani Z, Weinberger KQ, eds. Adv. Neural Inform. Processing Systems, 1448-1456.
-
(2013)
Adv. Neural Inform. Processing Systems
, pp. 1448-1456
-
-
Korda, N.1
Kaufmann, E.2
Munos, R.3
-
25
-
-
0000854435
-
Adaptive treatment allocation and the multi-armed bandit problem
-
Lai TL (1987) Adaptive treatment allocation and the multi-armed bandit problem. Ann. Statist. 15(3):1091-1114.
-
(1987)
Ann. Statist.
, vol.15
, Issue.3
, pp. 1091-1114
-
-
Lai, T.L.1
-
26
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
Lai TL, Robbins H (1985) Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6(1):4-22.
-
(1985)
Adv. Appl. Math.
, vol.6
, Issue.1
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
29
-
-
84900037074
-
Open problem: Regret bounds for Thompson sampling
-
Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory (COLT)
-
Li L, Chapelle O (2012) Open problem: Regret bounds for Thompson sampling. Mannor S, Srebro N, Williamson RC, eds. Proc. 25th Annual Conf. Learn. Theory (COLT), JMLR Workshop and Conference Proceedings, Vol. 23, 43.1-43.3.
-
(2012)
JMLR Workshop and Conference Proceedings
, vol.23
, pp. 43.1-43.3
-
-
Li, L.1
Chapelle, O.2
-
30
-
-
84864939787
-
Optimistic Bayesian sampling in contextual-bandit problems
-
May BC, Korda N, Lee A, Leslie DS (2012) Optimistic Bayesian sampling in contextual-bandit problems. J. Machine Learn. Res. 13(1):2069-2106.
-
(2012)
J. Machine Learn. Res.
, vol.13
, Issue.1
, pp. 2069-2106
-
-
May, B.C.1
Korda, N.2
Lee, A.3
Leslie, D.S.4
-
32
-
-
84859621831
-
The knowledge gradient algorithm for a general class of online learning problems
-
Ryzhov IO, Powell WB, Frazier PI (2012) The knowledge gradient algorithm for a general class of online learning problems. Oper. Res. 60(1):180-195.
-
(2012)
Oper. Res.
, vol.60
, Issue.1
, pp. 180-195
-
-
Ryzhov, I.O.1
Powell, W.B.2
Frazier, P.I.3
-
33
-
-
0001223006
-
Computationally related problems
-
Sahni S (1974) Computationally related problems. SIAM J. Comput. 3(4):262-279.
-
(1974)
SIAM J. Comput.
, vol.3
, Issue.4
, pp. 262-279
-
-
Sahni, S.1
-
34
-
-
78650505735
-
A modern Bayesian look at the multi-armed bandit
-
Scott SL (2010) A modern Bayesian look at the multi-armed bandit. Appl. Stochastic Models Bus. Indust. 26(6):639-658.
-
(2010)
Appl. Stochastic Models Bus. Indust.
, vol.26
, Issue.6
, pp. 639-658
-
-
Scott, S.L.1
-
35
-
-
84860236413
-
Information-theoretic regret bounds for Gaussian process optimization in the bandit setting
-
Srinivas N, Krause A, Kakade SM, Seeger M (2012) Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Trans. Inform. Theory 58(5):3250-3265.
-
(2012)
IEEE Trans. Inform. Theory
, vol.58
, Issue.5
, pp. 3250-3265
-
-
Srinivas, N.1
Krause, A.2
Kakade, S.M.3
Seeger, M.4
-
36
-
-
84897517236
-
Stochastic simultaneous optimistic optimization
-
Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (ICML-13)
-
Valko M, Carpentier A, Munos R (2013) Stochastic simultaneous optimistic optimization. Dasgupta S, McAllester D, eds. Proc. 30th Internat. Conf. Machine Learn. (ICML-13), JMLR Workshop and Conference Proceedings, Vol. 28, 19-27.
-
(2013)
JMLR Workshop and Conference Proceedings
, vol.28
, pp. 19-27
-
-
Valko, M.1
Carpentier, A.2
Munos, R.3