1. Rajeev Agrawal, Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Adv. in Appl. Probab., 27 (1995), 1054-1078.
2. Jean-Yves Audibert and Sébastien Bubeck, Minimax policies for adversarial and stochastic bandits, Proceedings of the 22nd Annual Conference on Learning Theory (COLT 2009), 2009, 217-226.
3. Jean-Yves Audibert, Rémi Munos and Csaba Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theor. Comput. Sci., 410 (2009), 1876-1902.
4. Peter Auer, Nicolò Cesa-Bianchi and Paul Fischer, Finite-Time Analysis of the Multi-Armed Bandit Problem, Mach. Learn., 47 (2002), 235-256.
5. Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund and Robert E. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM J. Comput., 32 (2002), 48-77.
6. Eyal Even-Dar, Shie Mannor and Yishay Mansour, Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems, J. Mach. Learn. Res., 7 (2006), 1079-1105.
7. Wassily Hoeffding, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58 (1963), 13-30.
8. Robert D. Kleinberg, Nearly Tight Bounds for the Continuum-Armed Bandit Problem, Advances in Neural Information Processing Systems 17, MIT Press, 2005, 697-704.
9. Tze Leung Lai and Herbert Robbins, Asymptotically Efficient Adaptive Allocation Rules, Adv. in Appl. Math., 6 (1985), 4-22.
10. Shie Mannor and John N. Tsitsiklis, The Sample Complexity of Exploration in the Multi-Armed Bandit Problem, J. Mach. Learn. Res., 5 (2004), 623-648.
11. Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, MIT Press, 1998.