-
2
-
-
0344118814
-
Reinforcement learning with immediate rewards and linear hypotheses
-
N. Abe, A. W. Biermann, and P. M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37:263-293, 2003.
-
(2003)
Algorithmica
, vol.37
, pp. 263-293
-
-
Abe, N.1
Biermann, A.W.2
Long, P.M.3
-
3
-
-
77953287501
-
Active learning in heteroscedastic noise
-
A. Antos, V. Grover, and Cs. Szepesvári. Active learning in heteroscedastic noise. Theoretical Computer Science, 411(29-30):2712-2728, 2010.
-
(2010)
Theoretical Computer Science
, vol.411
, Issue.29-30
, pp. 2712-2728
-
-
Antos, A.1
Grover, V.2
Szepesvári, Cs.3
-
4
-
-
62949181077
-
Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
-
J.-Y. Audibert, R. Munos, and Cs. Szepesvári. Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19):1876-1902, 2009.
-
(2009)
Theoretical Computer Science
, vol.410
, Issue.19
, pp. 1876-1902
-
-
Audibert, J.-Y.1
Munos, R.2
Szepesvári, Cs.3
-
5
-
-
0034497786
-
Using upper confidence bounds for online learning
-
P. Auer. Using upper confidence bounds for online learning. In FOCS, pages 270-279, 2000.
-
(2000)
FOCS
, pp. 270-279
-
-
Auer, P.1
-
6
-
-
84860601617
-
Using confidence bounds for exploitation-exploration trade-offs
-
P. Auer. Using confidence bounds for exploitation-exploration trade-offs. JMLR, 2002.
-
(2002)
JMLR
-
-
Auer, P.1
-
7
-
-
0036568025
-
Finite time analysis of the multiarmed bandit problem
-
P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3):235-256, 2002.
-
(2002)
Machine Learning
, vol.47
, Issue.2-3
, pp. 235-256
-
-
Auer, P.1
Cesa-Bianchi, N.2
Fischer, P.3
-
8
-
-
77952027689
-
Online optimization in X-armed bandits
-
S. Bubeck, R. Munos, G. Stoltz, and Cs. Szepesvári. Online optimization in X-armed bandits. In NIPS, pages 201-208, 2008.
-
(2008)
NIPS
, pp. 201-208
-
-
Bubeck, S.1
Munos, R.2
Stoltz, G.3
Szepesvári, Cs.4
-
11
-
-
70349275222
-
Bandit algorithms for tree search
-
P.-A. Coquelin and R. Munos. Bandit algorithms for tree search. In UAI, 2007.
-
(2007)
UAI
-
-
Coquelin, P.-A.1
Munos, R.2
-
12
-
-
84898072179
-
Stochastic linear optimization under bandit feedback
-
Rocco Servedio and Tong Zhang, editors
-
V. Dani, T. P. Hayes, and S. M. Kakade. Stochastic linear optimization under bandit feedback. In Rocco Servedio and Tong Zhang, editors, COLT, pages 355-366, 2008.
-
(2008)
COLT
, pp. 355-366
-
-
Dani, V.1
Hayes, T.P.2
Kakade, S.M.3
-
13
-
-
4544277579
-
Self-normalized processes: Exponential inequalities, moment bounds and iterated logarithm laws
-
V. H. de la Peña, M. J. Klass, and T. L. Lai. Self-normalized processes: exponential inequalities, moment bounds and iterated logarithm laws. Annals of Probability, 32(3):1902-1933, 2004.
-
(2004)
Annals of Probability
, vol.32
, Issue.3
, pp. 1902-1933
-
-
De La Peña, V.H.1
Klass, M.J.2
Lai, T.L.3
-
15
-
-
84875634609
-
Robust selective sampling from single and multiple teachers
-
O. Dekel, C. Gentile, and K. Sridharan. Robust selective sampling from single and multiple teachers. In COLT, 2010.
-
(2010)
COLT
-
-
Dekel, O.1
Gentile, C.2
Sridharan, K.3
-
16
-
-
0002384441
-
On tail probabilities for martingales
-
D. A. Freedman. On tail probabilities for martingales. The Annals of Probability, 3(1):100-118, 1975.
-
(1975)
The Annals of Probability
, vol.3
, Issue.1
, pp. 100-118
-
-
Freedman, D.A.1
-
17
-
-
70049104217
-
On upper-confidence bound policies for non-stationary bandit problems
-
A. Garivier and E. Moulines. On upper-confidence bound policies for non-stationary bandit problems. Technical report, LTCI, 2008.
-
(2008)
Technical Report LTCI
-
-
Garivier, A.1
Moulines, E.2
-
19
-
-
0002899547
-
Asymptotically efficient adaptive allocation rules
-
T. L. Lai and H. Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6:4-22, 1985.
-
(1985)
Advances in Applied Mathematics
, vol.6
, pp. 4-22
-
-
Lai, T.L.1
Robbins, H.2
-
20
-
-
0000258837
-
Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems
-
T. L. Lai and C. Z. Wei. Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems. The Annals of Statistics, 10(1):154-166, 1982.
-
(1982)
The Annals of Statistics
, vol.10
, Issue.1
, pp. 154-166
-
-
Lai, T.L.1
Wei, C.Z.2
-
21
-
-
0018190841
-
Strong consistency of least squares estimates in multiple regression
-
T. L. Lai, H. Robbins, and C. Z. Wei. Strong consistency of least squares estimates in multiple regression. Proceedings of the National Academy of Sciences, 75(7):3034-3036, 1979.
-
(1979)
Proceedings of the National Academy of Sciences
, vol.75
, Issue.7
, pp. 3034-3036
-
-
Lai, T.L.1
Robbins, H.2
Wei, C.Z.3
-
22
-
-
77954641643
-
A contextual-bandit approach to personalized news article recommendation
-
ACM
-
L. Li, W. Chu, J. Langford, and R. E. Schapire. A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web (WWW 2010), pages 661-670. ACM, 2010.
-
(2010)
Proceedings of the 19th International Conference on World Wide Web (WWW 2010)
, pp. 661-670
-
-
Li, L.1
Chu, W.2
Langford, J.3
Schapire, R.E.4
-
27
-
-
79958846996
-
Exploring compact reinforcement-learning representations with linear regression
-
AUAI Press
-
T. J. Walsh, I. Szita, C. Diuk, and M. L. Littman. Exploring compact reinforcement-learning representations with linear regression. In UAI, pages 591-598. AUAI Press, 2009.
-
(2009)
UAI
, pp. 591-598
-
-
Walsh, T.J.1
Szita, I.2
Diuk, C.3
Littman, M.L.4