1. Naoki Abe, Alan W. Biermann, and Philip M. Long. Reinforcement learning with immediate rewards and linear hypotheses. Algorithmica, 37(4): 263-293, 2003.
4. Peter Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3: 397-422, 2002.
5. Peter Auer, Nicolò Cesa-Bianchi, and Paul Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2-3): 235-256, 2002.
6. Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1): 48-77, 2002.
9. Leslie Pack Kaelbling. Associative reinforcement learning: Functions in k-DNF. Machine Learning, 15(3): 279-298, 1994.
11. Tze Leung Lai and Herbert Robbins. Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1): 4-22, 1985.
15. Colin McDiarmid. On the method of bounded differences. In J. Siemons, editor, Surveys in Combinatorics, volume 141 of London Mathematical Society Lecture Note Series, pages 148-188. Cambridge University Press, 1989.
16. Taesup Moon, Lihong Li, Wei Chu, Ciya Liao, Zhaohui Zheng, and Yi Chang. Online learning for recency search ranking using real-time user feedback. In Proceedings of the Nineteenth International Conference on Knowledge Management, 2010.
19. Alexander L. Strehl, Chris Mesterharm, Michael L. Littman, and Haym Hirsh. Experience-efficient learning in associative bandit problems. In Proceedings of the Twenty-Third International Conference on Machine Learning, pages 889-896, 2006.
21. Chih-Chun Wang, Sanjeev R. Kulkarni, and H. Vincent Poor. Bandit problems with side observations. IEEE Transactions on Automatic Control, 50(3): 338-355, 2005.
22. Michael Woodroofe. A one-armed bandit problem with a concomitant variable. Journal of the American Statistical Association, 74(368): 799-806, 1979.