[6] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3:397-422, 2002.
[9] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: the adversarial multi-armed bandit problem. Technical Report NC-TR-98-025, Neurocolt, 1998.
[10] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48-77, 2002.
[13] A. G. Barto, R. S. Sutton, and P. S. Brouwer. Associative search network: a reinforcement learning associative memory. Biological Cybernetics, 40:201-211, 1981.
[15] A. W. Biermann, C. I. Guinn, M. Fulkerson, G. Keim, Z. Liang, D. Melamed, and K. Rajagopalan. Goal-oriented multimedia dialogue with variable initiative. In Foundations of Intelligent Systems, Z. W. Ras and A. Skowron (eds.), 1997.
[17] N. Cesa-Bianchi, P. M. Long, and M. K. Warmuth. Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7(3):604-619, 1996.
[18] N. Cesa-Bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire, and M. K. Warmuth. How to use expert advice. Journal of the Association for Computing Machinery, 44(3):427-485, May 1997.
[19] C.-N. Fiechter. Design and Analysis of Efficient Reinforcement Learning Algorithms. Ph.D. thesis, University of Pittsburgh, 1997.
[20] D. Haussler, M. Kearns, H. S. Seung, and N. Tishby. Rigorous learning curve bounds from statistical mechanics. Machine Learning, 25:195-236, 1996.
[21] D. P. Helmbold, N. Littlestone, and P. M. Long. Apple tasting. Information and Computation, 161(2):85-139, 2000. Preliminary version in FOCS'92.
[22] L. P. Kaelbling. Associative reinforcement learning: a generate and test algorithm. Machine Learning, 15(3):299-320, 1994.
[23] L. P. Kaelbling. Associative reinforcement learning: functions in k-DNF. Machine Learning, 15(3):279-298, 1994.
[24] M. Kearns, M. Li, L. Pitt, and L. G. Valiant. On the learnability of Boolean formulae. Proceedings of the 19th Annual Symposium on the Theory of Computation, pages 285-295, 1987.
[26] N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2:285-318, 1988.
[30] B. Widrow and M. E. Hoff. Adaptive switching circuits. 1960 IRE WESCON Conv. Record, pages 96-104, 1960.
[31] R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.