-
1
-
-
31844444663
-
Exploration and apprenticeship learning in reinforcement learning
-
DOI 10.1145/1102351.1102352, ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
-
Abbeel, P., & Ng, A. Y. (2005). Exploration and apprenticeship learning in reinforcement learning. In Proceedings of the twenty-second international conference on machine learning (pp. 1-8). (Pubitemid 43183309)
-
(2005)
ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
, pp. 1-8
-
-
Abbeel, P.1
Ng, A.Y.2
-
2
-
-
0000710299
-
Queries and concept learning
-
Angluin, D. (1988). Queries and concept learning. Machine Learning, 2, 319-342.
-
(1988)
Machine Learning
, vol.2
, pp. 319-342
-
-
Angluin, D.1
-
4
-
-
0041966002
-
Using confidence bounds for exploitation-exploration trade-offs
-
Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397-422.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 397-422
-
-
Auer, P.1
-
5
-
-
1942450194
-
-
Technical Report CMU-RI-TR-01-25, Robotics Institute, Carnegie Mellon University, Pittsburgh, PA
-
Bagnell, J., Ng, A. Y., & Schneider, J. (2001). Solving uncertain Markov decision problems (Technical Report CMU-RI-TR-01-25). Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
-
(2001)
Solving Uncertain Markov Decision Problems
-
-
Bagnell, J.1
Ng, A.Y.2
Schneider, J.3
-
8
-
-
0028517062
-
Separating distribution-free and mistake-bound learning models over the Boolean domain
-
Blum, A. (1994). Separating distribution-free and mistake-bound learning models over the Boolean domain. SIAM Journal on Computing, 23, 990-1000.
-
(1994)
SIAM Journal on Computing
, vol.23
, pp. 990-1000
-
-
Blum, A.1
-
9
-
-
0346942368
-
Decision-Theoretic Planning: Structural Assumptions and Computational Leverage
-
Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94. (Pubitemid 129628760)
-
(1999)
Journal of Artificial Intelligence Research
, vol.11
, pp. 1-94
-
-
Boutilier, C.1
Dean, T.2
Hanks, S.3
-
10
-
-
0041965975
-
R-MAX-A general polynomial time algorithm for near-optimal reinforcement learning
-
Brafman, R. I., & Tennenholtz, M. (2002). R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
11
-
-
70049084399
-
CORL: A continuous-state offset-dynamics reinforcement learner
-
Brunskill, E., Leffler, B. R., Li, L., Littman, M. L., & Roy, N. (2008). CORL: A continuous-state offset-dynamics reinforcement learner. In Proceedings of the twenty-fourth conference on uncertainty in artificial intelligence (UAI-08) (pp. 53-61).
-
(2008)
Proceedings of the Twenty-fourth Conference on Uncertainty in Artificial Intelligence (UAI-08)
, pp. 53-61
-
-
Brunskill, E.1
Leffler, B.R.2
Li, L.3
Littman, M.L.4
Roy, N.5
-
12
-
-
70349416596
-
Provably efficient learning with typed parametric models
-
Brunskill, E., Leffler, B. R., Li, L., Littman, M. L., & Roy, N. (2009). Provably efficient learning with typed parametric models. Journal of Machine Learning Research, 10, 1955-1988.
-
(2009)
Journal of Machine Learning Research
, vol.10
, pp. 1955-1988
-
-
Brunskill, E.1
Leffler, B.R.2
Li, L.3
Littman, M.L.4
Roy, N.5
-
13
-
-
20544462399
-
Minimizing regret with label efficient prediction
-
DOI 10.1109/TIT.2005.847729
-
Cesa-Bianchi, N., Lugosi, G., & Stoltz, G. (2005). Minimizing regret with label efficient prediction. IEEE Transactions on Information Theory, 51, 2152-2162. (Pubitemid 40843632)
-
(2005)
IEEE Transactions on Information Theory
, vol.51
, Issue.6
, pp. 2152-2162
-
-
Cesa-Bianchi, N.1
Lugosi, G.2
Stoltz, G.3
-
14
-
-
33745738567
-
Worst-case analysis of selective sampling for linear classification
-
Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006). Worst-case analysis of selective sampling for linear classification. Journal of Machine Learning Research, 7, 1205-1230. (Pubitemid 44015299)
-
(2006)
Journal of Machine Learning Research
, vol.7
, pp. 1205-1230
-
-
Cesa-Bianchi, N.1
Gentile, C.2
Zaniboni, L.3
-
17
-
-
0028424239
-
Improving generalization with active learning
-
Cohn, D. A., Atlas, L., & Ladner, R. E. (1994). Improving generalization with active learning. Machine Learning, 15, 201-221.
-
(1994)
Machine Learning
, vol.15
, pp. 201-221
-
-
Cohn, D.A.1
Atlas, L.2
Ladner, R.E.3
-
18
-
-
84990553353
-
A model for reasoning about persistence and causation
-
Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
-
(1989)
Computational Intelligence
, vol.5
, pp. 142-150
-
-
Dean, T.1
Kanazawa, K.2
-
21
-
-
2542446495
-
-
Master's thesis, Department of Computer Science, University of Waterloo, Ontario, Canada
-
Fong, P. W. L. (1995b). A quantitative study of hypothesis selection. Master's thesis, Department of Computer Science, University of Waterloo, Ontario, Canada.
-
(1995)
A Quantitative Study of Hypothesis Selection
-
-
Fong, P.W.L.1
-
22
-
-
0030643068
-
Using and combining predictors that specialize
-
Freund, Y., Schapire, R. E., Singer, Y., & Warmuth, M. K. (1997a). Using and combining predictors that specialize. In STOC'97: Proceedings of the twenty-ninth annual ACM symposium on theory of computing (pp. 334-343).
-
(1997)
STOC'97: Proceedings of the Twenty-ninth Annual ACM Symposium on Theory of Computing
, pp. 334-343
-
-
Freund, Y.1
Schapire, R.E.2
Singer, Y.3
Warmuth, M.K.4
-
23
-
-
0031209604
-
Selective Sampling Using the Query by Committee Algorithm
-
Freund, Y., Seung, H. S., Shamir, E., & Tishby, N. (1997b). Selective sampling using the query by committee algorithm. Machine Learning, 28, 133-168. (Pubitemid 127506338)
-
(1997)
Machine Learning
, vol.28
, Issue.2-3
, pp. 133-168
-
-
Freund, Y.1
Seung, H.S.2
Shamir, E.3
Tishby, N.4
-
24
-
-
24344500472
-
Generalization bounds for averaged classifiers
-
DOI 10.1214/009053604000000058
-
Freund, Y., Mansour, Y., & Schapire, R. E. (2004). Generalization bounds for averaged classifiers. The Annals of Statistics, 32, 1698-1722. (Pubitemid 41250282)
-
(2004)
Annals of Statistics
, vol.32
, Issue.4
, pp. 1698-1722
-
-
Freund, Y.1
Mansour, Y.2
Schapire, R.E.3
-
25
-
-
0004236492
-
-
(2nd ed.). Baltimore: The Johns Hopkins University Press
-
Golub, G. H., & Van Loan, C. F. (1989). Matrix computations (2nd ed.). Baltimore: The Johns Hopkins University Press.
-
(1989)
Matrix Computations
-
-
Golub, G.H.1
Van Loan, C.F.2
-
26
-
-
0034666805
-
Apple tasting
-
Helmbold, D. P., Littlestone, N., & Long, P. M. (2000). Apple tasting. Information and Computation, 161, 85-139.
-
(2000)
Information and Computation
, vol.161
, pp. 85-139
-
-
Helmbold, D.P.1
Littlestone, N.2
Long, P.M.3
-
27
-
-
84947403595
-
Probability inequalities for sums of bounded random variables
-
Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13-30.
-
(1963)
Journal of the American Statistical Association
, vol.58
, pp. 13-30
-
-
Hoeffding, W.1
-
28
-
-
23244466805
-
-
Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London
-
Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London.
-
(2003)
On the Sample Complexity of Reinforcement Learning
-
-
Kakade, S.M.1
-
31
-
-
0028460231
-
Efficient distribution-free learning of probabilistic concepts
-
DOI 10.1016/S0022-0000(05)80062-5
-
Kearns, M. J., & Schapire, R. E. (1994). Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48, 464-497. (Pubitemid 124013300)
-
(1994)
Journal of Computer and System Sciences
, vol.48
, Issue.3
, pp. 464-497
-
-
Kearns, M.J.1
Schapire, R.E.2
-
32
-
-
0036832954
-
Near-optimal reinforcement learning in polynomial time
-
Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209-232.
-
(2002)
Machine Learning
, vol.49
, pp. 209-232
-
-
Kearns, M.J.1
Singh, S.P.2
-
33
-
-
0001553979
-
Toward efficient agnostic learning
-
Kearns, M. J., Schapire, R. E., & Sellie, L. (1994). Toward efficient agnostic learning. Machine Learning, 17, 115-141.
-
(1994)
Machine Learning
, vol.17
, pp. 115-141
-
-
Kearns, M.J.1
Schapire, R.E.2
Sellie, L.3
-
34
-
-
0036832951
-
A sparse sampling algorithm for near-optimal planning in large Markov decision processes
-
DOI 10.1023/A:1017932429737
-
Kearns, M. J., Mansour, Y., & Ng, A. Y. (2002). A sparse sampling algorithm for near-optimal planning in large Markov decision processes. Machine Learning, 49, 193-208. (Pubitemid 34325686)
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 193-208
-
-
Kearns, M.1
Mansour, Y.2
Ng, A.Y.3
-
37
-
-
0037400054
-
An empirical study of two approaches to sequence learning for anomaly detection
-
Lane, T., & Brodley, C. E. (2003). An empirical study of two approaches to sequence learning for anomaly detection. Machine Learning, 51, 73-107.
-
(2003)
Machine Learning
, vol.51
, pp. 73-107
-
-
Lane, T.1
Brodley, C.E.2
-
40
-
-
78649496546
-
Reducing reinforcement learning to KWIK online regression
-
doi:10.1007/s10472-010-9201-2
-
Li, L., & Littman, M. L. (2010). Reducing reinforcement learning to KWIK online regression. Annals of Mathematics and Artificial Intelligence. doi:10.1007/s10472-010-9201-2.
-
(2010)
Annals of Mathematics and Artificial Intelligence
-
-
Li, L.1
Littman, M.L.2
-
42
-
-
77954641643
-
A contextual-bandit approach to personalized news article recommendation
-
Li, L., Chu, W., Langford, J., & Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In Proceedings of the nineteenth international conference on World Wide Web (WWW-10) (pp. 661-670).
-
(2010)
Proceedings of the Nineteenth International Conference on World Wide Web (WWW-10)
, pp. 661-670
-
-
Li, L.1
Chu, W.2
Langford, J.3
Schapire, R.E.4
-
43
-
-
34250091945
-
Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm
-
Littlestone, N. (1987). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285-318.
-
(1987)
Machine Learning
, vol.2
, pp. 285-318
-
-
Littlestone, N.1
-
45
-
-
0027684215
-
Prioritized sweeping: Reinforcement learning with less data and less real time
-
Moore, A. W., & Atkeson, C. G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.
-
(1993)
Machine Learning
, vol.13
, pp. 103-130
-
-
Moore, A.W.1
Atkeson, C.G.2
-
49
-
-
0028497385
-
An upper bound on the loss from approximate optimal-value functions
-
Singh, S. P., & Yee, R. C. (1994). An upper bound on the loss from approximate optimal-value functions. Machine Learning, 16, 227.
-
(1994)
Machine Learning
, vol.16
, pp. 227
-
-
Singh, S.P.1
Yee, R.C.2
-
53
-
-
33749255382
-
PAC model-free reinforcement learning
-
Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006b). PAC model-free reinforcement learning. In Proceedings of the twenty-third international conference on machine learning (ICML-06).
-
(2006)
Proceedings of the Twenty-third International Conference on Machine Learning (ICML-06)
-
-
Strehl, A.L.1
Li, L.2
Wiewiora, E.3
Langford, J.4
Littman, M.L.5
-
54
-
-
33749242078
-
Experience-efficient learning in associative bandit problems
-
Strehl, A. L., Mesterharm, C., Littman, M. L., & Hirsh, H. (2006c). Experience-efficient learning in associative bandit problems. In Proceedings of the twenty-third international conference on machine learning (ICML-06).
-
(2006)
Proceedings of the Twenty-third International Conference on Machine Learning (ICML-06)
-
-
Strehl, A.L.1
Mesterharm, C.2
Littman, M.L.3
Hirsh, H.4
-
56
-
-
73549084301
-
Reinforcement learning in finite MDPs: PAC analysis
-
Strehl, A. L., Li, L., & Littman, M. L. (2009). Reinforcement learning in finite MDPs: PAC analysis. Journal of Machine Learning Research, 10, 2413-2444.
-
(2009)
Journal of Machine Learning Research
, vol.10
, pp. 2413-2444
-
-
Strehl, A.L.1
Li, L.2
Littman, M.L.3
-
59
-
-
0021518106
-
A theory of the learnable
-
Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134-1142.
-
(1984)
Communications of the ACM
, vol.27
, pp. 1134-1142
-
-
Valiant, L.G.1
-
60
-
-
79958846996
-
Exploring compact reinforcement-learning representations with linear regression
-
A refined version is available as Technical Report DCS-tr-660, Department of Computer Science, Rutgers University, December, 2009
-
Walsh, T. J., Szita, I., Diuk, C., & Littman, M. L. (2009). Exploring compact reinforcement-learning representations with linear regression. In Proceedings of the twenty-fifth conference on uncertainty in artificial intelligence (UAI-09) (pp. 591-598). A refined version is available as Technical Report DCS-tr-660, Department of Computer Science, Rutgers University, December, 2009.
-
(2009)
Proceedings of the Twenty-fifth Conference on Uncertainty in Artificial Intelligence (UAI-09)
, pp. 591-598
-
-
Walsh, T.J.1
Szita, I.2
Diuk, C.3
Littman, M.L.4
-
61
-
-
49549125826
-
Maximizing classifier utility when training data is costly
-
Weiss, G. M., & Tian, Y. (2006). Maximizing classifier utility when training data is costly. SIGKDD Explorations, 8, 31-38.
-
(2006)
SIGKDD Explorations
, vol.8
, pp. 31-38
-
-
Weiss, G.M.1
Tian, Y.2