-
1
-
-
33747670266
-
Learning factor graphs in polynomial time and sample complexity
-
Abbeel, P., Koller, D., & Ng, A. Y. (2006). Learning factor graphs in polynomial time and sample complexity. Journal of Machine Learning Research, 7, 1743-1788.
-
(2006)
Journal of Machine Learning Research
, vol.7
, pp. 1743-1788
-
-
Abbeel, P.1
Koller, D.2
Ng, A.Y.3
-
2
-
-
0346942368
-
Decision-theoretic planning: Structural assumptions and computational leverage
-
Boutilier, C., Dean, T., & Hanks, S. (1999). Decision-theoretic planning: Structural assumptions and computational leverage. Journal of Artificial Intelligence Research, 11, 1-94.
-
(1999)
Journal of Artificial Intelligence Research
, vol.11
, pp. 1-94
-
-
Boutilier, C.1
Dean, T.2
Hanks, S.3
-
3
-
-
0041965975
-
R-max - A general polynomial time algorithm for near-optimal reinforcement learning
-
Brafman, R. I., & Tennenholtz, M. (2002). R-max - A general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
4
-
-
70049084399
-
CORL: A continuous-state offset-dynamics reinforcement learner
-
Brunskill, E., Leffler, B. R., Li, L., Littman, M. L., & Roy, N. (2008). CORL: A continuous-state offset-dynamics reinforcement learner. Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-08).
-
(2008)
Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-08)
-
-
Brunskill, E.1
Leffler, B.R.2
Li, L.3
Littman, M.L.4
Roy, N.5
-
5
-
-
0031140246
-
How to use expert advice
-
Cesa-Bianchi, N., Freund, Y., Haussler, D., Helmbold, D. P., Schapire, R. E., & Warmuth, M. K. (1997). How to use expert advice. Journal of the ACM, 44, 427-485.
-
(1997)
Journal of the ACM
, vol.44
, pp. 427-485
-
-
Cesa-Bianchi, N.1
Freund, Y.2
Haussler, D.3
Helmbold, D.P.4
Schapire, R.E.5
Warmuth, M.K.6
-
6
-
-
84990553353
-
A model for reasoning about persistence and causation
-
Dean, T., & Kanazawa, K. (1989). A model for reasoning about persistence and causation. Computational Intelligence, 5, 142-150.
-
(1989)
Computational Intelligence
, vol.5
, pp. 142-150
-
-
Dean, T.1
Kanazawa, K.2
-
7
-
-
4544318426
-
Efficient solution algorithms for factored MDPs
-
Guestrin, C., Koller, D., Parr, R., & Venkataraman, S. (2003). Efficient solution algorithms for factored MDPs. Journal of Artificial Intelligence Research, 19, 399-468.
-
(2003)
Journal of Artificial Intelligence Research
, vol.19
, pp. 399-468
-
-
Guestrin, C.1
Koller, D.2
Parr, R.3
Venkataraman, S.4
-
8
-
-
23244466805
-
-
Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London, UK
-
Kakade, S. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London, UK.
-
(2003)
On the sample complexity of reinforcement learning
-
-
Kakade, S.1
-
11
-
-
0036832954
-
Near-optimal reinforcement learning in polynomial time
-
Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209-232.
-
(2002)
Machine Learning
, vol.49
, pp. 209-232
-
-
Kearns, M.J.1
Singh, S.P.2
-
13
-
-
70049090614
-
-
Li, L. (2009). A unifying framework for computational reinforcement learning theory. Doctoral dissertation, Department of Computer Science, Rutgers University, New Brunswick, NJ. Li, L., Littman, M. L., & Walsh, T. J. (2008). Knows what it knows: A framework for self-aware learning. Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-08) (pp. 568-575).
-
Li, L. (2009). A unifying framework for computational reinforcement learning theory. Doctoral dissertation, Department of Computer Science, Rutgers University, New Brunswick, NJ. Li, L., Littman, M. L., & Walsh, T. J. (2008). Knows what it knows: A framework for self-aware learning. Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-08) (pp. 568-575).
-
-
-
-
14
-
-
30044441333
-
The sample complexity of exploration in the multi-armed bandit problem
-
Mannor, S., & Tsitsiklis, J. N. (2004). The sample complexity of exploration in the multi-armed bandit problem. Journal of Machine Learning Research, 5, 623-648.
-
(2004)
Journal of Machine Learning Research
, vol.5
, pp. 623-648
-
-
Mannor, S.1
Tsitsiklis, J.N.2
-
19
-
-
33749255382
-
PAC model-free reinforcement learning
-
Strehl, A. L., Li, L., Wiewiora, E., Langford, J., & Littman, M. L. (2006b). PAC model-free reinforcement learning. Proceedings of the Twenty-Third International Conference on Machine Learning (ICML-06) (pp. 881-888).
-
(2006)
Proceedings of the Twenty-Third International Conference on Machine Learning (ICML-06)
, pp. 881-888
-
-
Strehl, A.L.1
Li, L.2
Wiewiora, E.3
Langford, J.4
Littman, M.L.5
-
20
-
-
0021518106
-
A theory of the learnable
-
Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27, 1134-1142.
-
(1984)
Communications of the ACM
, vol.27
, pp. 1134-1142
-
-
Valiant, L.G.1
-
21
-
-
0000819141
-
A learning criterion for stochastic rules
-
Yamanishi, K. (1992). A learning criterion for stochastic rules. Machine Learning, 9, 165-203.
-
(1992)
Machine Learning
, vol.9
, pp. 165-203
-
-
Yamanishi, K.1