-
1
-
-
31844444663
-
Exploration and apprenticeship learning in reinforcement learning
-
DOI 10.1145/1102351.1102352, ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
-
Abbeel, P., & Ng, A. Y. (2005). Exploration and apprenticeship learning in reinforcement learning. ICML '05: Proceedings of the 22nd international conference on Machine learning (pp. 1-8). New York, NY, USA: ACM Press.
-
(2005)
ICML 2005 - Proceedings of the 22nd International Conference on Machine Learning
, pp. 1-8
-
-
Abbeel, P.1
Ng, A.Y.2
-
2
-
-
0041966002
-
Using confidence bounds for exploitation-exploration trade-offs
-
Auer, P. (2002). Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research, 3, 397-422.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 397-422
-
-
Auer, P.1
-
3
-
-
0041965975
-
R-MAX-A general polynomial time algorithm for near-optimal reinforcement learning
-
Brafman, R. I., & Tennenholtz, M. (2002). R-MAX-a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3, 213-231.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
4
-
-
0026206780
-
An optimal one-way multigrid algorithm for discrete time stochastic control
-
Chow, C.-S., & Tsitsiklis, J. N. (1991). An optimal one-way multigrid algorithm for discrete time stochastic control. IEEE Transactions on Automatic Control, 36, 898-914.
-
(1991)
IEEE Transactions on Automatic Control
, vol.36
, pp. 898-914
-
-
Chow, C.-S.1
Tsitsiklis, J.N.2
-
6
-
-
0004236492
-
-
Baltimore, Maryland: The Johns Hopkins University Press. 3rd edition
-
Golub, G. H., & Van Loan, C. F. (1996). Matrix computations. Baltimore, Maryland: The Johns Hopkins University Press. 3rd edition.
-
(1996)
Matrix Computations
-
-
Golub, G.H.1
Van Loan, C.F.2
-
7
-
-
23244466805
-
-
Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London
-
Kakade, S. M. (2003). On the sample complexity of reinforcement learning. Doctoral dissertation, Gatsby Computational Neuroscience Unit, University College London.
-
(2003)
On the Sample Complexity of Reinforcement Learning
-
-
Kakade, S.M.1
-
10
-
-
0036832954
-
Near-optimal reinforcement learning in polynomial time
-
Kearns, M. J., & Singh, S. P. (2002). Near-optimal reinforcement learning in polynomial time. Machine Learning, 49, 209-232.
-
(2002)
Machine Learning
, vol.49
, pp. 209-232
-
-
Kearns, M.J.1
Singh, S.P.2
-
11
-
-
3042583887
-
Autonomous helicopter flight via reinforcement learning
-
Ng, A. Y., Kim, H. J., Jordan, M. I., & Sastry, S. (2003). Autonomous helicopter flight via reinforcement learning. Advances in Neural Information Processing Systems 16 (NIPS-03).
-
(2003)
Advances in Neural Information Processing Systems 16 (NIPS-03)
-
-
Ng, A.Y.1
Kim, H.J.2
Jordan, M.I.3
Sastry, S.4
-
14
-
-
0000985504
-
TD-Gammon, a self-teaching backgammon program, achieves master-level play
-
Tesauro, G. (1994). TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6, 215-219.
-
(1994)
Neural Computation
, vol.6
, pp. 215-219
-
-
Tesauro, G.1