-
2
-
-
85153940465
-
Generalization in reinforcement learning: Safely approximating the value function
-
Justin Boyan and Andrew Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems (NIPS) 7, pages 369-376, 1995.
-
(1995)
Advances in Neural Information Processing Systems (NIPS)
, vol.7
, pp. 369-376
-
-
Boyan, J.1
Moore, A.2
-
3
-
-
0041965975
-
R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning
-
Ronen I. Brafman and Moshe Tennenholtz. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 3:213-231, 2002.
-
(2002)
Journal of Machine Learning Research
, vol.3
, pp. 213-231
-
-
Brafman, R.I.1
Tennenholtz, M.2
-
4
-
-
70049084399
-
CORL: A continuous-state offset-dynamics reinforcement learner
-
Emma Brunskill, Bethany R. Leffler, Lihong Li, Michael L. Littman, and Nicholas Roy. CORL: A continuous-state offset-dynamics reinforcement learner. In Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI), pages 53-61, 2008.
-
(2008)
Proceedings of the 24th Conference on Uncertainty in Artificial Intelligence (UAI)
, pp. 53-61
-
-
Brunskill, E.1
Leffler, B.R.2
Li, H.3
Littman, M.L.4
Roy, N.5
-
5
-
-
0003919624
-
-
Prentice Hall, ISBN 9780201808681.
-
Jeffrey B. Burl. Linear Optimal Control. Prentice Hall, 1998. ISBN 9780201808681.
-
(1998)
Linear Optimal Control
-
-
Burl, J.B.1
-
7
-
-
38249024662
-
The complexity of dynamic programming
-
Chee-Seng Chow and John N. Tsitsiklis. The complexity of dynamic programming. Journal of Complexity, 5(4):466-488, 1989.
-
(1989)
Journal of Complexity
, vol.5
, Issue.4
, pp. 466-488
-
-
Chow, C.-S.1
Tsitsiklis, J.N.2
-
8
-
-
0026206780
-
An optimal one-way multigrid algorithm for discrete-time stochastic control
-
DOI 10.1109/9.133184
-
Chee-Seng Chow and John N. Tsitsiklis. An optimal one-way multigrid algorithm for discrete-time stochastic control. IEEE Transactions on Automatic Control, 36(8):898-914, 1991.
-
(1991)
IEEE Transactions on Automatic Control
, vol.36
, Issue.8
, pp. 898-914
-
-
Chow, C.-S.1
Tsitsiklis, J.N.2
-
10
-
-
70349417489
-
Reinforcement learning benchmarks and bake-offs II
-
Workshop
-
Alain Dutech, Timothy Edmunds, Jelle Kok, Michail Lagoudakis, Michael L. Littman, Martin Riedmiller, Bryan Russell, Bruno Scherrer, Richard Sutton, Stephan Timmer, Nikos Vlassis, Adam White, and Shimon Whiteson. Reinforcement learning benchmarks and bake-offs II. In Advances in Neural Information Processing Systems (NIPS) 17 Workshop, 2005.
-
(2005)
Advances in Neural Information Processing Systems (NIPS)
, vol.17
-
-
Dutech, A.1
Edmunds, T.2
Kok, J.3
Lagoudakis, M.4
Littman, M.L.5
Riedmiller, M.6
Russell, B.7
Scherrer, B.8
Sutton, R.9
Timmer, S.10
Vlassis, N.11
White, A.12
Whiteson, S.13
-
12
-
-
0004236492
-
-
The Johns Hopkins University Press, 3rd edition, ISBN 0-801-85414-8.
-
Gene H. Golub and Charles F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 3rd edition, 1996. ISBN 0-801-85414-8.
-
(1996)
Matrix Computations
-
-
Golub, G.H.1
Van Loan, C.F.2
-
13
-
-
0004151494
-
-
Cambridge University Press, ISBN 0-521-38632-2.
-
Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1986. ISBN 0-521-38632-2.
-
(1986)
Matrix Analysis
-
-
Horn, R.A.1
Johnson, C.R.2
-
14
-
-
70350579633
-
Prediction, expectation, and surprise: Methods, designs, and study of a deployed traffic forecasting service
-
Eric Horvitz, Johnson Apacible, Raman Sarin, and Lin Liao. Prediction, expectation, and surprise: Methods, designs, and study of a deployed traffic forecasting service. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI), pages 275-283, 2005.
-
(2005)
Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence (UAI)
, pp. 275-283
-
-
Horvitz, E.1
Apacible, J.2
Sarin, R.3
Liao, L.4
-
17
-
-
0036832954
-
Near-optimal reinforcement learning in polynomial time
-
Michael J. Kearns and Satinder P. Singh. Near-optimal reinforcement learning in polynomial time. Machine Learning, 49(2-3):209-232, 2002.
-
(2002)
Machine Learning
, vol.49
, Issue.2-3
, pp. 209-232
-
-
Kearns, M.J.1
Singh, S.P.2
-
19
-
-
84941465845
-
A lower bound for discrimination in terms of variation
-
January
-
Solomon Kullback. A lower bound for discrimination in terms of variation. IEEE Transactions on Information Theory, 13(1):126-127, January 1967.
-
(1967)
IEEE Transactions on Information Theory
, vol.13
, Issue.1
, pp. 126-127
-
-
Kullback, S.1
-
26
-
-
84898980684
-
Autonomous helicopter flight via reinforcement learning
-
Andrew Ng, H. Jin Kim, Michael Jordan, and Shankar Sastry. Autonomous helicopter flight via reinforcement learning. In Advances in Neural Information Processing Systems (NIPS) 16, pages 799-806, 2004.
-
(2004)
Advances in Neural Information Processing Systems (NIPS)
, vol.16
, pp. 799-806
-
-
Ng, A.1
Kim, H.J.2
Jordan, M.3
Sastry, S.4
-
27
-
-
33749251297
-
An analytic solution to discrete Bayesian reinforcement learning
-
Pascal Poupart, Nikos Vlassis, Jesse Hoey, and Kevin Regan. An analytic solution to discrete Bayesian reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (ICML), pages 697-704, 2006.
-
(2006)
Proceedings of the 23rd International Conference on Machine Learning (ICML)
, pp. 697-704
-
-
Poupart, P.1
Vlassis, N.2
Hoey, J.3
Regan, K.4
-
29
-
-
85162058047
-
Online linear regression and its application to model-based reinforcement learning
-
Alexander L. Strehl and Michael L. Littman. Online linear regression and its application to model-based reinforcement learning. In Advances in Neural Information Processing Systems (NIPS) 20, pages 1417-1424, 2008.
-
(2008)
Advances in Neural Information Processing Systems (NIPS)
, vol.20
, pp. 1417-1424
-
-
Strehl, A.L.1
Littman, M.L.2
-
32
-
-
0000985504
-
TD-Gammon, a self-teaching backgammon program, achieves master-level play
-
Gerald J. Tesauro. TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Computation, 6(2):215-219, 1994.
-
(1994)
Neural Computation
, vol.6
, Issue.2
, pp. 215-219
-
-
Tesauro, G.J.1
-
33
-
-
0000011340
-
Some matrix-inequalities and metrization of matrix-space
-
John von Neumann. Some matrix-inequalities and metrization of matrix-space. Tomsk University Review, 1:286-300, 1937.
-
(1937)
Tomsk University Review
, vol.1
, pp. 286-300
-
-
Von Neumann, J.1
-
34
-
-
0004049893
-
-
PhD thesis, King's College, University of Cambridge, United Kingdom
-
Christopher J.C.H. Watkins. Learning from delayed rewards. PhD thesis, King's College, University of Cambridge, United Kingdom, 1989.
-
(1989)
Learning from Delayed Rewards
-
-
Watkins, C.J.C.H.1