2. R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learning, vol. 3, pp. 9-44, 1988.
4. J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Mach. Learning, vol. 16, pp. 185-202, 1994.
5. T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Comp., vol. 6, no. 6, pp. 1185-1201, 1994.
6. P. Dayan and T. J. Sejnowski, "TD(λ) converges with probability 1," Mach. Learning, vol. 14, pp. 295-301, 1994.
8. P. Dayan, "The convergence of TD(λ) for general λ," Mach. Learning, vol. 8, pp. 341-362, 1992.
9. R. E. Schapire and M. K. Warmuth, "On the worst-case analysis of temporal-difference learning algorithms," Mach. Learning, vol. 22, pp. 95-122, 1996.
10. J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Mach. Learning, vol. 22, pp. 59-94, 1996.
12. S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995.
13. L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Machine Learning: Proc. 12th Int. Conf., July 9-12, A. Prieditis and S. Russell, Eds. San Francisco, CA: Morgan Kaufmann, 1995.
14. J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems, vol. 7. Cambridge, MA: MIT Press, 1995.
15. R. S. Sutton, "On the virtues of linear learning and trajectory distributions," in Proc. Wkshp. Value Function Approximation, Mach. Learning Conf., Carnegie Mellon Univ., Tech. Rep. CMU-CS-95-206, 1995.
17. L. Gurvits, private communication, 1996.
21. G. D. Stamoulis and J. N. Tsitsiklis, "On the settling time of the congested GI/G/1 queue," Adv. Appl. Probability, vol. 22, pp. 929-956, 1990.
22. P. Konstantopoulos and F. Baccelli, "On the cut-off phenomenon in some queueing systems," J. Appl. Probability, vol. 28, pp. 683-694, 1991.
23. D. P. Bertsekas, "A counterexample to temporal-difference learning," Neural Comp., vol. 7, pp. 270-279, 1995.
24. S. P. Singh and R. S. Sutton, "Reinforcement learning with replacing eligibility traces," Mach. Learning, vol. 22, pp. 123-158, 1996.
25. R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996.