Bertsekas, D. P. 1995. A counterexample to temporal differences learning. Neural Computation 7: 270-279.
Bertsekas, D. P., and Ioffe, S. 1996. Temporal differences-based policy iteration and application in neuro-dynamic programming. Lab. for Info. and Decision Systems Report LIDS-P-2349. Cambridge, MA: MIT.
Bertsekas, D. P., and Tsitsiklis, J. N. 2000. Gradient convergence in gradient methods with errors. SIAM J. Optim. 10: 627-642.
Boyan, J. A. 2002. Technical update: least-squares temporal difference learning. To appear in Machine Learning, 49.
Bradtke, S. J., and Barto, A. G. 1996. Linear least-squares algorithms for temporal difference learning. Machine Learning 22: 33-57.
Dayan, P., and Sejnowski, T. J. 1994. TD(λ) converges with probability 1. Machine Learning 14: 295-301.
Gurvits, L., Lin, L., and Hanson, S. J. 1994. Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems. Working paper, Princeton, NJ: Siemens Corporate Research.
Jaakkola, T., Jordan, M. I., and Singh, S. P. 1994. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6: 1185-1201.
Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44.
Tadić, V. 2001. On the convergence of temporal-difference learning with linear function approximation. Machine Learning 42: 241-267.
Tsitsiklis, J. N., and Van Roy, B. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42: 674-690.