-
3
-
-
0000430514
-
The convergence of TD(λ) for general lambda
-
Dayan, P. (1992). The convergence of TD(λ) for general lambda. Machine Learning, 8, 341-362.
-
(1992)
Machine Learning
, vol.8
, pp. 341-362
-
-
Dayan, P.1
-
4
-
-
0028388685
-
TD(λ) converges with probability 1
-
Dayan, P., & Sejnowski, T. J. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.
-
(1994)
Machine Learning
, vol.14
, pp. 295-301
-
-
Dayan, P.1
Sejnowski, T.J.2
-
6
-
-
85156203891
-
Stable fitted reinforcement learning
-
G. Tesauro, D. Touretzky, & T. Leen (Eds.), Cambridge, MA: MIT Press
-
Gordon, G. J. (1996). Stable fitted reinforcement learning. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference (pp. 1052-1058). Cambridge, MA: MIT Press.
-
(1996)
Advances in Neural Information Processing Systems: Proceedings of the 1996 Conference
, pp. 1052-1058
-
-
Gordon, G.J.1
-
7
-
-
0000439891
-
On the convergence of stochastic iterative dynamic programming algorithms
-
Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6, 1185-1201.
-
(1994)
Neural Computation
, vol.6
, pp. 1185-1201
-
-
Jaakkola, T.1
Jordan, M.I.2
Singh, S.P.3
-
9
-
-
0346575867
-
-
August 23-26. Theoretical Physics Institute, University of Minnesota
-
Pineda, F. J. (1995, August 23-26). Generalization in TD(λ). Theoretical Physics Institute, University of Minnesota.
-
(1995)
Generalization in TD(λ)
-
-
Pineda, F.J.1
-
10
-
-
0346575866
-
Analytical mean squared error curves in temporal difference learning
-
M. Mozer, M. Jordan, & T. Petsche (Eds.), Cambridge, MA: MIT Press
-
Singh, S. P., & Dayan, P. (1996). Analytical mean squared error curves in temporal difference learning. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems, 9. Cambridge, MA: MIT Press.
-
(1996)
Advances in Neural Information Processing Systems
, vol.9
-
-
Singh, S.P.1
Dayan, P.2
-
11
-
-
85153965130
-
Reinforcement learning with soft state aggregation
-
G. Tesauro, D. Touretzky, & T. Leen (Eds.), Cambridge, MA: MIT Press
-
Singh, S. P., Jaakkola, T., & Jordan, M. (1995). Reinforcement learning with soft state aggregation. In G. Tesauro, D. Touretzky, & T. Leen (Eds.), Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference (pp. 361-368). Cambridge, MA: MIT Press.
-
(1995)
Advances in Neural Information Processing Systems: Proceedings of the 1995 Conference
, pp. 361-368
-
-
Singh, S.P.1
Jaakkola, T.2
Jordan, M.3
-
12
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
-
(1988)
Machine Learning
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
13
-
-
0001046225
-
Practical issues in temporal difference learning
-
Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-277.
-
(1992)
Machine Learning
, vol.8
, pp. 257-277
-
-
Tesauro, G.1
-
14
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38, 58-68.
-
(1995)
Communications of the ACM
, vol.38
, pp. 58-68
-
-
Tesauro, G.1
-
15
-
-
0028497630
-
Asynchronous stochastic approximation and Q-learning
-
Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16, 185-202.
-
(1994)
Machine Learning
, vol.16
, pp. 185-202
-
-
Tsitsiklis, J.N.1
-
16
-
-
0029752470
-
Feature-based methods for large scale dynamic programming
-
Tsitsiklis, J. N., & Van Roy, B. (1996a). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59-94.
-
(1996)
Machine Learning
, vol.22
, pp. 59-94
-
-
Tsitsiklis, J.N.1
Van Roy, B.2