1. Barnard, E. (1993). Temporal-difference methods and Markov models. IEEE Transactions on Systems, Man, and Cybernetics, 23(2), 357-365.
2. Barto, A. G. & Duff, M. (1994). Monte Carlo matrix inversion and reinforcement learning. In Advances in Neural Information Processing Systems 6, pages 687-694. San Mateo, CA: Morgan Kaufmann.
3. Barto, A. G., Sutton, R. S., & Anderson, C. W. (1983). Neuronlike elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, 13, 835-846.
5. Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8(3/4), 341-362.
6. Dayan, P. & Sejnowski, T. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.
7. Haussler, D., Kearns, M., Seung, H. S., & Tishby, N. (1994). Rigorous learning curve bounds from statistical mechanics. In Proceedings of the 7th Annual ACM Workshop on Computational Learning Theory, pages 76-87. San Mateo, CA: Morgan Kaufmann.
8. Jaakkola, T., Jordan, M. I., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
9. Saul, L. K. & Singh, S. (1996). Learning curve bounds for Markov decision processes with undiscounted rewards. In Proceedings of COLT.
10. Singh, S. & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22, 123-158.
11. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
12. Tsitsiklis, J. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3), 185-202.
13. Wasow, W. R. (1952). A note on the inversion of matrices by random walks. Math. Tables Other Aids Comput., 6, 78-81.