-
1
-
-
0002882372
-
KnightCap: A chess program that learns by combining TD(λ) with game-tree search
-
Baxter, J., Tridgell, A., Weaver, L. (1998). KnightCap: A chess program that learns by combining TD(λ) with game-tree search. Proceedings of the Fifteenth International Conference on Machine Learning, pp. 28-36.
-
(1998)
Proceedings of the Fifteenth International Conference on Machine Learning
, pp. 28-36
-
-
Baxter, J.1
Tridgell, A.2
Weaver, L.3
-
3
-
-
85156187730
-
Improving elevator performance using reinforcement learning
-
MIT Press, Cambridge, MA
-
Crites, R. H., and Barto, A. G. (1996). Improving elevator performance using reinforcement learning. In Advances in Neural Information Processing Systems 9, pp. 1017-1023. MIT Press, Cambridge, MA.
-
(1996)
Advances in Neural Information Processing Systems
, vol.9
, pp. 1017-1023
-
-
Crites, R.H.1
Barto, A.G.2
-
5
-
-
84956885505
-
-
CRL Report 334. Communications Research Laboratory, McMaster University, Hamilton, Ontario
-
Nie, J., and Haykin, S. (1996). A dynamic channel assignment policy through Q-learning. CRL Report 334. Communications Research Laboratory, McMaster University, Hamilton, Ontario.
-
(1996)
A Dynamic Channel Assignment Policy through Q-Learning
-
-
Nie, J.1
Haykin, S.2
-
6
-
-
84899003140
-
Multi-time models for temporally abstract planning
-
MIT Press, Cambridge, MA
-
Precup, D., Sutton, R.S. (1998). Multi-time models for temporally abstract planning. Advances in Neural Information Processing Systems 11. MIT Press, Cambridge, MA.
-
(1998)
Advances in Neural Information Processing Systems
, vol.11
-
-
Precup, D.1
Sutton, R.S.2
-
7
-
-
84898972974
-
Reinforcement learning for dynamic channel allocation in cellular telephone systems
-
MIT Press, Cambridge, MA
-
Singh, S. P., and Bertsekas, D. (1997). Reinforcement learning for dynamic channel allocation in cellular telephone systems. In Advances in Neural Information Processing Systems 10, pp. 974-980. MIT Press, Cambridge, MA.
-
(1997)
Advances in Neural Information Processing Systems
, vol.10
, pp. 974-980
-
-
Singh, S.P.1
Bertsekas, D.2
-
8
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44.
-
(1988)
Machine Learning
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
10
-
-
0003899594
-
-
Technical Report 98-74, Department of Computer Science, University of Massachusetts
-
Sutton, R. S., Precup, D., Singh, S. (1998). Between MDPs and semi-MDPs: Learning, planning, and representing knowledge at multiple temporal scales. Technical Report 98-74, Department of Computer Science, University of Massachusetts.
-
(1998)
Between MDPs and Semi-MDPs: Learning, Planning, and Representing Knowledge at Multiple Temporal Scales
-
-
Sutton, R.S.1
Precup, D.2
Singh, S.3
-
11
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
Tesauro, G. J. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58-68.
-
(1995)
Communications of the ACM
, vol.38
, pp. 58-68
-
-
Tesauro, G.J.1
-
13
-
-
85156225449
-
High-performance job-shop scheduling with a time-delay TD(λ) network
-
MIT Press, Cambridge, MA
-
Zhang, W., and Dietterich, T. G. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In Advances in Neural Information Processing Systems 9, pp. 1024-1030. MIT Press, Cambridge, MA.
-
(1996)
Advances in Neural Information Processing Systems
, vol.9
, pp. 1024-1030
-
-
Zhang, W.1
Dietterich, T.G.2