-
1
-
-
0002201501
-
-
[Barto et al, 1989] Technical Report 89-95, Department of Computer and Information Science, University of Massachusetts, Amherst, Massachusetts. Also published in Learning and Computational Neuroscience: Foundations of Adaptive Networks, Michael Gabriel and John Moore, editors. The MIT Press, Cambridge, Massachusetts, 1991
-
[Barto et al, 1989] Barto, A. G.; Sutton, R. S.; and Watkins, C. J. C. H. 1989. Learning and sequential decision making. Technical Report 89-95, Department of Computer and Information Science, University of Massachusetts, Amherst, Massachusetts. Also published in Learning and Computational Neuroscience: Foundations of Adaptive Networks, Michael Gabriel and John Moore, editors. The MIT Press, Cambridge, Massachusetts, 1991.
-
(1989)
Learning and sequential decision making
-
-
Barto, A. G.1
Sutton, R. S.2
Watkins, C. J. C. H.3
-
3
-
-
0011530731
-
-
[Boyan, 1992] Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, England
-
[Boyan, 1992] Boyan, Justin A. 1992. Modular neural networks for learning context-dependent game strategies. Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, England.
-
(1992)
Modular neural networks for learning context-dependent game strategies
-
-
Boyan, Justin A.1
-
7
-
-
0001201756
-
Some studies in machine learning using the game of checkers
-
[Samuel, 1959] Reprinted in E. A. Feigenbaum and J. Feldman, editors, Computers and Thought, McGraw-Hill, New York 1963
-
[Samuel, 1959] Samuel, A. L. 1959. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 3:211-229. Reprinted in E. A. Feigenbaum and J. Feldman, editors, Computers and Thought, McGraw-Hill, New York 1963.
-
(1959)
IBM Journal of Research and Development
, vol.3
, pp. 211-229
-
-
Samuel, A. L.1
-
8
-
-
0000433333
-
Using the td(lambda) algorithm to learn an evaluation function for the game of go
-
[Schraudolph et al, 1994] San Mateo, CA. Morgan Kaufmann. To appear
-
[Schraudolph et al, 1994] Schraudolph, Nicol N.; Dayan, Peter; and Sejnowski, Terrence J. 1994. Using the td(lambda) algorithm to learn an evaluation function for the game of go. In Advances in Neural Information Processing Systems 6, San Mateo, CA. Morgan Kaufmann. To appear.
-
(1994)
Advances in Neural Information Processing Systems
, vol.6
-
-
Schraudolph, Nicol N.1
Dayan, Peter2
Sejnowski, Terrence J.3
-
9
-
-
85152626183
-
A reinforcement learning method for maximizing undiscounted rewards
-
[Schwartz, 1993] Amherst, Massachusetts. Morgan Kaufmann
-
[Schwartz, 1993] Schwartz, Anton 1993. A reinforcement learning method for maximizing undiscounted rewards. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, Massachusetts. Morgan Kaufmann. 298-305.
-
(1993)
Proceedings of the Tenth International Conference on Machine Learning
, pp. 298-305
-
-
Schwartz, Anton1
-
12
-
-
85152198941
-
Multi-agent reinforcement learning: independent vs. cooperative agents
-
[Tan, 1993] Amherst, Massachusetts. Morgan Kaufmann
-
[Tan, 1993] Tan, M. 1993. Multi-agent reinforcement learning: independent vs. cooperative agents. In Proceedings of the Tenth International Conference on Machine Learning, Amherst, Massachusetts. Morgan Kaufmann.
-
(1993)
Proceedings of the Tenth International Conference on Machine Learning
-
-
Tan, M.1
-
13
-
-
2542485629
-
Practical issues in temporal difference learning
-
[Tesauro, 1992] Moody, J. E.; Lippman, D. S.; and Hanson, S. J., editors 1992, San Mateo, CA. Morgan Kaufmann
-
[Tesauro, 1992] Tesauro, G. J. 1992. Practical issues in temporal difference learning. In Moody, J. E.; Lippman, D. S.; and Hanson, S. J., editors 1992, Advances in Neural Information Processing Systems 4, San Mateo, CA. Morgan Kaufmann. 259-266.
-
(1992)
Advances in Neural Information Processing Systems
, vol.4
, pp. 259-266
-
-
Tesauro, G. J.1
-
14
-
-
0141824325
-
Stochastic dynamic programming
-
[Van Der Wal, 1981] Morgan Kaufmann, Amsterdam
-
[Van Der Wal, 1981] Van Der Wal, J. 1981. Stochastic dynamic programming. In Mathematical Centre Tracts 139. Morgan Kaufmann, Amsterdam.
-
(1981)
Mathematical Centre Tracts
, vol.139
-
-
Van Der Wal, J.1
-
15
-
-
84884079276
-
-
[von Neumann and Morgenstern, 1947] Princeton University Press, Princeton, New Jersey
-
[von Neumann and Morgenstern, 1947] von Neumann, J. and Morgenstern, O. 1947. Theory of Games and Economic Behavior. Princeton University Press, Princeton, New Jersey.
-
(1947)
Theory of Games and Economic Behavior
-
-
von Neumann, J.1
Morgenstern, O.2
-
16
-
-
34249833101
-
Q-learning
-
[Watkins and Dayan, 1992]
-
[Watkins and Dayan, 1992] Watkins, C. J. C. H. and Dayan, P. 1992. Q-learning. Machine Learning 8(3):279-292.
-
(1992)
Machine Learning
, vol.8
, Issue.3
, pp. 279-292
-
-
Watkins, C. J. C. H.1
Dayan, P.2
-
18
-
-
0001875923
-
An adaptive communication protocol for cooperating mobile robots
-
[Yanco and Stein, 1993] Meyer, Jean-Arcady; Roitblat, H. L.; and Wilson, Stewart W., editors 1993, MIT Press/Bradford Books. 478-485
-
[Yanco and Stein, 1993] Yanco, Holly and Stein, Lynn Andrea 1993. An adaptive communication protocol for cooperating mobile robots. In Meyer, Jean-Arcady; Roitblat, H. L.; and Wilson, Stewart W., editors 1993, From Animals to Animats: Proceedings of the Second International Conference on the Simulation of Adaptive Behavior. MIT Press/Bradford Books. 478-485.
-
(1993)
From Animals to Animats: Proceedings of the Second International Conference on the Simulation of Adaptive Behavior
-
-
Yanco, Holly1
Stein, Lynn Andrea2