-
1
-
-
0029210635
-
Learning to act using real-time dynamic programming
-
Technical Report 91-57, Computer Science Department, University of Massachusetts
-
A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:91-138, 1995. Technical Report 91-57, Computer Science Department, University of Massachusetts, Vol. 59., 1991.
-
(1995)
Artificial Intelligence
, vol.72
, pp. 91-138
-
-
Barto, A.G.1
Bradtke, S.J.2
Singh, S.P.3
-
2
-
-
0011530731
-
-
Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, UK
-
Justin A. Boyan. Modular Neural Networks for Learning Context-Dependent Game Strategies. Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, UK, August 1992.
-
(1992)
Modular Neural Networks for Learning Context-Dependent Game Strategies
-
-
Boyan, J.A.1
-
4
-
-
84947905499
-
-
PhD thesis, Zentrum für Kognitionswissenschaften, Universität Bremen, FB3 Informatik, Postfach 330 440, 28334 Bremen, Germany
-
M. Heger. Risk-sensitive decision making. PhD thesis, Zentrum für Kognitionswissenschaften, Universität Bremen, FB3 Informatik, Postfach 330 440, 28334 Bremen, Germany, 1996.
-
(1996)
Risk-Sensitive Decision Making
-
-
Heger, M.1
-
5
-
-
0025400088
-
Real-time heuristic search
-
R. E. Korf. Real-time heuristic search. Artificial Intelligence, 42:189-211, 1990.
-
(1990)
Artificial Intelligence
, vol.42
, pp. 189-211
-
-
Korf, R.E.1
-
6
-
-
0001961616
-
A Generalized Reinforcement Learning Model: Convergence and applications
-
M. L. Littman and Cs. Szepesvári. A Generalized Reinforcement Learning Model: convergence and applications. In Int. Conf. on Machine Learning, 1996. http://iserv.iki.kfki.hu/asl-publs.html.
-
(1996)
Int. Conf. on Machine Learning
-
-
Littman, M.L.1
Szepesvári, C.S.2
-
7
-
-
0000433333
-
Using the TD(λ) algorithm to learn an evaluation function for the game of Go
-
Morgan Kaufmann, San Mateo, CA
-
Nicol N. Schraudolph, Peter Dayan, and Terrence J. Sejnowski. Using the TD(λ) algorithm to learn an evaluation function for the game of Go. In Advances in Neural Information Processing Systems 6, Morgan Kaufmann, San Mateo, CA, 1994.
-
(1994)
Advances in Neural Information Processing Systems 6
-
-
Schraudolph, N.N.1
Dayan, P.2
Sejnowski, T.J.3
-
9
-
-
0000537133
-
A two-sample test for a linear hypothesis whose power is independent of variance
-
C. Stein. A two-sample test for a linear hypothesis whose power is independent of variance. Ann. Math. Statist., 16, 1945.
-
(1945)
Ann. Math. Statist.
, vol.16
-
-
Stein, C.1
-
10
-
-
84947946047
-
-
Technical Report 96-101, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1., HUNGARY
-
Cs. Szepesvári. Certainty equivalence policies are self-optimizing under minimax optimality. Technical Report 96-101, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1., HUNGARY, August 1996. URL: http://www.inf.u-szeged.hu/~rgai.
-
(1996)
Certainty Equivalence Policies are Self-Optimizing under Minimax Optimality
-
-
Szepesvári, C.S.1
-
11
-
-
84947921159
-
-
Technical Report 96-100, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1, HUNGARY
-
Cs. Szepesvári. Some basic facts concerning minimax sequential decision problems. Technical Report 96-100, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1, HUNGARY, August 1996. URL: http://www.inf.u-szeged.hu/~rgai.
-
(1996)
Some Basic Facts concerning Minimax Sequential Decision Problems
-
-
Szepesvári, C.S.1
-
12
-
-
84947946048
-
Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms
-
in preparation
-
Cs. Szepesvári and M. Littman. Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms. Operations Research, 1996. in preparation.
-
(1996)
Operations Research
-
-
Szepesvári, C.S.1
Littman, M.2
-
13
-
-
0029276036
-
Temporal difference learning and TD-Gammon
-
Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 58-67, March 1995.
-
(1995)
Communications of the ACM
, pp. 58-67
-
-
Tesauro, G.1