



Lecture Notes in Computer Science, Volume 1224, 1997, Pages 242-249

Learning and exploitation do not conflict under minimax optimality

Author keywords

Dynamic games; Reinforcement learning; Self-optimizing systems

Indexed keywords

COST FUNCTIONS; DYNAMIC PROGRAMMING; MACHINE LEARNING;

EID: 84947910334     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/3-540-62858-4_89     Document Type: Conference Paper
Times cited: 7

References (14)
  • 1
    • A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:91-138, 1995. Also as Technical Report 91-57, Computer Science Department, University of Massachusetts, 1991.
  • 2
    • Justin A. Boyan. Modular Neural Networks for Learning Context-Dependent Game Strategies. Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, UK, August 1992.
  • 4
    • M. Heger. Risk-Sensitive Decision Making. PhD thesis, Zentrum für Kognitionswissenschaften, Universität Bremen, FB3 Informatik, Postfach 330 440, 28334 Bremen, Germany, 1996.
  • 5
    • R. E. Korf. Real-time heuristic search. Artificial Intelligence, 42:189-211, 1990.
  • 6
    • M. L. Littman and Cs. Szepesvári. A generalized reinforcement learning model: Convergence and applications. In Int. Conf. on Machine Learning, 1996. http://iserv.iki.kfki.hu/asl-publs.html.
  • 9
    • C. Stein. A two-sample test for a linear hypothesis whose power is independent of variance. Ann. Math. Statist., 16, 1945.
  • 10
    • Cs. Szepesvári. Certainty equivalence policies are self-optimizing under minimax optimality. Technical Report 96-101, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1, Hungary, August 1996. URL: http://www.inf.u-szeged.hu/~rgai.
  • 11
    • Cs. Szepesvári. Some basic facts concerning minimax sequential decision problems. Technical Report 96-100, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1, Hungary, August 1996. URL: http://www.inf.u-szeged.hu/~rgai.
  • 12
    • Cs. Szepesvári and M. Littman. Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms. Operations Research, 1996. In preparation.
  • 13
    • Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-67, March 1995.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.