



Lecture Notes in Computer Science, Volume 1224, 1997, Pages 242-249

Learning and exploitation do not conflict under minimax optimality

Author keywords

Dynamic games; Reinforcement learning; Self-optimizing systems

Indexed keywords

COST FUNCTIONS; DYNAMIC PROGRAMMING; MACHINE LEARNING;

EID: 84947910334     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/3-540-62858-4_89     Document Type: Conference Paper
Times cited: 7

References (14)
  • 1
    • A. G. Barto, S. J. Bradtke, and S. P. Singh. Learning to act using real-time dynamic programming. Artificial Intelligence, 72:91-138, 1995. Also as Technical Report 91-57, Computer Science Department, University of Massachusetts, 1991.
  • 2
    • Justin A. Boyan. Modular Neural Networks for Learning Context-Dependent Game Strategies. Master's thesis, Department of Engineering and Computer Laboratory, University of Cambridge, Cambridge, UK, August 1992.
  • 4
    • M. Heger. Risk-Sensitive Decision Making. PhD thesis, Zentrum für Kognitionswissenschaften, Universität Bremen, FB3 Informatik, Postfach 330 440, 28334 Bremen, Germany, 1996.
  • 5
    • R. E. Korf. Real-time heuristic search. Artificial Intelligence, 42:189-211, 1990.
  • 6
    • M. L. Littman and Cs. Szepesvári. A generalized reinforcement learning model: Convergence and applications. In Int. Conf. on Machine Learning, 1996. http://iserv.iki.kfki.hu/asl-publs.html.
  • 9
    • C. Stein. A two-sample test for a linear hypothesis whose power is independent of variance. Ann. Math. Statist., 16, 1945.
  • 10
    • Cs. Szepesvári. Certainty equivalence policies are self-optimizing under minimax optimality. Technical Report 96-101, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1, Hungary, August 1996. URL: http://www.inf.u-szeged.hu/~rgai.
  • 11
    • Cs. Szepesvári. Some basic facts concerning minimax sequential decision problems. Technical Report 96-100, Research Group on Artificial Intelligence, JATE-MTA, Szeged 6720, Aradi vértanúk tere 1, Hungary, August 1996. URL: http://www.inf.u-szeged.hu/~rgai.
  • 12
    • Cs. Szepesvári and M. Littman. Generalized Markov Decision Processes: Dynamic programming and reinforcement learning algorithms. Operations Research, 1996. In preparation.
  • 13
    • Gerald Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3):58-67, March 1995.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.