2001, Pages 953-958

R-MAX - A general polynomial time algorithm for near-optimal reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

EXPLORATION VS. EXPLOITATION; MODEL-BASED REINFORCEMENT LEARNING; OPTIMAL POLICIES; POLYNOMIAL-TIME; POLYNOMIAL-TIME ALGORITHMS; SINGLE CONTROLLERS; STOCHASTIC GAME; ZERO-SUM STOCHASTIC GAMES

EID: 84880854156    PISSN: 1045-0823    EISSN: None    Source Type: Conference Proceeding
DOI: None    Document Type: Conference Paper
Times cited: 55

References (18)
  • 3. R. Brafman and M. Tennenholtz. A near-optimal polynomial time algorithm for learning in certain classes of stochastic games. Artificial Intelligence, 121(1-2):31-47, 2000.
  • 4. A. J. Hoffman and R. M. Karp. On Nonterminating Stochastic Games. Management Science, 12(5):359-370, 1966.
  • 10. M. L. Littman and Csaba Szepesvári. A generalized reinforcement-learning model: Convergence and applications. In Proc. 13th Intl. Conf. on Machine Learning, pages 310-318, 1996.
  • 11. M. L. Littman. Markov games as a framework for multi-agent reinforcement learning. In Proc. 11th Intl. Conf. on Machine Learning, pages 157-163, 1994.
  • 12. N. Megiddo. On repeated games with incomplete information played by non-Bayesian players. International Journal of Game Theory, 9:157-167, 1980.
  • 13. D. Monderer and M. Tennenholtz. Dynamic Non-Bayesian Decision-Making. J. of AI Research, 7:231-248, 1997.
  • 14. A. W. Moore and C. G. Atkeson. Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 1993.
  • 17. R. S. Sutton. Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proc. of the 7th Intl. Conf. on Machine Learning. Morgan Kaufmann, 1990.
  • 18. P. Tadepalli and D. Ok. Model-based average reward reinforcement learning. Artificial Intelligence, 100(1-2):177-224, 1998.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.