Volume 22, Issue 1-3, 1996, Pages 159-195

Average reward reinforcement learning: foundations, algorithms, and empirical results

Author keywords

Markov decision processes; Reinforcement learning

Indexed keywords

AUTOMATA THEORY; DYNAMIC PROGRAMMING; LEARNING ALGORITHMS; MARKOV PROCESSES; OPTIMAL CONTROL SYSTEMS; PERFORMANCE; SENSITIVITY ANALYSIS;

EID: 0029752592     PISSN: 08856125     EISSN: None     Source Type: Journal    
DOI: 10.1007/BF00114727     Document Type: Article
Times cited: 356

References (8)
  • 1
    • Baird, L. Personal Communication.
  • 2
  • 3
    • Barto, A., Bradtke, S. & Singh, S. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72(1):81-138.
  • 7
    • Boutilier, C. & Puterman, M. (1995). Process-oriented planning and average-reward optimality. In Proceedings of the Fourteenth IJCAI, pages 1096-1103. Morgan Kaufmann.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.