Volume 2111, 2001, Pages 605-615

Optimizing average reward using discounted rewards

Author keywords

[No Author keywords available]

Indexed keywords

DYNAMIC PROGRAMMING; MARKOV PROCESSES; REINFORCEMENT LEARNING

EID: 84943252297
PISSN: 03029743
EISSN: 16113349
Source Type: Book Series
DOI: 10.1007/3-540-44581-1_40
Document Type: Conference Paper
Times cited: 34

References (10)
  • [2] J. Baxter and P. Bartlett. Direct gradient-based reinforcement learning. Technical report, Australian National University, Research School of Information Sciences and Engineering, July 1999. (Scopus 0004142943)
  • [9] John N. Tsitsiklis and Benjamin Van Roy. Average cost temporal-difference learning. Automatica, 35:319–349, 1999. (Scopus 0033221519)
  • [10] John N. Tsitsiklis and Benjamin Van Roy. On average versus discounted reward temporal-difference learning. Machine Learning, 2001 (forthcoming). (Scopus 84943317419)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.