Volume 25, Issue 1, 1996, Pages 5-22

Exploration bonuses and dual control

Author keywords

Certainty equivalence; Dynamic programming; Exploration bonuses; Non stationary environment; Reinforcement learning

Indexed keywords

ADAPTIVE CONTROL SYSTEMS; DYNAMIC PROGRAMMING; MATHEMATICAL MODELS; OPTIMAL CONTROL SYSTEMS; PARAMETER ESTIMATION; PROBABILITY; STATISTICAL METHODS

EID: 0030260201     PISSN: 08856125     EISSN: None     Source Type: Journal    
DOI: 10.1007/bf00115298     Document Type: Article
Times cited: 91

References (4)
  • 2
    • 0002679852
    • Lovejoy, W.S. (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28, 47-66.
  • 3
    • 0019909899
    • Monahan, G.E. (1982). A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Science, 28, 1-16.
  • 4
    • 0027684215
    • Moore, A.W. & Atkeson, C.G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.