SCOPUS Information Retrieval Platform

Volume 25, Issue 1, 1996, Pages 5-22

Exploration bonuses and dual control

Author keywords

Certainty equivalence; Dynamic programming; Exploration bonuses; Non stationary environment; Reinforcement learning

Indexed keywords

ADAPTIVE CONTROL SYSTEMS; DYNAMIC PROGRAMMING; MATHEMATICAL MODELS; OPTIMAL CONTROL SYSTEMS; PARAMETER ESTIMATION; PROBABILITY; STATISTICAL METHODS

EXPLORATION BONUSES; NONSTATIONARY ENVIRONMENT; REINFORCEMENT LEARNING

LEARNING SYSTEMS

EID: 0030260201 PISSN: 08856125 EISSN: None Source Type: Journal
DOI: 10.1007/bf00115298 Document Type: Article

Times cited: 91

References (4)

1
- 0029210635
- Learning to act using real-time dynamic programming
- Barto, A.G., Bradtke, S.J. & Singh, S.P. (1995). Learning to act using real-time dynamic programming. Artificial Intelligence, 72, 81-138.
- (1995) Artificial Intelligence, vol. 72, pp. 81-138
- Barto, A.G.; Bradtke, S.J.; Singh, S.P.

2
- 0002679852
- A survey of algorithmic methods for partially observed Markov decision processes
- Lovejoy, W.S. (1991). A survey of algorithmic methods for partially observed Markov decision processes. Annals of Operations Research, 28, 47-66.
- (1991) Annals of Operations Research, vol. 28, pp. 47-66
- Lovejoy, W.S.

3
- 0019909899
- A survey of partially observable Markov decision processes: Theory, models and algorithms
- Monahan, G.E. (1982). A survey of partially observable Markov decision processes: Theory, models and algorithms. Management Science, 28, 1-16.
- (1982) Management Science, vol. 28, pp. 1-16
- Monahan, G.E.

4
- 0027684215
- Prioritized sweeping: Reinforcement learning with less data and less real time
- Moore, A.W. & Atkeson, C.G. (1993). Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning, 13, 103-130.
- (1993) Machine Learning, vol. 13, pp. 103-130
- Moore, A.W.; Atkeson, C.G.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.