SCOPUS Information Retrieval Platform

Volume 2111, 2001, Pages 605-615

Optimizing average reward using discounted rewards

Author keywords

[No Author keywords available]

Indexed keywords

DYNAMIC PROGRAMMING; MARKOV PROCESSES; REINFORCEMENT LEARNING;

APPROXIMATE HESSIANS; AVERAGE REWARD; BELLMAN EQUATIONS; BIASED ESTIMATES; DISCOUNT FACTORS; DISCOUNTED REWARD; MIXING TIME; POLICY GRADIENT;

COMPUTATION THEORY;

EID: 84943252297 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/3-540-44581-1_40 Document Type: Conference Paper

Times cited: 34

References (10)

1
- 84943272903
- Technical report, Australian National University
- P. Bartlett and J. Baxter. Estimation and approximation bounds for gradient-based reinforcement learning. Technical report, Australian National University, 2000.
- (2000) Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning
- Bartlett, P.¹ Baxter, J.²

2
- 0004142943
- Technical report, Australian National University, Research School of Information Sciences and Engineering
- J. Baxter and P. Bartlett. Direct gradient-based reinforcement learning. Technical report, Australian National University, Research School of Information Sciences and Engineering, July 1999.
- (1999) Direct Gradient-Based Reinforcement Learning
- Baxter, J.¹ Bartlett, P.²

4
- 0003565783
- Athena Scientific
- D. P. Bertsekas. Dynamic Programming and Optimal Control, Volumes 1 and 2. Athena Scientific, 1995.
- (1995) Dynamic Programming and Optimal Control, vol.1
- Bertsekas, D.P.¹

5
- 0009011171
- Technical report, Massachusetts Institute of Technology
- P. Marbach and J. Tsitsiklis. Simulation-based optimization of Markov reward processes. Technical report, Massachusetts Institute of Technology, 1998.
- (1998) Simulation-Based Optimization of Markov Reward Processes
- Marbach, P.¹ Tsitsiklis, J.²

6
- 2142812536
- Learning without state-estimation in partially observable Markovian decision processes
- S. Singh, T. Jaakkola, and M. I. Jordan. Learning without state-estimation in partially observable Markovian decision processes. Proc. 11th International Conference on Machine Learning, 1994.
- (1994) Proc. 11th International Conference on Machine Learning
- Singh, S.¹ Jaakkola, T.² Jordan, M.I.³

7
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- R. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. Neural Information Processing Systems, 13, 2000.
- (2000) Neural Information Processing Systems, vol.13
- Sutton, R.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

8
- 0004102479
- MIT Press
- R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

9
- 0033221519
- Average cost temporal-difference learning
- John N. Tsitsiklis and Benjamin Van Roy. Average cost temporal-difference learning. Automatica, 35:319–349, 1999.
- (1999) Automatica, vol.35, pp. 319-349
- Tsitsiklis, J.N.¹ Van Roy, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.