SCOPUS Information Retrieval Platform

Volume 35, Issue 11, 1999, Pages 1799-1808

Average cost temporal-difference learning

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; BOUNDARY CONDITIONS; COMPUTER SIMULATION; CONVERGENCE OF NUMERICAL METHODS; MARKOV PROCESSES; TABLE LOOKUP;

APPROXIMATION ERROR; AVERAGE COST; NEURODYNAMIC PROGRAMMING; REINFORCEMENT LEARNING; TEMPORAL DIFFERENCE;

DYNAMIC PROGRAMMING;

EID: 0033221519 PISSN: 00051098 EISSN: None Source Type: Journal
DOI: 10.1016/S0005-1098(99)00099-0 Document Type: Article

Times cited : (144)

References (11)

1
- 0004030716
- Ph.D. Thesis, Department of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA.
- Abounadi, J. (1998). Stochastic Approximation for Non-expansive Maps: Application to Q-learning Algorithms. Ph.D. Thesis, Department of Electrical Engineering, Massachusetts Institute of Technology, Cambridge, MA.
- (1998) Stochastic Approximation for Non-expansive Maps: Application to Q-learning Algorithms
- Abounadi, J.¹

2
- 0003487482
- Belmont, MA: Athena Scientific
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Belmont, MA: Athena Scientific.
- (1996) Neuro-dynamic programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

3
- 0003778897
- Berlin: Springer
- Benveniste, A., Metivier, M., & Priouret, P. (1990). Adaptive Algorithms and Stochastic Approximations. Berlin: Springer.
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Metivier, M.² Priouret, P.³

4
- 0000430514
- The convergence of TD(λ) for general λ
- Dayan, P. D. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8, 341-362.
- (1992) Machine Learning , vol.8 , pp. 341-362
- Dayan, P.D.¹

5
- 0004169893
- Boston, MA
- Gallager, R. G. (1996). Discrete Stochastic Processes. Boston, MA.
- (1996) Discrete Stochastic Processes
- Gallager, R.G.¹

6
- 0003786198
- preprint
- Gurvits, L., Lin, L. J., & Hanson, S. J. (1994). Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems, preprint.
- (1994) Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems
- Gurvits, L.¹ Lin, L.J.² Hanson, S.J.³

7
- 0029752592
- Average reward reinforcement learning: Foundations, algorithms, and empirical results
- Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22, 1-38.
- (1996) Machine Learning , vol.22 , pp. 1-38
- Mahadevan, S.¹

9
- 85031526506
- Unpublished
- Pineda, F. (1996). Mean-field Analysis for Batched TD(λ). Unpublished.
- (1996) Mean-field Analysis for Batched TD(λ)
- Pineda, F.¹

10
- 33847202724
- Learning to predict by the method of temporal differences
- Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3, 9-44.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

11
- 0031143730
- An Analysis of Temporal-Difference Learning with Function Approximation
- Tsitsiklis, J. N., & Van Roy, B. (1997). An Analysis of Temporal-Difference Learning with Function Approximation. IEEE Transactions on Automatic Control, 42(5), 674-690.
- (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.