SCOPUS Information Retrieval Platform

Journal of Optimization Theory and Applications

Volume 105, Issue 3, 2000, Pages 589-608

On the existence of fixed points for approximate value iteration and temporal-difference learning

Authors (2): De Farias, D. P. (a); Van Roy, B. (a)

(a) Stanford University (United States)

Author keywords

Dynamic programming; Neurodynamic programming; Reinforcement learning; Temporal difference learning; Value iteration

Indexed keywords

DYNAMIC PROGRAMMING; ITERATIVE METHODS; ORDINARY DIFFERENTIAL EQUATIONS;

CURSE OF DIMENSIONALITY; DYNAMIC PROGRAMS; FIXED POINTS; NEURO-DYNAMIC PROGRAMMING; REINFORCEMENT LEARNINGS; SIMPLE ALGORITHM; SIMPLER ALGORITHMS; TEMPORAL DIFFERENCE LEARNING; VALUE ITERATION; VALUE ITERATION ALGORITHM;

REINFORCEMENT LEARNING;

EID: 0034342516
PISSN: 00223239
EISSN: None
Source Type: Journal
DOI: 10.1023/A:1004641123405
Document Type: Article

Times cited: 67

References (11)

1
- 84968519017
- Functional approximations and dynamic programming
- BELLMAN, R., and DREYFUS, S., Functional Approximations and Dynamic Programming, Mathematical Tables and Other Aids to Computation, Vol. 13, pp. 247-251, 1959.
- (1959) Mathematical Tables and Other Aids to Computation , vol.13 , pp. 247-251
- Bellman, R.¹ Dreyfus, S.²

2
- 33847202724
- Learning to predict by the method of temporal differences
- SUTTON, R. S., Learning to Predict by the Method of Temporal Differences, Machine Learning, Vol. 3, pp. 9-44, 1988.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

3
- 0003786198
- Preprint
- GURVITS, L., LIN, L. J., and HANSON, S. J., Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems, Preprint, 1994.
- (1994) Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems
- Gurvits, L.¹ Lin, L.J.² Hanson, S.J.³

4
- 0003276733
- Mean-field analysis for batched TD(λ)
- PINEDA, F., Mean-Field Analysis for Batched TD(λ), Neural Computation, Vol. 9, pp. 1403-1419, 1997.
- (1997) Neural Computation , vol.9 , pp. 1403-1419
- Pineda, F.¹

5
- 0031143730
- An analysis of temporal-difference learning with function approximation
- TSITSIKLIS, J. N., and VAN ROY, B., An Analysis of Temporal-Difference Learning with Function Approximation, IEEE Transactions on Automatic Control, Vol. 42, pp. 674-690, 1997.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

6
- 0000430514
- The convergence of TD(λ) for general λ
- DAYAN, P. D., The Convergence of TD(λ) for General λ, Machine Learning, Vol. 8, pp. 341-362, 1992.
- (1992) Machine Learning , vol.8 , pp. 341-362
- Dayan, P.D.¹

7
- 0003487482
- Athena Scientific, Belmont, Massachusetts
- BERTSEKAS, D. P., and TSITSIKLIS, J. N., Neurodynamic Programming, Athena Scientific, Belmont, Massachusetts, 1995.
- (1995) Neurodynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

8
- 0003787427
- PhD Dissertation, MIT
- VAN ROY, B., Learning and Value Function Approximation in Complex Decision Processes, PhD Dissertation, MIT, 1998.
- (1998) Learning and Value Function Approximation in Complex Decision Processes
- Van Roy, B.¹

9
- 0004169893
- Kluwer Academic Publishers, Boston, Massachusetts
- GALLAGER, R. G., Discrete Stochastic Processes, Kluwer Academic Publishers, Boston, Massachusetts, 1996.
- (1996) Discrete Stochastic Processes
- Gallager, R.G.¹

10
- 0031388983
- A neurodynamic programming approach to retailer inventory management
- VAN ROY, B., BERTSEKAS, D. P., LEE, Y., and TSITSIKLIS, J. N., A Neurodynamic Programming Approach to Retailer Inventory Management, Proceedings of the IEEE Conference on Decision and Control, pp. 4052-4057, 1997.
- (1997) Proceedings of the IEEE Conference on Decision and Control , pp. 4052-4057
- Van Roy, B.¹ Bertsekas, D.P.² Lee, Y.³ Tsitsiklis, J.N.⁴

11
- 0003778897
- Springer Verlag, Berlin, Germany
- BENVENISTE, A., MÉTIVIER, M., and PRIOURET, P., Adaptive Algorithms and Stochastic Approximation, Springer Verlag, Berlin, Germany, 1990.
- (1990) Adaptive Algorithms and Stochastic Approximation
- Benveniste, A.¹ Métivier, M.² Priouret, P.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.