SCOPUS Information Search Platform

Handbook of Learning and Approximate Dynamic Programming

Volume , Issue , 2004, Pages 235-259

Improved temporal difference methods with linear function approximation

(3) Bertsekas, Dimitri P. a; Nedich, Angelia b; Borkar, Vivek S. c

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

b BAE SYSTEMS (United Kingdom)

c TATA INSTITUTE OF FUNDAMENTAL RESEARCH (India)

Author keywords

Argon; Convergence; Eigenvalues and eigenfunctions; Function approximation; Markov processes; Trajectory; Vectors

Indexed keywords

APPROXIMATION ALGORITHMS; ARGON; COST FUNCTIONS; DYNAMIC PROGRAMMING; EIGENVALUES AND EIGENFUNCTIONS; MARKOV PROCESSES; TRAJECTORIES; VECTORS;

CONVERGENCE; DISCOUNTED COSTS; FUNCTION APPROXIMATION; INFINITE HORIZONS; LINEAR COST FUNCTIONS; LINEAR FUNCTIONS; TEMPORAL DIFFERENCE METHODS; TEMPORAL-DIFFERENCE ALGORITHM;

ITERATIVE METHODS;

EID: 85036496976 PISSN: None EISSN: None Source Type: Book
DOI: 10.1109/9780470544785.ch9 Document Type: Chapter

Times cited : (46)

References (19)

1
- 0003778897
- Springer-Verlag, New York
- A. Benveniste, M. Metivier, and P. Priouret, Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, New York, 1990.
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Metivier, M.² Priouret, P.³

2
- 4243567726
- Temporal differences-based policy iteration and applications in neuro-dynamic programming
- MIT, Cambridge, MA
- D. P. Bertsekas and S. Ioffe, Temporal differences-based policy iteration and applications in neuro-dynamic programming, Lab. for Info. and Decision Systems Report LIDS-P-2349, MIT, Cambridge, MA, 1996.
- (1996) Lab. for Info. and Decision Systems Report LIDS-P-2349
- Bertsekas, D.P.¹ Ioffe, S.²

3
- 84980552700
- 2nd Edition, Athena Scientific, Belmont, MA
- D. P. Bertsekas, Dynamic Programming and Optimal Control, 2nd Edition, Athena Scientific, Belmont, MA, 2001.
- (2001) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

4
- 0003487482
- Athena Scientific, Belmont, MA
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

5
- 0034389611
- Gradient convergence in gradient methods with errors
- D. P. Bertsekas and J. N. Tsitsiklis, Gradient convergence in gradient methods with errors, SIAM Journal Optimization, vol. 10, pp. 627-642, 2000.
- (2000) SIAM Journal Optimization , vol.10 , pp. 627-642
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

6
- 0036832950
- Technical update: Least-squares temporal difference learning
- J. A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, vol. 49, pp. 1-15, 2002.
- (2002) Machine Learning , vol.49 , pp. 1-15
- Boyan, J.A.¹

7
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- S. J. Bradtke and A. G. Barto, Linear least-squares algorithms for temporal difference learning, Machine Learning, vol. 22, pp. 33-57, 1996.
- (1996) Machine Learning , vol.22 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

8
- 0000430514
- The convergence of TD(λ) for general λ
- P. D. Dayan, The convergence of TD(λ) for general λ, Machine Learning, vol. 8, pp. 341-362, 1992.
- (1992) Machine Learning , vol.8 , pp. 341-362
- Dayan, P.D.¹

9
- 0034342516
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- D. P. de Farias and B. Van Roy, On the existence of fixed points for approximate value iteration and temporal-difference learning, Journal of Optimization Theory and Applications, vol. 105, 2000.
- (2000) Journal of Optimization Theory and Applications , vol.105
- De Farias, D.P.¹ Van Roy, B.²

10
- 0003786198
- New methods and theorems, Preprint
- L. Gurvits, L. J. Lin, and S. J. Hanson, Incremental learning of evaluation functions for absorbing Markov chains: New methods and theorems, Preprint, 1994.
- (1994) Incremental Learning of Evaluation Functions for Absorbing Markov Chains
- Gurvits, L.¹ Lin, L.J.² Hanson, S.J.³

11
- 85036579695
- The asymptotic mean squared error of temporal difference learning, Unpublished Report
- MIT, Cambridge, MA
- V. R. Konda and J. N. Tsitsiklis, The asymptotic mean squared error of temporal difference learning, Unpublished Report, Lab. for Information and Decision Systems, MIT, Cambridge, MA, 2003.
- (2003) Lab. for Information and Decision Systems
- Konda, V.R.¹ Tsitsiklis, J.N.²

12
- 0042758707
- Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, MA
- V. R. Konda, Actor-Critic Algorithms, Ph.D. Thesis, Dept. of Electrical Engineering and Computer Science, MIT, Cambridge, MA, 2002.
- (2002) Actor-Critic Algorithms
- Konda, V.R.¹

13
- 0037288398
- Least squares policy evaluation algorithms with linear function approximation
- A. Nedic and D. P. Bertsekas, Least squares policy evaluation algorithms with linear function approximation, Discrete Event Dynamic Systems: Theory and Applications, vol. 13, pp. 79-110, 2003.
- (2003) Discrete Event Dynamic Systems: Theory and Applications , vol.13 , pp. 79-110
- Nedic, A.¹ Bertsekas, D.P.²

14
- 0003276733
- Mean-field analysis for batched TD(λ)
- F. Pineda, Mean-field analysis for batched TD(λ), Neural Computation, pp. 1403-1419, 1997.
- (1997) Neural Computation , pp. 1403-1419
- Pineda, F.¹

15
- 0003998452
- Wiley, New York
- M. L. Puterman, Markov Decision Processes, Wiley, New York, 1994.
- (1994) Markov Decision Processes
- Puterman, M.L.¹

16
- 0004102479
- MIT Press, Cambridge, MA
- R. S. Sutton and A. G. Barto, Reinforcement Learning, MIT Press, Cambridge, MA, 1998.
- (1998) Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

17
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

18
- 0003787427
- Ph.D. Thesis, MIT, Cambridge, MA
- B. Van Roy, Learning and Value Function Approximation in Complex Decision Processes, Ph.D. Thesis, MIT, Cambridge, MA, 1998.
- (1998) Learning and Value Function Approximation in Complex Decision Processes
- Van Roy, B.¹

19
- 0031143730
- An analysis of temporal-difference learning with function approximation
- J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. on Automatic Control, vol. 42, pp. 674-690, 1997.
- (1997) IEEE Trans. on Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.