SCOPUS Information Retrieval Platform

Discrete Event Dynamic Systems: Theory and Applications

Volume 13, Issue 1-2, 2003, Pages 79-110

Least squares policy evaluation algorithms with linear function approximation

Authors (2): Nedic, A.ᵃ; Bertsekas, D. P.ᵃ

ᵃ USA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

CONVERGENCE OF NUMERICAL METHODS; DYNAMIC PROGRAMMING; FUNCTION EVALUATION; GRADIENT METHODS; LEAST SQUARES APPROXIMATIONS; LINEAR PROGRAMMING; PROBABILITY DISTRIBUTIONS; PROBLEM SOLVING

LEAST SQUARES SUBPROBLEMS; LINEAR FUNCTION APPROXIMATION; POLICY EVALUATION ALGORITHMS

ALGORITHMS

EID: 0037288398 PISSN: 09246703 EISSN: None Source Type: Journal
DOI: 10.1023/A:1022192903948 Document Type: Article

Times cited: 159

References (21)

1. Ash, R. B. 1972. Real Analysis and Probability. New York: Academic Press Inc. (Scopus ID: 0003944893)

2. Bertsekas, D. P. 1995. A counterexample to temporal differences learning. Neural Computation 7: 270-279. (Scopus ID: 0000268954)

3. Bertsekas, D. P., and Ioffe, S. 1996. Temporal differences-based policy iteration and application in neuro-dynamic programming. Lab. for Info. and Decision Systems Report LIDS-P-2349. Cambridge, MA: MIT. (Scopus ID: 4243567726)

4. Bertsekas, D. P. 1999. Nonlinear Programming, 2nd edition. Belmont, MA: Athena Scientific. (Scopus ID: 0003713964)

5. Bertsekas, D. P. 2001. Dynamic Programming and Optimal Control, 2nd edition. Belmont, MA: Athena Scientific. (Scopus ID: 0003565783)

6. Bertsekas, D. P., and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Belmont, MA: Athena Scientific. (Scopus ID: 0003487482)

7. Bertsekas, D. P., and Tsitsiklis, J. N. 2000. Gradient convergence in gradient methods with errors. SIAM J. Optim. 10: 627-642. (Scopus ID: 0034389611)

8. Boyan, J. A. 2002. Technical update: Least-squares temporal difference learning. To appear in Machine Learning 49. (Scopus ID: 0036832950)

9. Bradtke, S. J., and Barto, A. G. 1996. Linear least-squares algorithms for temporal difference learning. Machine Learning 22: 33-57. (Scopus ID: 0001771345)

10. Dayan, P., and Sejnowski, T. J. 1994. TD(λ) converges with probability 1. Machine Learning 14: 295-301. (Scopus ID: 0028388685)

11. Gallager, R. G. 1995. Discrete Stochastic Processes. Boston, MA: Kluwer Academic Publishers. (Scopus ID: 0004169893)

12. Gurvits, L., Lin, L., and Hanson, S. J. 1994. Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems. Working paper, Princeton, NJ: Siemens Corporate Research. (Scopus ID: 0003786198)

13. Golub, G. H., and Van Loan, C. F. 1996. Matrix Computations, 3rd edition. Baltimore, MD: Johns Hopkins University Press. (Scopus ID: 0004236492)

14. Jaakkola, T., Jordan, M. I., and Singh, S. P. 1994. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation 6: 1185-1201. (Scopus ID: 0000439891)

15. Kemeny, J. G., and Snell, J. L. 1967. Finite Markov Chains. New York: Van Nostrand Company. (Scopus ID: 0003979966)

16. Neveu, J. 1975. Discrete Parameter Martingales. Amsterdam: North-Holland. (Scopus ID: 0004239351)

17. Parzen, E. 1962. Modern Probability Theory and Its Applications. New York: John Wiley Inc. (Scopus ID: 0003586471)

18. Puterman, M. L. 1994. Markovian Decision Problems. New York: John Wiley Inc. (Scopus ID: 0003998452)

19. Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44. (Scopus ID: 33847202724)

20. Tadić, V. 2001. On the convergence of temporal-difference learning with linear function approximation. Machine Learning 42: 241-267. (Scopus ID: 0035283402)

21. Tsitsiklis, J. N., and Van Roy, B. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42: 674-690. (Scopus ID: 0031143730)

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.