SCOPUS Information Search Platform

IEEE Transactions on Automatic Control

Volume 54, Issue 7, 2009, Pages 1515-1531

Convergence results for some temporal difference methods based on least squares

(2) Yu, Huizhen a,b; Bertsekas, Dimitri P. a

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

b UNIVERSITY OF HELSINKI (Finland)

Author keywords

Approximation methods; Convergence of numerical methods; Dynamic programming; Markov processes

Indexed keywords

APPROXIMATION METHODS; AVERAGE COST; CONVERGENCE RATES; CONVERGENCE RESULTS; DISCOUNTED COSTS; FINITE-STATE; INFINITE HORIZONS; LEAST SQUARE; LINEAR FUNCTIONS; MARKOV DECISION PROCESS; POLICY EVALUATION; RATE OF CONVERGENCE; STATIONARY POLICY; STEPSIZE; TEMPORAL DIFFERENCES;

APPROXIMATION THEORY; COSTS; DYNAMIC PROGRAMMING; MARKOV PROCESSES; NUMBER THEORY; SYSTEMS ENGINEERING;

CONVERGENCE OF NUMERICAL METHODS;

EID: 67949109470 PISSN: 00189286 EISSN: None Source Type: Journal
DOI: 10.1109/TAC.2009.2022097 Document Type: Article

Times cited : (98)

References (30)

1
- 4043069840
- Actor-critic algorithms
- V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143-1166, 2003.
- (2003) SIAM J. Control Optim , vol.42 , Issue.4 , pp. 1143-1166
- Konda, V.R.¹ Tsitsiklis, J.N.²

2
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

3
- 0000430514
- The convergence of TD(λ) for general λ
- P. D. Dayan, "The convergence of TD(λ) for general λ," Machine Learning, vol. 8, pp. 341-362, 1992.
- (1992) Machine Learning , vol.8 , pp. 341-362
- Dayan, P.D.¹

4
- 0003786198
- Princeton, NJ
- L. Gurvits, L. J. Lin, and S. J. Hanson, Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems, Siemens Corporate Research, Princeton, NJ, 1994.
- (1994) Incremental Learning of Evaluation Functions for Absorbing Markov Chains: New Methods and Theorems, Siemens Corporate Research
- Gurvits, L.¹ Lin, L.J.² Hanson, S.J.³

5
- 0003276733
- Mean-field analysis for batched TD(λ)
- F. Pineda, "Mean-field analysis for batched TD(λ)," Neural Computation, pp. 1403-1419, 1997.
- (1997) Neural Computation , pp. 1403-1419
- Pineda, F.¹

6
- 0031143730
- An analysis of temporal-difference learning with function approximation
- May
- J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Control, vol. 42, no. 5, pp. 674-690, May 1997.
- (1997) IEEE Trans. Automat. Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

7
- 0033221519
- Average cost temporal-difference learning
- J. N. Tsitsiklis and B. Van Roy, "Average cost temporal-difference learning," Automatica, vol. 35, no. 11, pp. 1799-1808, 1999.
- (1999) Automatica , vol.35 , Issue.11 , pp. 1799-1808
- Tsitsiklis, J.N.¹ Van Roy, B.²

8
- 0036832957
- On average versus discounted reward temporal-difference learning
- J. N. Tsitsiklis and B. Van Roy, "On average versus discounted reward temporal-difference learning," Machine Learning, vol. 49, pp. 179-191, 2002.
- (2002) Machine Learning , vol.49 , pp. 179-191
- Tsitsiklis, J.N.¹ Van Roy, B.²

9
- 0035249254
- Simulation-based optimization of Markov reward processes
- Feb
- P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. Automat. Control, vol. 46, no. 2, pp. 191-209, Feb. 2001.
- (2001) IEEE Trans. Automat. Control , vol.46 , Issue.2 , pp. 191-209
- Marbach, P.¹ Tsitsiklis, J.N.²

10
- 0001771345
- Linear least-squares algorithms for temporal difference learning
- S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 2, pp. 33-57, 1996.
- (1996) Machine Learning , vol.22 , Issue.2 , pp. 33-57
- Bradtke, S.J.¹ Barto, A.G.²

11
- 0038595396
- Least-squares temporal difference learning
- J. A. Boyan, "Least-squares temporal difference learning," in Proc. 16th Int. Conf. Machine Learning, 1999, pp. 49-56.
- (1999) Proc. 16th Int. Conf. Machine Learning , pp. 49-56
- Boyan, J.A.¹

12
- 67949102334
- D. P. Bertsekas and S. Ioffe, Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming, MIT, Cambridge, MA, LIDS Tech. Rep. LIDS-P-2349, 1996.

13
- 0042758707
- Ph.D. dissertation, Dept. Comput. Sci. Elect. Eng, MIT, Cambridge, MA
- V. R. Konda, "Actor-critic algorithms," Ph.D. dissertation, Dept. Comput. Sci. Elect. Eng., MIT, Cambridge, MA, 2002.
- (2002) Actor-critic algorithms
- Konda, V.R.¹

14
- 0003487482
- Belmont, MA: Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

15
- 4644323293
- Least-squares policy iteration
- M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Machine Learning Res., vol. 4, pp. 1107-1149, 2003.
- (2003) J. Machine Learning Res , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

16
- 0037288398
- Least squares policy evaluation algorithms with linear function approximation
- A. Nedić and D. P. Bertsekas, "Least squares policy evaluation algorithms with linear function approximation," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, pp. 79-110, 2003.
- (2003) Discrete Event Dynamic Systems: Theory and Applications , vol.13 , pp. 79-110
- Nedić, A.¹ Bertsekas, D.P.²

17
- 85036496976
- D. P. Bertsekas, V. S. Borkar, and A. Nedić, "Improved Temporal Difference Methods With Linear Function Approximation," in Learning and Approximate Dynamic Programming, A. Barto, W. Powell, and J. Si, Eds. New York: IEEE Press, 2004, LIDS Tech. Rep. 2573, 2003.

18
- 80053276028
- A function approximation approach to estimation of policy gradient for POMDP with structured policies
- H. Yu, "A function approximation approach to estimation of policy gradient for POMDP with structured policies," in Proc. 21st Conf. Uncertainty Artif. Intell., 2005, pp. 642-657.
- (2005) Proc. 21st Conf. Uncertainty Artif. Intell , pp. 642-657
- Yu, H.¹

19
- 0004007508
- Cambridge, MA: MIT Press
- R. S. Sutton and A. G. Barto, Reinforcement Learning. Cambridge, MA: MIT Press, 1998.
- (1998) Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

20
- 58449131194
- New Error Bounds for Approximations From Projected Linear Equations, Univ. Helsinki, Helsinki, Finland
- Tech. Rep. C-2008-43
- H. Yu and D. P. Bertsekas, New Error Bounds for Approximations From Projected Linear Equations, Univ. Helsinki, Helsinki, Finland, Tech. Rep. C-2008-43, 2008.
- (2008)
- Yu, H.¹ Bertsekas, D.P.²

21
- 0003602606
- New York: Academic Press
- J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables. New York: Academic Press, 1970.
- (1970) Iterative Solution of Nonlinear Equations in Several Variables
- Ortega, J.M.¹ Rheinboldt, W.C.²

22
- 0004169430
- Berlin, Germany: Springer-Verlag
- M. Duflo, Random Iterative Models. Berlin, Germany: Springer-Verlag, 1997.
- (1997) Random Iterative Models
- Duflo, M.¹

23
- 9944258743
- 2nd ed. New York: Springer-Verlag
- H. J. Kushner and G. G. Yin, Stochastic Approximation and Recursive Algorithms and Applications, 2nd ed. New York: Springer-Verlag, 2003.
- (2003) Stochastic Approximation and Recursive Algorithms and Applications
- Kushner, H.J.¹ Yin, G.G.²

24
- 0034342516
- On the existence of fixed points for approximate value iteration and temporal-difference learning
- D. P. de Farias and B. Van Roy, "On the existence of fixed points for approximate value iteration and temporal-difference learning," J. Optim. Theory Appl., vol. 105, no. 3, pp. 589-608, 2000.
- (2000) J. Optim. Theory Appl , vol.105 , Issue.3 , pp. 589-608
- de Farias, D.P.¹ Van Roy, B.²

25
- 0033351917
- Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives
- Oct
- J. N. Tsitsiklis and B. Van Roy, "Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives," IEEE Trans. Automat. Control, vol. 44, no. 10, pp. 1840-1851, Oct. 1999.
- (1999) IEEE Trans. Automat. Control , vol.44 , Issue.10 , pp. 1840-1851
- Tsitsiklis, J.N.¹ Van Roy, B.²

26
- 67949097658
- H. Yu and D. P. Bertsekas, A Least Squares Q-Learning Algorithm for Optimal Stopping Problems, MIT, Cambridge, MA, LIDS Tech. Rep. 2731, 2006.

27
- 84927748655
- Q-learning algorithms for optimal stopping based on least squares
- H. Yu and D. P. Bertsekas, "Q-learning algorithms for optimal stopping based on least squares," in Proc. Eur. Control Conf., 2007, pp. 2368-2375.
- (2007) Proc. Eur. Control Conf , pp. 2368-2375
- Yu, H.¹ Bertsekas, D.P.²

28
- 28544451799
- Stochastic approximation with 'controlled Markov' noise
- V. S. Borkar, "Stochastic approximation with 'controlled Markov' noise," Syst. Control Lett., vol. 55, pp. 139-145, 2006.
- (2006) Syst. Control Lett , vol.55 , pp. 139-145
- Borkar, V.S.¹

29
- 58849087743
- New Delhi, India: Hindustan Book Agency
- V. S. Borkar, Stochastic Approximation: A Dynamic Viewpoint. New Delhi, India: Hindustan Book Agency, 2008.
- (2008) Stochastic Approximation: A Dynamic Viewpoint
- Borkar, V.S.¹

30
- 61849106433
- Projected equation methods for approximate solution of large linear systems
- May
- D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, no. 1, pp. 27-50, May 2009.
- (2009) J. Comput. Appl. Math , vol.227 , Issue.1 , pp. 27-50
- Bertsekas, D.P.¹ Yu, H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.