Volume 54, Issue 7, 2009, Pages 1515-1531

Convergence results for some temporal difference methods based on least squares

Author keywords

Approximation methods; Convergence of numerical methods; Dynamic programming; Markov processes

Indexed keywords

APPROXIMATION METHODS; AVERAGE COST; CONVERGENCE RATES; CONVERGENCE RESULTS; DISCOUNTED COSTS; FINITE-STATE; INFINITE HORIZONS; LEAST SQUARE; LINEAR FUNCTIONS; MARKOV DECISION PROCESS; POLICY EVALUATION; RATE OF CONVERGENCE; STATIONARY POLICY; STEPSIZE; TEMPORAL DIFFERENCES

EID: 67949109470     PISSN: 00189286     EISSN: None     Source Type: Journal    
DOI: 10.1109/TAC.2009.2022097     Document Type: Article
Times cited: 98

References (30)
  • 1
    • EID: 4043069840
    • V. R. Konda and J. N. Tsitsiklis, "Actor-critic algorithms," SIAM J. Control Optim., vol. 42, no. 4, pp. 1143-1166, 2003.
  • 2
    • EID: 33847202724
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
  • 3
    • EID: 0000430514
    • P. D. Dayan, "The convergence of TD(λ) for general λ," Machine Learning, vol. 8, pp. 341-362, 1992.
  • 5
    • EID: 0003276733
    • F. Pineda, "Mean-field analysis for batched TD(λ)," Neural Computation, pp. 1403-1419, 1997.
  • 6
    • EID: 0031143730
    • J. N. Tsitsiklis and B. Van Roy, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Control, vol. 42, no. 5, pp. 674-690, May 1997.
  • 7
    • EID: 0033221519
    • J. N. Tsitsiklis and B. Van Roy, "Average cost temporal-difference learning," Automatica, vol. 35, no. 11, pp. 1799-1808, 1999.
  • 8
    • EID: 0036832957
    • J. N. Tsitsiklis and B. Van Roy, "On average versus discounted reward temporal-difference learning," Machine Learning, vol. 49, pp. 179-191, 2002.
  • 9
    • EID: 0035249254
    • P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. Automat. Control, vol. 46, no. 2, pp. 191-209, Feb. 2001.
  • 10
    • EID: 0001771345
    • S. J. Bradtke and A. G. Barto, "Linear least-squares algorithms for temporal difference learning," Machine Learning, vol. 22, no. 2, pp. 33-57, 1996.
  • 12
    • EID: 67949102334
    • D. P. Bertsekas and S. Ioffe, "Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming," MIT, Cambridge, MA, LIDS Tech. Rep. LIDS-P-2349, 1996.
  • 13
    • EID: 0042758707
    • V. R. Konda, "Actor-critic algorithms," Ph.D. dissertation, Dept. Comput. Sci. Elect. Eng., MIT, Cambridge, MA, 2002.
  • 15
    • EID: 4644323293
    • M. G. Lagoudakis and R. Parr, "Least-squares policy iteration," J. Machine Learning Res., vol. 4, pp. 1107-1149, 2003.
  • 17
    • EID: 85036496976
    • D. P. Bertsekas, V. S. Borkar, and A. Nedić, "Improved Temporal Difference Methods With Linear Function Approximation," in Learning and Approximate Dynamic Programming, A. Barto, W. Powell, and J. Si, Eds. New York: IEEE Press, 2004, LIDS Tech. Rep. 2573, 2003.
  • 18
    • EID: 80053276028
    • H. Yu, "A function approximation approach to estimation of policy gradient for POMDP with structured policies," in Proc. 21st Conf. Uncertainty Artif. Intell., 2005, pp. 642-657.
  • 20
    • EID: 58449131194
    • H. Yu and D. P. Bertsekas, "New Error Bounds for Approximations From Projected Linear Equations," Univ. Helsinki, Helsinki, Finland, Tech. Rep. C-2008-43, 2008.
  • 24
    • EID: 0034342516
    • D. P. de Farias and B. Van Roy, "On the existence of fixed points for approximate value iteration and temporal-difference learning," J. Optim. Theory Appl., vol. 105, no. 3, pp. 589-608, 2000.
  • 25
    • EID: 0033351917
    • J. N. Tsitsiklis and B. Van Roy, "Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing financial derivatives," IEEE Trans. Automat. Control, vol. 44, no. 10, pp. 1840-1851, Oct. 1999.
  • 26
    • EID: 67949097658
    • H. Yu and D. P. Bertsekas, "A Least Squares Q-Learning Algorithm for Optimal Stopping Problems," MIT, Cambridge, MA, LIDS Tech. Rep. 2731, 2006.
  • 27
    • EID: 84927748655
    • H. Yu and D. P. Bertsekas, "Q-learning algorithms for optimal stopping based on least squares," in Proc. Eur. Control Conf., 2007, pp. 2368-2375.
  • 28
    • EID: 28544451799
    • V. S. Borkar, "Stochastic approximation with 'controlled Markov' noise," Syst. Control Lett., vol. 55, pp. 139-145, 2006.
  • 30
    • EID: 61849106433
    • D. P. Bertsekas and H. Yu, "Projected equation methods for approximate solution of large linear systems," J. Comput. Appl. Math., vol. 227, no. 1, pp. 27-50, May 2009.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.