Volume 382, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL REQUIREMENTS; COMPUTER GO; CONVERGENCE RATES; FUNCTION APPROXIMATORS; GRADIENT-DESCENT; LEARNING RATES; LINEAR FUNCTIONS; OBJECTIVE FUNCTIONS; POLICY PROBLEM; TEMPORAL DIFFERENCE LEARNING; TEMPORAL DIFFERENCES; TEST PROBLEM;

EID: 70049090437     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1553374.1553501     Document Type: Conference Paper
Times cited: 232

References (18)
  • 1. Antos, A., Szepesvári, Cs., Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71:89-129.
  • 2. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning, pp. 30-37.
  • 5. Borkar, V. S. (1997). Stochastic approximation with two timescales. Systems and Control Letters 29:291-294.
  • 6. Borkar, V. S., Meyn, S. P. (2000). The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization 38(2):447-469.
  • 7. Boyan, J. (2002). Technical update: Least-squares temporal difference learning. Machine Learning 49:233-246.
  • 8. Bradtke, S., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning 22:33-57.
  • 9. Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning 8:341-362.
  • 10. Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. In Proceedings of AAAI, pp. 356-361.
  • 11. Hirsch, M. W. (1989). Convergent activation dynamics in continuous time networks. Neural Networks 2:331-349.
  • 16. Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning 3:9-44.
  • 17. Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112:181-211.
  • 18. Sutton, R. S., Szepesvári, Cs., Maei, H. R. (2009). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Advances in Neural Information Processing Systems 21.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.