2009, Pages 1609-1616

A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; IMPORTANCE SAMPLING; LEAST SQUARES APPROXIMATIONS; MARKOV PROCESSES; STOCHASTIC SYSTEMS

EID: 77956513316
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper
Times cited: 332

References (24)
  • [4] Borkar, V. S. and Meyn, S. P. (2000). The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization, 38(2):447-469.
  • [5] Boyan, J. A. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49(2-3):233-246. DOI: 10.1023/A:1017936530646
  • [6] Bradtke, S. J. and Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57.
  • [7] Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8:341-362.
  • [8] Geramifard, A., Bowling, M., and Sutton, R. S. (2006). Incremental least-squares temporal difference learning. In Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference (AAAI-06/IAAI-06), vol. 1, pp. 356-361.
  • [18] Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44.
  • [20] Sutton, R. S., Precup, D., and Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211. DOI: 10.1016/S0004-3702(99)00052-1
  • [22] Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42(3):241-267. DOI: 10.1023/A:1007609817671
  • [23] Tsitsiklis, J. N. and Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690. PII: S0018928697034375


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.