Volume , Issue , 2010, Pages 615-622

Finite-sample analysis of LSTD

Author keywords

[No Author keywords available]

Indexed keywords

GENERALIZATION BOUND; LEAST SQUARE; MARKOV CHAIN; POLICY EVALUATION; SAMPLE ANALYSIS; STATIONARY DISTRIBUTION; VALUE FUNCTIONS;

EID: 77956549349     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 62
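For context on the indexed keywords (least squares, policy evaluation, value functions): LSTD estimates the value function of a fixed policy from a single trajectory by solving a linear system over features. Below is a minimal sketch, not the paper's algorithm as stated; the function name, feature matrices, and ridge term `reg` are illustrative assumptions.

```python
import numpy as np

def lstd(phi, phi_next, rewards, gamma=0.95, reg=1e-6):
    """Least-squares temporal difference (LSTD) policy evaluation.

    phi:      (n, d) features of visited states s_t
    phi_next: (n, d) features of successor states s_{t+1}
    rewards:  (n,)   observed rewards r_t
    Returns weights w such that V(s) is approximated by phi(s) @ w.
    """
    # Empirical LSTD system: A w = b with
    # A = sum_t phi(s_t) (phi(s_t) - gamma * phi(s_{t+1}))^T,
    # b = sum_t phi(s_t) r_t
    A = phi.T @ (phi - gamma * phi_next)
    b = phi.T @ rewards
    # small ridge term keeps A invertible with few samples (an assumption,
    # echoing the regularization discussed in finite-sample analyses)
    return np.linalg.solve(A + reg * np.eye(A.shape[0]), b)
```

As a sanity check, a single absorbing-style feature with constant reward 1 and gamma = 0.9 recovers the discounted value 1 / (1 - 0.9) = 10.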

References (8)
  • 1
    • Antos, A., Szepesvári, Cs., and Munos, R. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning Journal, 71:89-129, 2008.
  • 2
    • Bertsekas, D. Dynamic Programming and Optimal Control. Athena Scientific, 2001.
  • 4
    • Bradtke, S. and Barto, A. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
  • 8
    • Tsitsiklis, J. and Van Roy, B. An analysis of temporal difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674-690, 1997.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.