2010, Pages 1071-1078

Least-Squares λ Policy Iteration: Bias-variance trade-off in control problems

Author keywords

[No Author keywords available]

Indexed keywords

IN-CONTROL; LARGE SPACES; LEAST SQUARE; PERFORMANCE BOUNDS; POLICY ITERATION; TETRIS GAME; TRAINING SAMPLE; VALUE FUNCTION APPROXIMATION; VALUE ITERATION

EID: 77956525931     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 23

References (16)
  • 1
    • 4243567726
    • Temporal differences-based policy iteration and applications in neuro-dynamic programming
    • Bertsekas, D. and Ioffe, S. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical report, MIT, 1996.
    • (1996) Technical Report, MIT
    • Bertsekas, D.1    Ioffe, S.2
  • 3
    • 0036832950
    • Technical update: Least-squares temporal difference learning
    • Boyan, J. A. Technical update: Least-squares temporal difference learning. Machine Learning, 49:233-246, 2002.
    • (2002) Machine Learning , vol.49 , pp. 233-246
    • Boyan, J.A.1
  • 4
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • Bradtke, S. J. and Barto, A. G. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 8
    • 0037288398
    • Least squares policy evaluation algorithms with linear function approximation
    • Nedić, A. and Bertsekas, D. P. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13(1-2): 79-110, 2003.
    • (2003) Discrete Event Dynamic Systems , vol.13 , Issue.1-2 , pp. 79-110
    • Nedić, A.1    Bertsekas, D.P.2
  • 10
    • 1942482175
    • Optimality of reinforcement learning algorithms with linear function approximation
    • Schoknecht, R. Optimality of reinforcement learning algorithms with linear function approximation. In NIPS, pp. 1555-1562, 2002.
    • (2002) NIPS , pp. 1555-1562
    • Schoknecht, R.1
  • 16
    • 67949109470
    • Convergence results for some temporal difference methods based on least squares
    • Yu, H. and Bertsekas, D. P. Convergence results for some temporal difference methods based on least squares. IEEE Trans. Automatic Control, 54:1515-1531, 2009.
    • (2009) IEEE Trans. Automatic Control , vol.54 , pp. 1515-1531
    • Yu, H.1    Bertsekas, D.P.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.