Volume , Issue , 2010, Pages 486-491

Online least-squares policy iteration for reinforcement learning control

Author keywords

[No Author keywords available]

Indexed keywords

E-LEARNING; ITERATIVE METHODS; LEARNING ALGORITHMS; LEAST SQUARES APPROXIMATIONS; MACHINE LEARNING; PETROLEUM RESERVOIR EVALUATION;

EID: 77957782880     PISSN: None     EISSN: None     Source Type: Conference Proceeding
DOI: 10.1109/acc.2010.5530856     Document Type: Conference Paper
Times cited: 66

References (14)
  • 3
    • J. Boyan, "Technical update: Least-squares temporal difference learning," Machine Learning, vol. 49, pp. 233-246, 2002.
  • 4
    • A. Nedić and D. P. Bertsekas, "Least-squares policy evaluation algorithms with linear function approximation," Discrete Event Dynamic Systems: Theory and Applications, vol. 13, no. 1-2, pp. 79-110, 2003.
  • 7
    • V. Konda, "Actor-critic algorithms," Ph.D. dissertation, Massachusetts Institute of Technology, Cambridge, US, 2002.
  • 8
    • H. Yu and D. P. Bertsekas, "Convergence results for some temporal difference methods based on least squares," IEEE Transactions on Automatic Control, vol. 54, no. 7, pp. 1515-1531, 2009.
  • 12
    • D. P. Bertsekas and S. Ioffe, "Temporal differences-based policy iteration and applications in neuro-dynamic programming," Massachusetts Institute of Technology, Cambridge, US, Tech. Rep. LIDS-P-2349, 1996. Available at http://web.mit.edu/dimitrib/www/Tempdif.pdf.
  • 13
    • R. S. Sutton, "Learning to predict by the method of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
  • 14
    • S. Singh, T. Jaakkola, M. L. Littman, and Cs. Szepesvári, "Convergence results for single-step on-policy reinforcement-learning algorithms," Machine Learning, vol. 38, no. 3, pp. 287-308, 2000.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.