Volume , Issue , 2011, Pages

The fixed points of off-policy TD

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; REINFORCEMENT LEARNING

EID: 85162349973     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 64

References (19)
  • 3
    • 0000430514
    • The convergence of TD(λ) for general λ
    • P. Dayan. The convergence of TD(λ) for general λ. Machine Learning, 8(3-4), 1992.
    • (1992) Machine Learning, vol.8, Issue.3-4
    • Dayan, P.
  • 4
    • 0000439891
    • On the convergence of stochastic iterative dynamic programming algorithms
    • T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1994.
    • (1994) Neural Computation, vol.6, Issue.6
    • Jaakkola, T.; Jordan, M.I.; Singh, S.P.
  • 5
    • 77956086700
    • Low-rank optimization on the cone of positive semidefinite matrices
    • M. Journee, F. Bach, P.A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327-2351, 2010.
    • (2010) SIAM Journal on Optimization, vol.20, Issue.5, pp. 2327-2351
    • Journee, M.; Bach, F.; Absil, P.A.; Sepulchre, R.
  • 13
    • 85162310185
    • Personal communication
    • B. Scherrer. Personal communication, 2011.
    • (2011)
    • Scherrer, B.
  • 14
    • 84860607818
    • minfunc
    • M. Schmidt. minfunc, 2005. Available at http://www.cs.ubc.ca/~schmidtm/Software/minFunc.html.
    • (2005)
    • Schmidt, M.
  • 16
    • 77956513316
    • A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation
    • R.S. Sutton, Cs. Szepesvari, and H.R. Maei. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In Advances in Neural Information Processing Systems, 2008.
    • (2008) Advances in Neural Information Processing Systems
    • Sutton, R.S.; Szepesvari, Cs.; Maei, H.R.
  • 17
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J.N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42:674-690, 1997.
    • (1997) IEEE Transactions on Automatic Control, vol.42, pp. 674-690
    • Tsitsiklis, J.N.; Van Roy, B.
  • 18
    • 0033221519
    • Average cost temporal difference learning
    • J.N. Tsitsiklis and B. Van Roy. Average cost temporal difference learning. Automatica, 35(11):1799-1808, 1999.
    • (1999) Automatica, vol.35, Issue.11, pp. 1799-1808
    • Tsitsiklis, J.N.; Van Roy, B.
  • 19
    • 77953119098
    • Error bounds for approximations from projected linear equations
    • H. Yu and D. P. Bertsekas. Error bounds for approximations from projected linear equations. Mathematics of Operations Research, 35:306-329, 2010.
    • (2010) Mathematics of Operations Research, vol.35, pp. 306-329
    • Yu, H.; Bertsekas, D.P.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.