



Volume , Issue , 2011, Pages 481-488

Incremental basis construction from temporal difference error

Author keywords

[No Author keywords available]

Indexed keywords

BASIS FUNCTIONS; BELLMAN ERROR; DISCOUNT FACTORS; LINEAR COMBINATIONS; ONLINE VERSIONS; REWARD FUNCTION; TEMPORAL DIFFERENCE ERRORS; VALUE FUNCTIONS;

EID: 80053457849     PISSN: None     EISSN: None     Source Type: Conference Proceeding
DOI: None     Document Type: Conference Paper
Times cited: 11

References (18)
  • 1
    • L. Baird. Residual algorithms: Reinforcement learning with function approximation. In ICML'95, 1995.
  • 2
    • B. Boots and G. J. Gordon. Predictive state temporal difference learning. In NIPS'10, 2010.
  • 3
    • J. A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49:233-246, 2002. ISSN 0885-6125.
  • 4
    • S. J. Bradtke, A. G. Barto, and L. P. Kaelbling. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
  • 5
    • A. Geramifard, M. Bowling, and R. S. Sutton. Incremental least-squares temporal difference learning. In AAAI'06, 2006.
  • 7
    • P. W. Keller, S. Mannor, and D. Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. In ICML'06, 2006.
  • 8
    • Z. J. Kolter and A. Y. Ng. Regularization and feature selection in least-squares temporal difference learning. In ICML'09, 2009.
  • 10
    • H. R. Maei and R. S. Sutton. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In AGI'10, 2010.
  • 11
    • S. Mahadevan and B. Liu. Basis construction from power series expansions of value functions. In NIPS'10, 2010.
  • 12
    • S. Mahadevan, M. Maggioni, and C. Guestrin. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes. Journal of Machine Learning Research, 8, 2007.
  • 14
    • R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3:9-44, 1988.
  • 16
    • R. S. Sutton, C. Szepesvári, and H. R. Maei. A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. In NIPS'08, 2008.
  • 18
    • J.-H. Wu and R. Givan. Feature-discovering approximate value iteration methods. In SARA'05, pages 321-331, 2005.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.