



Volume , Issue , 2010, Pages

Predictive State Temporal Difference learning

Author keywords

[No Author keywords available]

Indexed keywords

HIGH-DIMENSIONAL; HIGHER-DIMENSIONAL; NEW APPROACHES; REINFORCEMENT LEARNINGS; SETS OF FEATURES; SUBSPACE IDENTIFICATION; TEMPORAL DIFFERENCE LEARNING; TEMPORAL DIFFERENCE REINFORCEMENT LEARNING; TEMPORAL DIFFERENCES; VALUE FUNCTION APPROXIMATION;

EID: 85162041278     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (26)

References (25)
  • 1
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3(1):9-44, 1988.
    • (1988) Machine Learning , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.S.1
  • 2
    • 0038595396
    • Least-squares temporal difference learning
    • Morgan Kaufmann, San Francisco, CA
    • Justin A. Boyan. Least-squares temporal difference learning. In Proc. Intl. Conf. Machine Learning, pages 49-56. Morgan Kaufmann, San Francisco, CA, 1999.
    • (1999) Proc. Intl. Conf. Machine Learning , pp. 49-56
    • Boyan, J.A.1
  • 3
    • 0001771345
    • Linear least-squares algorithms for temporal difference learning
    • Steven J. Bradtke and Andrew G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22:33-57, 1996.
    • (1996) Machine Learning , vol.22 , pp. 33-57
    • Bradtke, S.J.1    Barto, A.G.2
  • 4
    • 4644323293
    • Least-squares policy iteration
    • Michail G. Lagoudakis and Ronald Parr. Least-squares policy iteration. J. Mach. Learn. Res., 4:1107-1149, 2003.
    • (2003) J. Mach. Learn. Res. , vol.4 , pp. 1107-1149
    • Lagoudakis, M.G.1    Parr, R.2
  • 6
    • 14344256568
    • Learning low dimensional predictive representations
    • Matthew Rosencrantz, Geoffrey J. Gordon, and Sebastian Thrun. Learning low dimensional predictive representations. In Proc. ICML, 2004.
    • (2004) Proc. ICML
    • Rosencrantz, M.1    Gordon, G.J.2    Thrun, S.3
  • 8
    • 85156266716
    • Value-directed compression of POMDPs
    • Pascal Poupart and Craig Boutilier. Value-directed compression of POMDPs. In NIPS, pages 1547-1554, 2002.
    • (2002) NIPS , pp. 1547-1554
    • Poupart, P.1    Boutilier, C.2
  • 17
    • 0034198996
    • Observable operator models for discrete stochastic time series
    • Herbert Jaeger. Observable operator models for discrete stochastic time series. Neural Computation, 12:1371-1398, 2000.
    • (2000) Neural Computation , vol.12 , pp. 1371-1398
    • Jaeger, H.1
  • 18
    • 31844457132
    • Predictive state representations: A new theory for modeling dynamical systems
    • Satinder Singh, Michael James, and Matthew Rudary. Predictive state representations: A new theory for modeling dynamical systems. In Proc. UAI, 2004.
    • (2004) Proc. UAI
    • Singh, S.1    James, M.2    Rudary, M.3
  • 21
    • 84898066687
    • A spectral algorithm for learning hidden Markov models
    • Daniel Hsu, Sham Kakade, and Tong Zhang. A spectral algorithm for learning hidden Markov models. In COLT, 2009.
    • (2009) COLT
    • Hsu, D.1    Kakade, S.2    Zhang, T.3
  • 22
    • 84860648072
    • Improving approximate value iteration using memories and predictive state representations
    • Michael R. James, Ton Wessling, and Nikos A. Vlassis. Improving approximate value iteration using memories and predictive state representations. In AAAI, 2006.
    • (2006) AAAI
    • James, M.R.1    Wessling, T.2    Vlassis, N.A.3
  • 24
    • 0033351917
    • Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives
    • John N. Tsitsiklis and Benjamin Van Roy. Optimal stopping of Markov processes: Hilbert space theory, approximation algorithms, and an application to pricing high-dimensional financial derivatives. IEEE Transactions on Automatic Control, 44:1840-1851, 1999.
    • (1999) IEEE Transactions on Automatic Control , vol.44 , pp. 1840-1851
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 25
    • 33646435300
    • A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning
    • David Choi and Benjamin Van Roy. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems, 16(2):207-239, 2006.
    • (2006) Discrete Event Dynamic Systems , vol.16 , Issue.2 , pp. 207-239
    • Choi, D.1    Van Roy, B.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.