
Volume 15, 2014, Pages 289-333

Off-policy learning with eligibility traces: A survey

Author keywords

Eligibility Traces; Off-policy learning; Reinforcement learning; Value function estimation

Indexed keywords

MARKOV PROCESSES; REINFORCEMENT LEARNING;

EID: 84897081792     PISSN: 15324435     EISSN: 15337928     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 103

References (33)
  • 1. A. Antos, Cs. Szepesvári, and R. Munos. Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. In Conference on Learning Theory (COLT), 2006.
  • 4. D.P. Bertsekas and S. Ioffe. Temporal differences-based policy iteration and applications in neuro-dynamic programming. Technical report, MIT, 1996.
  • 6. D.P. Bertsekas and H. Yu. Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics, 227:27-50, 2009.
  • 7. L. Bottou and O. Bousquet. The tradeoffs of large scale learning. In S. Sra, S. Nowozin, and S.J. Wright, editors, Optimization for Machine Learning, pages 351-368. MIT Press, 2011.
  • 8. J.A. Boyan. Technical update: Least-squares temporal difference learning. Machine Learning, 49(2-3):233-246, 2002. DOI: 10.1023/A:1017936530646.
  • 9. S.J. Bradtke and A.G. Barto. Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57, 1996.
  • 10. D. Choi and B. Van Roy. A generalized Kalman filter for fixed point approximation and efficient temporal-difference learning. Discrete Event Dynamic Systems, 16:207-239, 2006.
  • 18. H.R. Maei and R.S. Sutton. GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Conference on Artificial General Intelligence (AGI), 2010.
  • 20. A. Nedic and D.P. Bertsekas. Least squares policy evaluation algorithms with linear function approximation. Discrete Event Dynamic Systems, 13:79-110, 2003.
  • 23. R.S. Randhawa and S. Juneja. Combining importance sampling and temporal difference control variates to simulate Markov chains. ACM Transactions on Modeling and Computer Simulation, 14(1):1-30, 2004.
  • 25. B. Scherrer. Should one compute the temporal difference fix point or minimize the Bellman residual? The unified oblique projection view. In International Conference on Machine Learning (ICML), 2010.
  • 30. J.N. Tsitsiklis and B. Van Roy. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690, 1997.
  • 31. H. Yu. Convergence of least-squares temporal difference methods under general conditions. In International Conference on Machine Learning (ICML), 2010a.
  • 32. H. Yu. Least squares temporal difference methods: An analysis under general conditions. Technical Report C-2010-39, University of Helsinki, September 2010b.
  • 33. H. Yu and D.P. Bertsekas. New error bounds for approximations from projected linear equations. Technical Report C-2008-43, Dept. Computer Science, Univ. of Helsinki, July 2008.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.