



Volume 3, 2014, Pages 1973-1988

A new Q(λ) with interim forward view and Monte Carlo equivalence

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; MONTE CARLO METHODS; PARALLEL PROCESSING SYSTEMS; PARAMETER ESTIMATION; REINFORCEMENT LEARNING

EID: 84919913727     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 20

References (22)
  • 2
    • 84897081792
    • Off-policy learning with eligibility traces: A survey
    • Geist, M., Scherrer, B. (2014). Off-policy learning with eligibility traces: A survey. Journal of Machine Learning Research 15:289-333.
    • (2014) Journal of Machine Learning Research , vol.15 , pp. 289-333
    • Geist, M.1    Scherrer, B.2
  • 5
    • 77954101982
    • GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
    • Atlantis Press
    • Maei, H. R., Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, pp. 91-96. Atlantis Press.
    • (2010) Proceedings of the Third Conference on Artificial General Intelligence , pp. 91-96
    • Maei, H.R.1    Sutton, R.S.2
  • 6
    • 84896357393
    • Multi-timescale nexting in a reinforcement learning robot
    • Modayil, J., White, A., Sutton, R. S. (2014). Multi-timescale nexting in a reinforcement learning robot. Adaptive Behavior 22(2): 146-160.
    • (2014) Adaptive Behavior , vol.22 , Issue.2 , pp. 146-160
    • Modayil, J.1    White, A.2    Sutton, R.S.3
  • 11
    • 0032114627
    • Analytical mean squared error curves for temporal difference learning
    • Singh, S. P., Dayan, P. (1998). Analytical mean squared error curves for temporal difference learning. Machine Learning 32:5-40.
    • (1998) Machine Learning , vol.32 , pp. 5-40
    • Singh, S.P.1    Dayan, P.2
  • 12
    • 0029753630
    • Reinforcement learning with replacing eligibility traces
    • Singh, S. P., Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning 22:123-158.
    • (1996) Machine Learning , vol.22 , pp. 123-158
    • Singh, S.P.1    Sutton, R.S.2
  • 13
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning 3:9-44.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 22
    • 77956517288
    • Convergence of least-squares temporal difference methods under general conditions
    • Yu, H. (2010). Convergence of least-squares temporal difference methods under general conditions. In Proceedings of the 27th International Conference on Machine Learning, pp. 1207-1214.
    • (2010) Proceedings of the 27th International Conference on Machine Learning , pp. 1207-1214
    • Yu, H.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.