Volume 32, Issue 1, 1998, Pages 5-40

Analytical mean squared error curves for temporal difference learning

Author keywords

Bias; Eligibility trace; Markov reward process; Monte Carlo; MSE; Reinforcement learning; Temporal difference; Variance

Indexed keywords

BIAS; ELIGIBILITY TRACE; MARKOV REWARD PROCESS; MEAN SQUARED ERROR CURVES; REINFORCEMENT LEARNING; TEMPORAL DIFFERENCE LEARNING; VARIANCE;

EID: 0032114627     PISSN: 0885-6125     EISSN: None     Source Type: Journal
DOI: 10.1023/A:1007495401240     Document Type: Article
Times cited: 38

References (15)
  2. Barto, A. G. & Duff, M. (1994). Monte Carlo matrix inversion and reinforcement learning. In Advances in Neural Information Processing Systems 6, pp. 687-694. San Mateo, CA: Morgan Kaufmann.
  5. Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8(3/4), 341-362.
  6. Dayan, P. & Sejnowski, T. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.
  8. Jaakkola, T., Jordan, M. I., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
  9. Saul, L. K. & Singh, S. (1996). Learning curve bounds for Markov decision processes with undiscounted rewards. In Proceedings of COLT.
  10. Singh, S. & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22, 123-158.
  11. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
  12. Tsitsiklis, J. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3), 185-202.
  13. Wasow, W. R. (1952). A note on the inversion of matrices by random walks. Math. Tables Other Aids Comput., 6, 78-81.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.