



Volume 9, Issue 7, 1997, Pages 1403-1419

Mean-Field Theory for Batched TD(λ)

Author keywords

[No Author keywords available]



EID: 0003276733     PISSN: 08997667     EISSN: None     Source Type: Journal    
DOI: 10.1162/neco.1997.9.7.1403     Document Type: Article
Times cited: 19

References (17)
  • 3
    • Dayan, P. (1992). The convergence of TD(λ) for general lambda. Machine Learning, 8, 341-362.
    • Scopus ID: 0000430514
  • 4
    • Dayan, P., & Sejnowski, T. J. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.
    • Scopus ID: 0028388685
  • 7
    • Jaakkola, T., Jordan, M. I., & Singh, S. P. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6, 1185-1201.
    • Scopus ID: 0000439891
  • 9
    • Pineda, F. J. (1995, August 23-26). Generalization in TD(λ). Theoretical Physics Institute, University of Minnesota.
    • Scopus ID: 0346575867
  • 10
    • Singh, S. P., & Dayan, P. (1996). Analytical mean squared error curves in temporal difference learning. In M. Mozer, M. Jordan, & T. Petsche (Eds.), Advances in Neural Information Processing Systems, 9. Cambridge, MA: MIT Press.
    • Scopus ID: 0346575866
  • 12
    • Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
    • Scopus ID: 33847202724
  • 13
    • Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257-277.
    • Scopus ID: 0001046225
  • 14
    • Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38, 58-68.
    • Scopus ID: 0029276036
  • 15
    • Tsitsiklis, J. N. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16, 185-202.
    • Scopus ID: 0028497630
  • 16
    • Tsitsiklis, J. N., & Van Roy, B. (1996a). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59-94.
    • Scopus ID: 0029752470


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.