2009, Pages 177-184

A theoretical and empirical analysis of Expected Sarsa

Author keywords

[No Author keywords available]

Indexed keywords

BEHAVIOR POLICY; DIFFERENCE METHOD; EMPIRICAL ANALYSIS; HIGHER LEARNING; LEARNING RATES; MODEL FREE; MULTIPLE DOMAINS; Q-LEARNING; STOCHASTICITY; ZERO VARIANCE;

EID: 67650505307     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ADPRL.2009.4927542     Document Type: Conference Paper
Times cited : (230)
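For context, the Expected Sarsa update analyzed in the paper replaces the sampled next-action value in the Sarsa target with its expectation under the policy, which removes the variance contributed by that action sample. A minimal tabular sketch in Python follows; the function and variable names are illustrative and not taken from the paper.

    import numpy as np

    def epsilon_greedy_probs(q_values, epsilon):
        # Action probabilities of an epsilon-greedy policy over q_values.
        n = len(q_values)
        probs = np.full(n, epsilon / n)
        probs[np.argmax(q_values)] += 1.0 - epsilon
        return probs

    def expected_sarsa_update(Q, s, a, r, s_next, alpha, gamma, epsilon):
        # One tabular Expected Sarsa update: bootstrap on the expectation of
        # Q(s', .) under the (epsilon-greedy) policy rather than on a single
        # sampled next action as in Sarsa.
        probs = epsilon_greedy_probs(Q[s_next], epsilon)
        expected_q = float(np.dot(probs, Q[s_next]))
        Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])

With a greedy policy (epsilon = 0) the expectation reduces to the maximum, and the same update becomes Q-learning, one of the comparisons the indexed keywords point to.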

References (12)
  • 4. R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton University Press, 1957.
  • 5. R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learning, vol. 3, pp. 9-44, 1988.
  • 7. J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 369-376.
  • 8. G. Gordon, "Stable function approximation in dynamic programming," in Machine Learning: Proceedings of the Twelfth International Conference, A. Prieditis and S. Russell, Eds. San Francisco, CA: Morgan Kaufmann, 1995, pp. 261-268.
  • 9. L. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Machine Learning: Proceedings of the Twelfth International Conference, A. Prieditis and S. Russell, Eds. San Francisco, CA: Morgan Kaufmann, 1995, pp. 30-37.
  • 11. R. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems 8, 1996, pp. 1038-1044.
  • 12. S. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvári, "Convergence results for single-step on-policy reinforcement-learning algorithms," Machine Learning, vol. 38, no. 3, pp. 287-308, 2000. DOI: 10.1023/A:1007678930559


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.