1994, Pages 369-376

Generalization in Reinforcement Learning: Safely Approximating the Value Function

Author keywords

[No Author keywords available]

Indexed keywords

REINFORCEMENT LEARNING; TABLE LOOKUP;

EID: 85153940465     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 312

References (16)
  • 2
    • R. Bellman, R. Kalaba, and B. Kotkin. Polynomial approximation - a new computational technique in dynamic programming; Allocation processes. Mathematics of Computation, 17, 1963.
  • 4
    • S. J. Bradtke. Reinforcement learning applied to linear quadratic regulation. In S. J. Hanson, J. Cowan, and C. L. Giles, editors, NIPS-5. Morgan Kaufmann, 1993.
  • 5
    • W. S. Cleveland and S. J. Devlin. Locally weighted regression: An approach to regression analysis by local fitting. JASA, 83(403):596-610, September 1988.
  • 9
    • N. Schraudolph, P. Dayan, and T. Sejnowski. Using TD(λ) to learn an evaluation function for the game of Go. In J. D. Cowan, G. Tesauro, and J. Alspector, editors, NIPS-6. Morgan Kaufmann, 1994.
  • 10
    • S. P. Singh and R. Yee. An upper bound on the loss from approximate optimal-value functions. Machine Learning, 1994. Technical Note (to appear).
  • 11
    • R. Sutton. Learning to predict by the methods of temporal differences. Machine Learning, 3, 1988.
  • 12
    • G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8(3/4), May 1992.
  • 16
    • R. Yee. Abstraction in control learning. Technical Report COINS 92-16, Univ. of Massachusetts, 1992.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.