2008, Pages 664-671

An analysis of reinforcement learning with function approximation

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; MACHINE LEARNING; STOCHASTIC SYSTEMS; FUNCTIONS; LEARNING SYSTEMS; PROBABILITY DENSITY FUNCTION; REINFORCEMENT; REINFORCEMENT LEARNING; ROBOT LEARNING

EID: 56449091120     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1390156.1390240     Document Type: Conference Paper
Times cited: 229

References (24)
  • 1
    • 85151728371
    • Residual algorithms: Reinforcement learning with function approximation
    • Baird, L. (1995). Residual algorithms: Reinforcement learning with function approximation. Proc. 12th Int. Conf. Machine Learning (pp. 30-37).
    • (1995) Proc. 12th Int. Conf. Machine Learning , pp. 30-37
    • Baird, L.1
  • 4
    • 0031076413
    • Stochastic approximation with two time scales
    • Borkar, V. (1997). Stochastic approximation with two time scales. Systems & Control Letters, 29, 291-294.
    • (1997) Systems & Control Letters , vol.29 , pp. 291-294
    • Borkar, V.1
  • 6
    • 0034342516
    • On the existence of fixed points for approximate value iteration and temporal-difference learning
    • de Farias, D., & Van Roy, B. (2000). On the existence of fixed points for approximate value iteration and temporal-difference learning. Journal of Optimization Theory and Applications, 105, 589-608.
    • (2000) Journal of Optimization Theory and Applications , vol.105 , pp. 589-608
    • de Farias, D.1    Van Roy, B.2
  • 7
    • 0030487036
    • Logarithmic Sobolev inequalities for finite Markov chains
    • Diaconis, P., & Saloff-Coste, L. (1996). Logarithmic Sobolev inequalities for finite Markov chains. Annals of Applied Probability, 6, 695-750.
    • (1996) Annals of Applied Probability , vol.6 , pp. 695-750
    • Diaconis, P.1    Saloff-Coste, L.2
  • 8
    • 0038595393
    • Stable function approximation in dynamic programming
    • CMU-CS-95-103, School of Computer Science, Carnegie Mellon University
    • Gordon, G. (1995). Stable function approximation in dynamic programming (Technical Report CMU-CS-95-103). School of Computer Science, Carnegie Mellon University.
    • (1995) Technical Report
    • Gordon, G.1
  • 9
    • 57649089060
    • Chattering in SARSA(λ)
    • Technical Report, CMU Learning Lab Internal Report
    • Gordon, G. (1996). Chattering in SARSA(λ) (Technical Report). CMU Learning Lab Internal Report.
    • (1996) Chattering in SARSA(λ)
    • Gordon, G.1
  • 11
    • 0000566364
    • Computable bounds for geometric convergence rates of Markov chains
    • Meyn, S., & Tweedie, R. (1994). Computable bounds for geometric convergence rates of Markov chains. Annals of Applied Probability, 4, 981-1011.
    • (1994) Annals of Applied Probability , vol.4 , pp. 981-1011
    • Meyn, S.1    Tweedie, R.2
  • 12
    • 0036832956
    • Kernel-based reinforcement learning
    • Ormoneit, D., & Sen, S. (2002). Kernel-based reinforcement learning. Machine Learning, 49, 161-178.
    • (2002) Machine Learning , vol.49 , pp. 161-178
    • Ormoneit, D.1    Sen, S.2
  • 13
    • 56449099734
    • On the existence of fixed-points for Q-learning and SARSA in partially observable domains
    • Perkins, T., & Pendrith, M. (2002). On the existence of fixed-points for Q-learning and SARSA in partially observable domains. Proc. 19th Int. Conf. Machine Learning (pp. 490-497).
    • (2002) Proc. 19th Int. Conf. Machine Learning , pp. 490-497
    • Perkins, T.1    Pendrith, M.2
  • 16
    • 56449114755
    • Q-learning combined with spreading: Convergence and results
    • Ribeiro, C., & Szepesvári, C. (1996). Q-learning combined with spreading: Convergence and results. Proc. ISRF-IEE Int. Conf. Intelligent and Cognitive Systems (pp. 32-36).
    • (1996) Proc. ISRF-IEE Int. Conf. Intelligent and Cognitive Systems , pp. 32-36
    • Ribeiro, C.1    Szepesvári, C.2
  • 17
    • 3042638629
    • Quantitative convergence rates of Markov chains: A simple account
    • Rosenthal, J. (2002). Quantitative convergence rates of Markov chains: A simple account. Electronic Communications in Probability, 7, 123-128.
    • (2002) Electronic Communications in Probability , vol.7 , pp. 123-128
    • Rosenthal, J.1
  • 19
    • 84947807317
    • Open theoretical questions in reinforcement learning
    • Sutton, R. (1999). Open theoretical questions in reinforcement learning. Lecture Notes in Computer Science, 1572, 11-17.
    • (1999) Lecture Notes in Computer Science , vol.1572 , pp. 11-17
    • Sutton, R.1
  • 21
    • 0035283402
    • On the convergence of temporal-difference learning with linear function approximation
    • Tadić, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42, 241-267.
    • (2001) Machine Learning , vol.42 , pp. 241-267
    • Tadić, V.1
  • 22
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • Tsitsiklis, J., & Van Roy, B. (1996a). An analysis of temporal-difference learning with function approximation. IEEE Trans. Automatic Control, 42, 674-690.
    • (1996) IEEE Trans. Automatic Control , vol.42 , pp. 674-690
    • Tsitsiklis, J.1    Van Roy, B.2
  • 23
    • 0029752470
    • Feature-based methods for large scale dynamic programming
    • Tsitsiklis, J., & Van Roy, B. (1996b). Feature-based methods for large scale dynamic programming. Machine Learning, 22, 59-94.
    • (1996) Machine Learning , vol.22 , pp. 59-94
    • Tsitsiklis, J.1    Van Roy, B.2
  • 24
    • 0004049893
    • Learning from delayed rewards
    • Doctoral dissertation, King's College, University of Cambridge
    • Watkins, C. (1989). Learning from delayed rewards. Doctoral dissertation, King's College, University of Cambridge.
    • (1989) Learning from delayed rewards
    • Watkins, C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.