Lecture Notes in Computer Science, Volume 3201, 2004, Pages 477-488

Convergence and divergence in standard and averaging reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; DECISION THEORY; DYNAMIC PROGRAMMING; FUNCTIONS; ITERATIVE METHODS; MARKOV PROCESSES; PROBABILITY; ARTIFICIAL INTELLIGENCE; LEARNING SYSTEMS;

EID: 22944460232     PISSN: 0302-9743     EISSN: None     Source Type: Conference Proceeding
DOI: 10.1007/978-3-540-30115-8_44     Document Type: Conference Paper
Times cited: 29

References (18)
  • 1. J. S. Albus. A theory of cerebellar function. Mathematical Biosciences, 10:25-61, 1975.
  • 2. L. Baird. Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 30-37. Morgan Kaufmann Publishers, San Francisco, CA, 1995.
  • 4. J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 369-376. MIT Press, Cambridge, MA, 1995.
  • 5. G. J. Gordon. Stable function approximation in dynamic programming. Technical Report CMU-CS-95-103, Carnegie Mellon University, 1995.
  • 6. T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6:1185-1201, 1994.
  • 8. T. J. Perkins and D. Precup. A convergent form of approximate policy iteration. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13. MIT Press, 2002.
  • 10. G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University, UK, 1994.
  • 11. S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
  • 12. R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1038-1045. MIT Press, Cambridge, MA, 1996.
  • 15. G. J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58-68, 1995.
  • 16. J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185-202, 1994.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.