Lecture Notes in Computer Science, Volume 3201, 2004, Pages 477-488

Convergence and divergence in standard and averaging reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; DECISION THEORY; DYNAMIC PROGRAMMING; FUNCTIONS; ITERATIVE METHODS; MARKOV PROCESSES; PROBABILITY; ARTIFICIAL INTELLIGENCE; LEARNING SYSTEMS;

EID: 22944460232     PISSN: 0302-9743     EISSN: None     Source Type: Conference Proceeding
DOI: 10.1007/978-3-540-30115-8_44     Document Type: Conference Paper
Times cited: 29

References (18)
  • 1. J. S. Albus. A theory of cerebellar function. Mathematical Biosciences, 10:25-61, 1975.
  • 2. L. Baird. Residual algorithms: Reinforcement learning with function approximation. In A. Prieditis and S. Russell, editors, Machine Learning: Proceedings of the Twelfth International Conference, pages 30-37. Morgan Kaufmann Publishers, San Francisco, CA, 1995.
  • 4. J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In G. Tesauro, D. S. Touretzky, and T. K. Leen, editors, Advances in Neural Information Processing Systems 7, pages 369-376. MIT Press, Cambridge, MA, 1995.
  • 5. G. J. Gordon. Stable function approximation in dynamic programming. Technical Report CMU-CS-95-103, Carnegie Mellon University, 1995.
  • 6. T. Jaakkola, M. I. Jordan, and S. P. Singh. On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6:1185-1201, 1994.
  • 8. T. J. Perkins and D. Precup. A convergent form of approximate policy iteration. In T. K. Leen, T. G. Dietterich, and V. Tresp, editors, Advances in Neural Information Processing Systems 13. MIT Press, 2002.
  • 10. G. A. Rummery and M. Niranjan. On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University, UK, 1994.
  • 11. S. P. Singh, T. Jaakkola, M. L. Littman, and C. Szepesvari. Convergence results for single-step on-policy reinforcement-learning algorithms. Machine Learning, 38(3):287-308, 2000.
  • 12. R. S. Sutton. Generalization in reinforcement learning: Successful examples using sparse coarse coding. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8, pages 1038-1045. MIT Press, Cambridge, MA, 1996.
  • 15. G. J. Tesauro. Temporal difference learning and TD-Gammon. Communications of the ACM, 38:58-68, 1995.
  • 16. J. N. Tsitsiklis. Asynchronous stochastic approximation and Q-learning. Machine Learning, 16:185-202, 1994.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.