2. R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learning, vol. 3, pp. 9-44, 1988.
4. J. N. Tsitsiklis, "Asynchronous stochastic approximation and Q-learning," Mach. Learning, vol. 16, pp. 185-202, 1994.
5. T. Jaakkola, M. I. Jordan, and S. P. Singh, "On the convergence of stochastic iterative dynamic programming algorithms," Neural Comp., vol. 6, no. 6, pp. 1185-1201, 1994.
6. P. Dayan and T. J. Sejnowski, "TD(λ) converges with probability 1," Mach. Learning, vol. 14, pp. 295-301, 1994.
8. P. Dayan, "The convergence of TD(λ) for general λ," Mach. Learning, vol. 8, pp. 341-362, 1992.
9. R. E. Schapire and M. K. Warmuth, "On the worst-case analysis of temporal-difference learning algorithms," Mach. Learning, vol. 22, pp. 95-122, 1996.
10. J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large scale dynamic programming," Mach. Learning, vol. 22, pp. 59-94, 1996.
12. S. P. Singh, T. Jaakkola, and M. I. Jordan, "Reinforcement learning with soft state aggregation," in Advances in Neural Information Processing Systems, vol. 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995.
13. L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Machine Learning: Proc. 12th Int. Conf., July 9-12, A. Prieditis and S. Russell, Eds. San Francisco, CA: Morgan Kaufmann, 1995.
14. J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems, vol. 7. Cambridge, MA: MIT Press, 1995.
15. R. S. Sutton, "On the virtues of linear learning and trajectory distributions," in Proc. Wkshp. Value Function Approximation, Mach. Learning Conf., Carnegie Mellon Univ., Tech. Rep. CMU-CS-95-206, 1995.
17. L. Gurvits, private communication, 1996.
21. G. D. Stamoulis and J. N. Tsitsiklis, "On the settling time of the congested GI/G/1 queue," Adv. Appl. Probability, vol. 22, pp. 929-956, 1990.
22. P. Konstantopoulos and F. Baccelli, "On the cut-off phenomenon in some queueing systems," J. Appl. Probability, vol. 28, pp. 683-694, 1991.
23. D. P. Bertsekas, "A counterexample to temporal-difference learning," Neural Comp., vol. 7, pp. 270-279, 1995.
24. S. P. Singh and R. S. Sutton, "Reinforcement learning with replacing eligibility traces," Mach. Learning, vol. 22, pp. 123-158, 1996.
25. R. S. Sutton, "Generalization in reinforcement learning: Successful examples using sparse coarse coding," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996.