Volume 382, 2009

Fast gradient-descent methods for temporal-difference learning with linear function approximation

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL REQUIREMENTS; COMPUTER GO; CONVERGENCE RATES; FUNCTION APPROXIMATORS; GRADIENT-DESCENT; LEARNING RATES; LINEAR FUNCTIONS; OBJECTIVE FUNCTIONS; POLICY PROBLEM; TEMPORAL DIFFERENCE LEARNING; TEMPORAL DIFFERENCES; TEST PROBLEM;

EID: 70049090437     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1553374.1553501     Document Type: Conference Paper
Times cited: 232

References (18)
  • 1. Antos, A., Szepesvári, Cs., Munos, R. (2008). Learning near-optimal policies with Bellman-residual minimization based fitted policy iteration and a single sample path. Machine Learning 71:89-129.
  • 2. Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the 12th International Conference on Machine Learning, pp. 30-37.
  • 5. Borkar, V. S. (1997). Stochastic approximation with two timescales. Systems and Control Letters 29:291-294.
  • 6. Borkar, V. S., Meyn, S. P. (2000). The ODE method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization 38(2):447-469.
  • 7. Boyan, J. (2002). Technical update: Least-squares temporal difference learning. Machine Learning 49:233-246.
  • 8. Bradtke, S., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning 22:33-57.
  • 9. Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning 8:341-362.
  • 10. Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. In Proceedings of AAAI, pp. 356-361.
  • 11. Hirsch, M. W. (1989). Convergent activation dynamics in continuous time networks. Neural Networks 2:331-349.
  • 16. Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning 3:9-44.
  • 17. Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence 112:181-211.
  • 18. Sutton, R. S., Szepesvári, Cs., Maei, H. R. (2009). A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation. Advances in Neural Information Processing Systems 21.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.