SCOPUS Information Search Platform

Advances in Neural Information Processing Systems 21 - Proceedings of the 2008 Conference

Volume: None, Issue: None, Year: 2009, Pages: 1609-1616

A convergent O(n) algorithm for off-policy temporal-difference learning with linear function approximation

Authors (3): Sutton, Richard S. (a); Szepesvári, Csaba (a); Maei, Hamid Reza (a)

(a) University of Alberta, Canada

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION ALGORITHMS; IMPORTANCE SAMPLING; LEAST SQUARES APPROXIMATIONS; MARKOV PROCESSES; STOCHASTIC SYSTEMS;

BEHAVIOR POLICY; FINITE MARKOV DECISION PROCESS; FUNCTIONS APPROXIMATIONS; LINEAR FUNCTIONS; POLICY EVALUATION; PROCESS BEHAVIOR; STOCHASTIC APPROXIMATIONS; STOCHASTIC GRADIENT DESCENT; TEMPORAL DIFFERENCE LEARNING; TEMPORAL-DIFFERENCE ALGORITHM;

LEARNING ALGORITHMS;

EID: 77956513316 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 332

References (24)

1
- 85151728371
- Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 30-37. Morgan Kaufmann.

2
- 0004870198
- Baxter, J., Tridgell, A., Weaver, L. (1998). Experiments in parameter learning using temporal differences. International Computer Chess Association Journal, 21:84-99.

3
- 0003487482
- Bertsekas, D. P., Tsitsiklis, J. (1996). Neuro-Dynamic Programming. Athena Scientific.

4
- 0033876515
- Borkar, V. S., Meyn, S. P. (2000). The O.D.E. method for convergence of stochastic approximation and reinforcement learning. SIAM Journal on Control and Optimization, 38(2):447-469.

5
- 0036832950
- Boyan, J. A. (2002). Technical update: Least-squares temporal difference learning. Machine Learning, 49(2-3):233-246. DOI 10.1023/A:1017936530646

6
- 0001771345
- Bradtke, S. J., Barto, A. G. (1996). Linear least-squares algorithms for temporal difference learning. Machine Learning, 22(1-3):33-57.

7
- 0000430514
- Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8:341-362.

8
- 33750737011
- Geramifard, A., Bowling, M., Sutton, R. S. (2006). Incremental least-squares temporal difference learning. In Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference (AAAI-06/IAAI-06), vol. 1, pp. 356-361.

9
- 84880694195
- Gordon, G. J. (1995). Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 261-268. Morgan Kaufmann, San Francisco.

10
- 4644323293
- Lagoudakis, M., Parr, R. (2003). Least squares policy iteration. Journal of Machine Learning Research, 4:1107-1149.

11
- 33646413135
- Peters, J., Vijayakumar, S., Schaal, S. (2005). Natural actor-critic. In Proceedings of the 16th European Conference on Machine Learning, pp. 280-291.

12
- 4644328593
- Precup, D., Sutton, R. S., Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of the 18th International Conference on Machine Learning, pp. 417-424.

13
- 70049112133
- Precup, D., Sutton, R. S., Paduraru, C., Koop, A., Singh, S. (2006). Off-policy learning with recognizers. In Advances in Neural Information Processing Systems 18.

14
- 0242393653
- Precup, D., Sutton, R. S., Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the 17th International Conference on Machine Learning, pp. 759-766. Morgan Kaufmann.

15
- 0038145011
- Schaeffer, J., Hlynka, M., Jussila, V. (2001). Temporal difference learning applied to a high-performance game-playing program. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 529-534.

16
- 84880900542
- Silver, D., Sutton, R. S., Müller, M. (2007). Reinforcement learning of local shape in the game of Go. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 1053-1058.

17
- 70049084197
- Sturtevant, N. R., White, A. M. (2006). Feature construction for reinforcement learning in Hearts. In Proceedings of the 5th International Conference on Computers and Games.

18
- 33847202724
- Sutton, R. S. (1988). Learning to predict by the method of temporal differences. Machine Learning, 3:9-44.

19
- 0004102479
- Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.

20
- 0033170372
- Sutton, R. S., Precup, D., Singh, S. (1999). Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning. Artificial Intelligence, 112:181-211. DOI 10.1016/S0004-3702(99)00052-1

21
- 33749265408
- Sutton, R. S., Rafols, E. J., Koop, A. (2006). Temporal abstraction in temporal-difference networks. In Advances in Neural Information Processing Systems 18.

22
- 0035283402
- Tadic, V. (2001). On the convergence of temporal-difference learning with linear function approximation. Machine Learning, 42(3):241-267. DOI 10.1023/A:1007609817671

23
- 0031143730
- Tsitsiklis, J. N., Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42(5):674-690. PII S0018928697034375

24
- 0004049893
- Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. Ph.D. thesis, Cambridge University.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.