SCOPUS Information Retrieval Platform

31st International Conference on Machine Learning, ICML 2014

Volume 3, 2014, Pages 1973-1988

A new Q(λ) with interim forward view and Monte Carlo equivalence

(4) Sutton, Richard S. (a); Mahmood, A. Rupam (a); Precup, Doina (b); Van Hasselt, Hado (a)

a UNIVERSITY OF ALBERTA (Canada)

b MCGILL UNIVERSITY (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; MONTE CARLO METHODS; PARALLEL PROCESSING SYSTEMS; PARAMETER ESTIMATION; REINFORCEMENT LEARNING;

ALGORITHMIC COMPLEXITY; ASYMPTOTIC PERFORMANCE; DERIVATION TECHNIQUES; ELIGIBILITY TRACES; EN-ROUTE; Q-LEARNING; RAPID LEARNING; TIME STEP;

LEARNING ALGORITHMS;

EID: 84919913727
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 20

References (22)

1
- 77956543059
- Temporal difference Bayesian model averaging: A Bayesian perspective on adapting lambda
- Downey, C., Sanner, S. (2010). Temporal difference Bayesian model averaging: A Bayesian perspective on adapting lambda. In Proceedings of the 27th International Conference on Machine Learning, pp. 311-318.
- (2010) Proceedings of the 27th International Conference on Machine Learning , pp. 311-318
- Downey, C.¹ Sanner, S.²

2
- 84897081792
- Off-policy learning with eligibility traces: A survey
- Geist, M., Scherrer, B. (2014). Off-policy learning with eligibility traces: A survey. Journal of Machine Learning Research 15:289-333.
- (2014) Journal of Machine Learning Research , vol.15 , pp. 289-333
- Geist, M.¹ Scherrer, B.²

3
- 26944457467
- Bias-variance error bounds for temporal difference updates
- Kearns, M. J., Singh, S. P. (2000). Bias-variance error bounds for temporal difference updates. In Proceedings of the 13th Annual Conference on Computational Learning Theory, pp. 142-147.
- (2000) Proceedings of the 13th Annual Conference on Computational Learning Theory , pp. 142-147
- Kearns, M.J.¹ Singh, S.P.²

4
- 84864655352
- PhD thesis, University of Alberta
- Maei, H. R. (2011). Gradient Temporal-Difference Learning Algorithms. PhD thesis, University of Alberta.
- (2011) Gradient Temporal-Difference Learning Algorithms
- Maei, H.R.¹

5
- 77954101982
- GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces
- Atlantis Press
- Maei, H. R., Sutton, R. S. (2010). GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces. In Proceedings of the Third Conference on Artificial General Intelligence, pp. 91-96. Atlantis Press.
- (2010) Proceedings of the Third Conference on Artificial General Intelligence , pp. 91-96
- Maei, H.R.¹ Sutton, R.S.²

6
- 84896357393
- Multi-timescale nexting in a reinforcement learning robot
- Modayil, J., White, A., Sutton, R. S. (2014). Multi-timescale nexting in a reinforcement learning robot. Adaptive Behavior 22(2): 146-160.
- (2014) Adaptive Behavior , vol.22 , Issue.2 , pp. 146-160
- Modayil, J.¹ White, A.² Sutton, R.S.³

7
- 0010932382
- PhD thesis, Northeastern University, Boston
- Peng, J. (1993). Efficient Dynamic Programming-Based Learning for Control. PhD thesis, Northeastern University, Boston.
- (1993) Efficient Dynamic Programming-Based Learning for Control
- Peng, J.¹

8
- 0242393653
- Eligibility traces for off-policy policy evaluation
- Morgan Kaufmann
- Precup, D., Sutton, R. S., Singh, S. (2000). Eligibility traces for off-policy policy evaluation. In Proceedings of the 17th International Conference on Machine Learning, pp. 759-766. Morgan Kaufmann.
- (2000) Proceedings of the 17th International Conference on Machine Learning , pp. 759-766
- Precup, D.¹ Sutton, R.S.² Singh, S.³

9
- 4644328593
- Off-policy temporal-difference learning with function approximation
- Precup, D., Sutton, R. S., Dasgupta, S. (2001). Off-policy temporal-difference learning with function approximation. In Proceedings of the 18th International Conference on Machine Learning, pp. 417-424.
- (2001) Proceedings of the 18th International Conference on Machine Learning , pp. 417-424
- Precup, D.¹ Sutton, R.S.² Dasgupta, S.³

10
- 0003677359
- PhD thesis, Cambridge University
- Rummery, G. A. (1995). Problem Solving with Reinforcement Learning. PhD thesis, Cambridge University.
- (1995) Problem Solving with Reinforcement Learning
- Rummery, G.A.¹

11
- 0032114627
- Analytical mean squared error curves for temporal difference learning
- Singh, S. P., Dayan, P. (1998). Analytical mean squared error curves for temporal difference learning. Machine Learning 32:5-40.
- (1998) Machine Learning , vol.32 , pp. 5-40
- Singh, S.P.¹ Dayan, P.²

12
- 0029753630
- Reinforcement learning with replacing eligibility traces
- Singh, S. P., Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning 22:123-158.
- (1996) Machine Learning , vol.22 , pp. 123-158
- Singh, S.P.¹ Sutton, R.S.²

13
- 33847202724
- Learning to predict by the methods of temporal differences
- Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning 3:9-44.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

14
- 84922015064
- TD models: Modeling the world at a mixture of time scales
- Morgan Kaufmann
- Sutton, R. S. (1995). TD models: Modeling the world at a mixture of time scales. In Proceedings of the Twelfth International Conference on Machine Learning, pp. 531-539. Morgan Kaufmann.
- (1995) Proceedings of the Twelfth International Conference on Machine Learning , pp. 531-539
- Sutton, R.S.¹

15
- 0004102479
- MIT Press
- Sutton, R. S., Barto, A. G. (1998). Reinforcement Learning: An Introduction. MIT Press.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

16
- 84899464022
- Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction
- Taipei, Taiwan
- Sutton, R. S., Modayil, J., Delp, M., Degris, T., Pilarski, P. M., White, A., Precup, D. (2011). Horde: A scalable real-time architecture for learning knowledge from unsupervised sensorimotor interaction. In Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems, Taipei, Taiwan.
- (2011) Proceedings of the Tenth International Conference on Autonomous Agents and Multiagent Systems
- Sutton, R.S.¹ Modayil, J.² Delp, M.³ Degris, T.⁴ Pilarski, P.M.⁵ White, A.⁶ Precup, D.⁷

17
- 0010495476
- On bias and step size in temporal-difference learning
- New Haven, CT. Yale University
- Sutton, R. S., Singh, S. (1994). On bias and step size in temporal-difference learning. In Proceedings of the Eighth Yale Workshop on Adaptive and Learning Systems, pp. 91-96, New Haven, CT. Yale University.
- (1994) Proceedings of the Eighth Yale Workshop on Adaptive and Learning Systems , pp. 91-96
- Sutton, R.S.¹ Singh, S.²

18
- 84923286978
- Off-policy TD(λ) with a true online equivalence
- Quebec City, Canada
- van Hasselt, H., Mahmood, A. R., Sutton, R. S. (2014). Off-policy TD(λ) with a true online equivalence. In Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence, Quebec City, Canada.
- (2014) Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence
- Van Hasselt, H.¹ Mahmood, A.R.² Sutton, R.S.³

19
- 84919943264
- True online TD(λ)
- Beijing, China. JMLR: W&CP
- van Seijen, H., Sutton, R. S. (2014). True online TD(λ). In Proceedings of the 31st International Conference on Machine Learning. Beijing, China. JMLR: W&CP volume 32.
- (2014) Proceedings of the 31st International Conference on Machine Learning , vol.32
- Van Seijen, H.¹ Sutton, R.S.²

20
- 67650505307
- A theoretical and empirical analysis of Expected Sarsa
- van Seijen, H., van Hasselt, H., Whiteson, S., Wiering, M. (2009). A theoretical and empirical analysis of Expected Sarsa. In Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning, pp. 177-184.
- (2009) Proceedings of the IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning , pp. 177-184
- Van Seijen, H.¹ Van Hasselt, H.² Whiteson, S.³ Wiering, M.⁴

21
- 0004049893
- PhD thesis, Cambridge University
- Watkins, C. J. C. H. (1989). Learning from Delayed Rewards. PhD thesis, Cambridge University.
- (1989) Learning from Delayed Rewards
- Watkins, C.J.C.H.¹

22
- 77956517288
- Convergence of least-squares temporal difference methods under general conditions
- Yu, H. (2010). Convergence of least-squares temporal difference methods under general conditions. In Proceedings of the 27th International Conference on Machine Learning, pp. 1207-1214.
- (2010) Proceedings of the 27th International Conference on Machine Learning , pp. 1207-1214
- Yu, H.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.