SCOPUS Information Retrieval Platform

Proceedings of the 25th International Conference on Machine Learning

Volume , Issue , 2008, Pages 560-567

A worst-case comparison between temporal difference and residual gradient with linear function approximation

(1) Li, Lihong a

a RUTGERS UNIVERSITY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

LEARNING ALGORITHMS; MACHINE LEARNING; MARKOV PROCESSES; FUNCTIONS; INTERNET; LEARNING SYSTEMS; PROBABILITY DENSITY FUNCTION; ROBOT LEARNING;

FORMAL ANALYSIS; FUNCTION APPROXIMATION; LINEAR FUNCTIONS; NON-PROBABILISTIC; POLICY EVALUATION; PREDICTION ERRORS; RESIDUAL GRADIENT; TEMPORAL DIFFERENCES;

APPROXIMATION ALGORITHMS;

FORMAL ANALYSIS; FUNCTION APPROXIMATIONS; LINEAR FUNCTIONS; MARKOV CHAINS; MARKOVIAN; NON-PROBABILISTIC; ON-LINE LEARNING; POLICY EVALUATIONS; PREDICTION ERRORS; RESIDUAL GRADIENTS; TEMPORAL DIFFERENCES;

EID: 56449125197 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1390156.1390227 Document Type: Conference Paper

Times cited: (22)

References (15)

1
- 85151728371
- Residual algorithms: Reinforcement learning with function approximation
- Baird, L. C. (1995). Residual algorithms: Reinforcement learning with function approximation. Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) (pp. 30-37).
- (1995) Proceedings of the Twelfth International Conference on Machine Learning (ICML-95) , pp. 30-37
- Baird, L.C.¹

2
- 0003487482
- Athena Scientific
- Bertsekas, D. P., & Tsitsiklis, J. N. (1996). Neuro-dynamic programming. Athena Scientific.
- (1996) Neuro-dynamic programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

3
- 85153940465
- Generalization in reinforcement learning: Safely approximating the value function
- Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. Advances in Neural Information Processing Systems 7 (NIPS-94) (pp. 369-376).
- (1995) Advances in Neural Information Processing Systems 7 (NIPS-94) , pp. 369-376
- Boyan, J.A.¹ Moore, A.W.²

4
- 0030145382
- Worst-case quadratic loss bounds for prediction using linear functions and gradient descent
- Cesa-Bianchi, N., Long, P. M., & Warmuth, M. (1996). Worst-case quadratic loss bounds for prediction using linear functions and gradient descent. IEEE Transactions on Neural Networks, 7, 604-619.
- (1996) IEEE Transactions on Neural Networks , vol.7 , pp. 604-619
- Cesa-Bianchi, N.¹ Long, P.M.² Warmuth, M.³

5
- 0004151494
- Cambridge University Press
- Horn, R. A., & Johnson, C. R. (1986). Matrix analysis. Cambridge University Press.
- (1986) Matrix analysis
- Horn, R.A.¹ Johnson, C.R.²

6
- 0008815681
- Exponentiated gradient versus gradient descent for linear predictors
- Kivinen, J., & Warmuth, M. K. (1997). Exponentiated gradient versus gradient descent for linear predictors. Information and Computation, 132, 1-63.
- (1997) Information and Computation , vol.132 , pp. 1-63
- Kivinen, J.¹ Warmuth, M.K.²

7
- 4644323293
- Least-squares policy iteration
- Lagoudakis, M. G., & Parr, R. (2003). Least-squares policy iteration. Journal of Machine Learning Research, 4, 1107-1149.
- (2003) Journal of Machine Learning Research , vol.4 , pp. 1107-1149
- Lagoudakis, M.G.¹ Parr, R.²

8
- 1942516880
- Error bounds for approximate policy iteration
- Munos, R. (2003). Error bounds for approximate policy iteration. Proceedings of the Twentieth International Conference on Machine Learning (ICML-03) (pp. 560-567).
- (2003) Proceedings of the Twentieth International Conference on Machine Learning (ICML-03) , pp. 560-567
- Munos, R.¹

9
- 33749032965
- Exponentiated gradient methods for reinforcement learning
- Precup, D., & Sutton, R. S. (1997). Exponentiated gradient methods for reinforcement learning. Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97) (pp. 272-277).
- (1997) Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97) , pp. 272-277
- Precup, D.¹ Sutton, R.S.²

10
- 85102627959
- New York: Wiley-Interscience
- Puterman, M. L. (1994). Markov decision processes: Discrete stochastic dynamic programming. New York: Wiley-Interscience.
- (1994) Markov decision processes: Discrete stochastic dynamic programming
- Puterman, M.L.¹

11
- 0013419177
- On the worst-case analysis of temporal-difference learning algorithms
- Schapire, R. E., & Warmuth, M. K. (1996). On the worst-case analysis of temporal-difference learning algorithms. Machine Learning, 22, 95-122.
- (1996) Machine Learning , vol.22 , pp. 95-122
- Schapire, R.E.¹ Warmuth, M.K.²

12
- 1942452243
- TD(0) converges provably faster than the residual gradient algorithm
- Schoknecht, R., & Merke, A. (2003). TD(0) converges provably faster than the residual gradient algorithm. Proceedings of the Twentieth International Conference on Machine Learning (ICML-03) (pp. 680-687).
- (2003) Proceedings of the Twentieth International Conference on Machine Learning (ICML-03) , pp. 680-687
- Schoknecht, R.¹ Merke, A.²

13
- 33847202724
- Learning to predict by the methods of temporal differences
- Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

14
- 0004102479
- Cambridge, MA: MIT Press
- Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. Cambridge, MA: MIT Press.
- (1998) Reinforcement learning: An introduction
- Sutton, R.S.¹ Barto, A.G.²

15
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, J. N., & Van Roy, B. (1997). An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control, 42, 674-690.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.