Volume 32, Issue 1, 1998, Pages 5-40

Analytical mean squared error curves for temporal difference learning

Author keywords

Bias; Eligibility trace; Markov reward process; Monte Carlo; MSE; Reinforcement learning; Temporal difference; Variance

Indexed keywords

BIAS; ELIGIBILITY TRACE; MARKOV REWARD PROCESS; MEAN SQUARED ERROR CURVES; REINFORCEMENT LEARNING; TEMPORAL DIFFERENCE LEARNING; VARIANCE;

EID: 0032114627     PISSN: 0885-6125     EISSN: None     Source Type: Journal
DOI: 10.1023/A:1007495401240     Document Type: Article
Times cited: 38

References (15)
  2. Barto, A. G. & Duff, M. (1994). Monte Carlo matrix inversion and reinforcement learning. In Advances in Neural Information Processing Systems 6, pp. 687-694. San Mateo, CA: Morgan Kaufmann.
  5. Dayan, P. (1992). The convergence of TD(λ) for general λ. Machine Learning, 8(3/4), 341-362.
  6. Dayan, P. & Sejnowski, T. (1994). TD(λ) converges with probability 1. Machine Learning, 14, 295-301.
  8. Jaakkola, T., Jordan, M. I., & Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6), 1185-1201.
  9. Saul, L. K. & Singh, S. (1996). Learning curve bounds for Markov decision processes with undiscounted rewards. In Proceedings of COLT.
  10. Singh, S. & Sutton, R. S. (1996). Reinforcement learning with replacing eligibility traces. Machine Learning, 22, 123-158.
  11. Sutton, R. S. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9-44.
  12. Tsitsiklis, J. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 16(3), 185-202.
  13. Wasow, W. R. (1952). A note on the inversion of matrices by random walks. Math. Tables Other Aids Comput., 6, 78-81.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.