SCOPUS Information Retrieval Platform

Volume , Issue , 1998, Pages 1064-1070

The asymptotic convergence-rate of Q-learning

Author keywords

[No Author keywords available]

Indexed keywords

ASYMPTOTIC RATE; DISCOUNT FACTORS; ONLINE LEARNING; Q-LEARNING; STATIONARY DISTRIBUTION;

PROBABILITY DISTRIBUTIONS;

EID: 84898998140 PISSN: 10495258 EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (144)

References (17)

1
- 0003487482
- Athena Scientific, Belmont, MA
- Bertsekas, D. and Tsitsiklis, J. (1996). Neuro-Dynamic Programming. Athena Scientific, Belmont, MA.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.¹ Tsitsiklis, J.²

2
- 84899001914
- A modified form of the iterative method of dynamic programming
- Hordijk, A. and Tijms, H. (1975). A modified form of the iterative method of dynamic programming. Annals of Statistics, 3:203-208.
- (1975) Annals of Statistics , vol.3 , pp. 203-208
- Hordijk, A.¹ Tijms, H.²

3
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- Jaakkola, T., Jordan, M., and Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185-1201.
- (1994) Neural Computation , vol.6 , Issue.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.² Singh, S.³

5
- 0001961616
- A generalized reinforcement learning model: Convergence and applications
- Littman, M. and Szepesvári, C. (1996). A Generalized Reinforcement Learning Model: Convergence and applications. In Int. Conf. on Machine Learning. http://iserv.iki.kfki.hu/asl-publs.html.
- (1996) Int. Conf. on Machine Learning
- Littman, M.¹ Szepesvári, C.²

8
- 84898952773
- A law of the iterated logarithm for the Robbins-Monro method
- Major, P. (1993). A law of the iterated logarithm for the Robbins-Monro method. Studia Scientiarum Mathematicarum Hungarica, 8:95-102.
- (1993) Studia Scientiarum Mathematicarum Hungarica , vol.8 , pp. 95-102
- Major, P.¹

9
- 0010720865
- Pseudogradient adaption and training algorithms
- Poljak, B. and Tsypkin, Y. (1983). Pseudogradient adaption and training algorithms. Automation and Remote Control, 12:83-94.
- (1983) Automation and Remote Control , vol.12 , pp. 83-94
- Poljak, B.¹ Tsypkin, Y.²

10
- 0003998452
- John Wiley & Sons, Inc. New York, NY
- Puterman, M. L. (1994). Markov Decision Processes - Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York, NY.
- (1994) Markov Decision Processes - Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

11
- 0003644137
- Holden Day, San Francisco, California
- Ross, S. (1970). Applied Probability Models with Optimization Applications. Holden Day, San Francisco, California.
- (1970) Applied Probability Models with Optimization Applications
- Ross, S.¹

12
- 0000224681
- Reinforcement learning with soft state aggregation
- Singh, S., Jaakkola, T., and Jordan, M. (1995). Reinforcement learning with soft state aggregation. In Proceedings of Neural Information Processing Systems.
- (1995) Proceedings of Neural Information Processing Systems
- Singh, S.¹ Jaakkola, T.² Jordan, M.³

15
- 2342562099
- Asynchronous stochastic approximation and Q-learning
- Tsitsiklis, J. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 8(3-4):257-277.
- (1994) Machine Learning , vol.8 , Issue.3-4 , pp. 257-277
- Tsitsiklis, J.¹

16
- 0004093909
- Cambridge University Press, London
- Wasan, T. (1969). Stochastic Approximation. Cambridge University Press, London.
- (1969) Stochastic Approximation
- Wasan, T.¹

17
- 0004049893
- PhD thesis, King's College, Cambridge
- Watkins, C. (1990). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge.
- (1990) Learning from Delayed Rewards
- Watkins, C.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.