Volume , Issue , 1998, Pages 1064-1070

The asymptotic convergence-rate of Q-learning

Author keywords

[No Author keywords available]

Indexed keywords

ASYMPTOTIC RATE; DISCOUNT FACTORS; ONLINE LEARNING; Q-LEARNING; STATIONARY DISTRIBUTION;

EID: 84898998140     PISSN: 10495258     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (144)
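As background for the record's indexed subject, a minimal tabular Q-learning sketch. The two-state MDP, reward function, uniform exploration, and per-pair 1/n step sizes below are assumptions of this illustration, not details taken from the paper:

```python
import random

# Illustrative tabular Q-learning on a made-up 2-state, 2-action MDP.
n_states, n_actions = 2, 2
gamma = 0.9  # discount factor


def step(state, action):
    """Deterministic toy dynamics: the action chooses the next state;
    taking action 1 while in state 1 yields reward 1, else 0."""
    next_state = action
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    return next_state, reward


Q = [[0.0] * n_actions for _ in range(n_states)]
visits = [[0] * n_actions for _ in range(n_states)]

random.seed(0)
state = 0
for _ in range(50_000):
    action = random.randrange(n_actions)        # uniform exploration
    next_state, reward = step(state, action)
    visits[state][action] += 1
    alpha = 1.0 / visits[state][action]         # 1/n step size per state-action pair
    target = reward + gamma * max(Q[next_state])
    Q[state][action] += alpha * (target - Q[state][action])  # Q-learning update
    state = next_state

# Greedy policy recovered from the learned Q-values.
policy = [max(range(n_actions), key=lambda a: Q[s][a]) for s in range(n_states)]
```

With 1/n step sizes each Q-value is a running average of its sampled targets; how fast such iterates approach the optimal Q-values, as a function of quantities like the discount factor and the sampling distribution, is the kind of question the paper's title and keywords point to.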

References (17)
  • 2
    • Hordijk, A. and Tijms, H. (1975). A modified form of the iterative method of dynamic programming. Annals of Statistics, 3:203-208.
  • 3
    • Jaakkola, T., Jordan, M., and Singh, S. (1994). On the convergence of stochastic iterative dynamic programming algorithms. Neural Computation, 6(6):1185-1201.
  • 4
    • Littman, M. (1994). Markov games as a framework for multi-agent reinforcement learning. In Proc. of the Eleventh International Conference on Machine Learning, pages 157-163, San Francisco, CA. Morgan Kaufmann.
  • 5
    • Littman, M. and Szepesvári, C. (1996). A generalized reinforcement learning model: Convergence and applications. In Int. Conf. on Machine Learning. http://iserv.iki.kfki.hu/asl-publs.html.
  • 6
    • Mahadevan, S. (1994). To discount or not to discount in reinforcement learning: A case study comparing R learning and Q learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 164-172, San Francisco, CA. Morgan Kaufmann.
  • 7
    • Mahadevan, S. (1996). Average reward reinforcement learning: Foundations, algorithms, and empirical results. Machine Learning, 22(1-3):124-158.
  • 8
    • Major, P. (1993). A law of the iterated logarithm for the Robbins-Monro method. Studia Scientiarum Mathematicarum Hungarica, 8:95-102.
  • 9
    • Poljak, B. and Tsypkin, Y. (1983). Pseudogradient adaptation and training algorithms. Automation and Remote Control, 12:83-94.
  • 13
    • Singh, S., Jaakkola, T., Littman, M., and Szepesvári, C. (1997). On the convergence of single-step on-policy reinforcement-learning algorithms. Machine Learning, in preparation.
  • 14
    • Szepesvári, C. and Littman, M. (1996). Generalized Markov decision processes: Dynamic programming and reinforcement learning algorithms. Machine Learning, in preparation; available as TR CS96-10, Brown Univ.
  • 15
    • Tsitsiklis, J. (1994). Asynchronous stochastic approximation and Q-learning. Machine Learning, 8(3-4):257-277.
  • 17
    • Watkins, C. (1990). Learning from Delayed Rewards. PhD thesis, King's College, Cambridge.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS DB.