Volume 38, Issue 2, 2000, Pages 447-469

O.D.E. method for convergence of stochastic approximation and reinforcement learning

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; CONVERGENCE OF NUMERICAL METHODS; LEARNING SYSTEMS; RANDOM PROCESSES

EID: 0033876515     PISSN: 03630129     EISSN: None     Source Type: Journal    
DOI: 10.1137/S0363012997331639     Document Type: Article
Times cited: 545

References (21)
  • 1
    • J. ABOUNADI, D. BERTSEKAS, AND V. S. BORKAR, Learning algorithms for Markov decision processes with average cost, SIAM J. Control Optim., submitted.
    • Scopus EID: 0036287773
  • 5
    • V. S. BORKAR, Stochastic approximation with two time scales, Systems Control Lett., 29 (1997), pp. 291-294.
    • Scopus EID: 0031076413
  • 6
    • V. S. BORKAR, Asynchronous stochastic approximation, SIAM J. Control Optim., 36 (1998), pp. 840-851.
    • Scopus EID: 0032075427
  • 7
    • V. S. BORKAR, Recursive self-tuning control of finite Markov chains, Appl. Math., 24 (1996), pp. 169-188.
    • Scopus EID: 0009636221
  • 9
    • J. G. DAI, On positive Harris recurrence for multiclass queueing networks: A unified approach via fluid limit models, Ann. Appl. Probab., 5 (1995), pp. 49-77.
    • Scopus EID: 0003077340
  • 10
    • J. G. DAI AND S. P. MEYN, Stability and convergence of moments for multiclass queueing networks via fluid limit models, IEEE Trans. Automat. Control, 40 (1995), pp. 1889-1904.
    • Scopus EID: 0029404157
  • 11
    • M. W. HIRSCH, Convergent activation dynamics in continuous time networks, Neural Networks, 2 (1989), pp. 331-349.
    • Scopus EID: 0024909476
  • 12
    • T. JAAKKOLA, M. I. JORDAN, AND S. P. SINGH, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 6 (1994), pp. 1185-1201.
    • Scopus EID: 0000439891
  • 13
    • V. R. KONDA AND V. S. BORKAR, Actor-critic-type learning algorithms for Markov decision processes, SIAM J. Control Optim., 38 (1999), pp. 94-123.
    • Scopus EID: 0343893613
  • 15
    • V. A. MALYSHEV AND M. V. MEN'SIKOV, Ergodicity, continuity and analyticity of countable Markov chains, Trans. Moscow Math. Soc., 1 (1982), pp. 1-48.
    • Scopus EID: 0002261059
  • 17
    • S. P. MEYN AND R. L. TWEEDIE, Computable bounds for geometric convergence rates of Markov chains, Ann. Appl. Probab., 4 (1994), pp. 981-1011.
    • Scopus EID: 0000566364
  • 20
    • J. TSITSIKLIS, Asynchronous stochastic approximation and Q-learning, Mach. Learning, 16 (1994), pp. 195-202.
    • Scopus EID: 0028497630


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.