SCOPUS Information Search Platform - View Article

SIAM Journal on Control and Optimization

Volume 38, Issue 2, 2000, Pages 447-469

O.D.E. method for convergence of stochastic approximation and reinforcement learning

Borkar, V.S. (a); Meyn, S.P. (b)

a TATA INSTITUTE OF FUNDAMENTAL RESEARCH (India)

b UNIVERSITY OF ILLINOIS AT URBANA CHAMPAIGN (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; CONVERGENCE OF NUMERICAL METHODS; LEARNING SYSTEMS; RANDOM PROCESSES;

REINFORCEMENT LEARNING;

ORDINARY DIFFERENTIAL EQUATIONS;

EID: 0033876515 PISSN: 03630129 EISSN: None Source Type: Journal
DOI: 10.1137/S0363012997331639 Document Type: Article

Times cited : (545)

References (21)

1
- 0036287773
- Learning algorithms for Markov decision processes with average cost
- submitted
- J. ABOUNADI, D. BERTSEKAS, AND V. S. BORKAR, Learning algorithms for Markov decision processes with average cost, SIAM J. Control Optim., submitted.
- SIAM J. Control Optim.
- Abounadi, J.¹ Bertsekas, D.² Borkar, V.S.³

2
- 0020970738
- Neuron-like elements that can solve difficult learning control problems
- A. G. BARTO, R. S. SUTTON, AND C. W. ANDERSON, Neuron-like elements that can solve difficult learning control problems, IEEE Trans. Systems, Man and Cybernetics, 13 (1983), pp. 835-846.
- (1983) IEEE Trans. Systems, Man and Cybernetics , vol.13 , pp. 835-846
- Barto, A.G.¹ Sutton, R.S.² Anderson, C.W.³

3
- 0003778897
- Springer-Verlag, Berlin
- A. BENVENISTE, M. MÉTIVIER, AND P. PRIOURET, Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, Berlin, 1990.
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Métivier, M.² Priouret, P.³

4
- 0003487482
- Athena Scientific, Belmont, MA
- D. BERTSEKAS AND J. TSITSIKLIS, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
- (1996) Neuro-dynamic Programming
- Bertsekas, D.¹ Tsitsiklis, J.²

5
- 0031076413
- Stochastic approximation with two time scales
- V. S. BORKAR, Stochastic approximation with two time scales. Systems Control Lett., 29 (1997), pp. 291-294.
- (1997) Systems Control Lett. , vol.29 , pp. 291-294
- Borkar, V.S.¹

6
- 0032075427
- Asynchronous stochastic approximation
- V. S. BORKAR, Asynchronous stochastic approximation, SIAM J. Control Optim., 36 (1998), pp. 840-851.
- (1998) SIAM J. Control Optim. , vol.36 , pp. 840-851
- Borkar, V.S.¹

7
- 0009636221
- Recursive self-tuning control of finite Markov chains
- V. S. BORKAR, Recursive self-tuning control of finite Markov chains, Appl. Math., 24 (1996), pp. 169-188.
- (1996) Appl. Math. , vol.24 , pp. 169-188
- Borkar, V.S.¹

8
- 0031123471
- An analog scheme for fixed-point computation, part I: Theory
- V. S. BORKAR AND K. SOUMYANATH, An analog scheme for fixed-point computation, part I: Theory, IEEE Trans. Circuits Systems I Fund. Theory Appl., 44 (1997), pp. 351-355.
- (1997) IEEE Trans. Circuits Systems I Fund. Theory Appl. , vol.44 , pp. 351-355
- Borkar, V.S.¹ Soumyanath, K.²

9
- 0003077340
- On positive Harris recurrence for multiclass queueing networks: A unified approach via fluid limit models
- J. G. DAI, On positive Harris recurrence for multiclass queueing networks: A unified approach via fluid limit models, Ann. Appl. Probab., 5 (1995), pp. 49-77.
- (1995) Ann. Appl. Probab. , vol.5 , pp. 49-77
- Dai, J.G.¹

10
- 0029404157
- Stability and convergence of moments for multiclass queueing networks via fluid limit models
- J. G. DAI AND S. P. MEYN, Stability and convergence of moments for multiclass queueing networks via fluid limit models, IEEE Trans. Automat. Control, 40 (1995), pp. 1889-1904.
- (1995) IEEE Trans. Automat. Control , vol.40 , pp. 1889-1904
- Dai, J.G.¹ Meyn, S.P.²

11
- 0024909476
- Convergent activation dynamics in continuous time networks
- M. W. HIRSCH, Convergent activation dynamics in continuous time networks, Neural Networks, 2 (1989), pp. 331-349.
- (1989) Neural Networks , vol.2 , pp. 331-349
- Hirsch, M.W.¹

12
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- T. JAAKKOLA, M. I. JORDAN, AND S. P. SINGH, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 6 (1994), pp. 1185-1201.
- (1994) Neural Computation , vol.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

13
- 0343893613
- Actor-critic-type learning algorithms for Markov decision processes
- V. R. KONDA AND V. S. BORKAR, Actor-critic-type learning algorithms for Markov decision processes, SIAM J. Control Optim., 38 (1999), pp. 94-123.
- (1999) SIAM J. Control Optim. , vol.38 , pp. 94-123
- Konda, V.R.¹ Borkar, V.S.²

14
- 0004066022
- Springer-Verlag, New York
- H. J. KUSHNER AND G. G. YIN, Stochastic Approximation Algorithms and Applications, Springer-Verlag, New York, 1997.
- (1997) Stochastic Approximation Algorithms and Applications
- Kushner, H.J.¹ Yin, G.G.²

15
- 0002261059
- Ergodicity, continuity and analyticity of countable Markov chains
- V. A. MALYSHEV AND M. V. MEN'SIKOV, Ergodicity, continuity and analyticity of countable Markov chains, Trans. Moscow Math. Soc., 1 (1982), pp. 1-48.
- (1982) Trans. Moscow Math. Soc. , vol.1 , pp. 1-48
- Malyshev, V.A.¹ Men'sikov, M.V.²

16
- 0003637131
- Springer-Verlag, London
- S. P. MEYN AND R. L. TWEEDIE, Markov Chains and Stochastic Stability, Springer-Verlag, London, 1993.
- (1993) Markov Chains and Stochastic Stability
- Meyn, S.P.¹ Tweedie, R.L.²

17
- 0000566364
- Computable bounds for geometric convergence rates of Markov chains
- S. P. MEYN AND R. L. TWEEDIE, Computable bounds for geometric convergence rates of Markov chains, Ann. Appl. Probab., 4 (1994), pp. 981-1011.
- (1994) Ann. Appl. Probab. , vol.4 , pp. 981-1011
- Meyn, S.P.¹ Tweedie, R.L.²

18
- 0004239351
- North Holland, Amsterdam
- J. NEVEU, Discrete Parameter Martingales, North Holland, Amsterdam, 1975.
- (1975) Discrete Parameter Martingales
- Neveu, J.¹

19
- 0003540954
- Clarendon Press, Oxford
- T. SARGENT, Bounded Rationality in Macroeconomics, Clarendon Press, Oxford, 1993.
- (1993) Bounded Rationality in Macroeconomics
- Sargent, T.¹

20
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- J. TSITSIKLIS, Asynchronous stochastic approximation and Q-learning, Mach. Learning, 16 (1994), pp. 195-202.
- (1994) Mach. Learning , vol.16 , pp. 195-202
- Tsitsiklis, J.¹

21
- 34249833101
- Q-learning
- C. J. C. H. WATKINS AND P. DAYAN, Q-learning, Mach. Learning, 8 (1992), pp. 279-292.
- (1992) Mach. Learning , vol.8 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.