SCOPUS Information Search Platform

SIAM Journal on Control and Optimization

Volume 41, Issue 1, 2003, Pages 1-22

Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms

Authors (3): Abounadi, Jinane (a); Bertsekas, Dimitri P. (a); Borkar, Vivek (b)

a Department of Electrical Engineering and Computer Science (United States)

b Tata Institute of Fundamental Research (India)

Author keywords

Neuro dynamic programming; Q learning; Stochastic approximation

Indexed keywords

APPROXIMATION THEORY; COMPUTER SIMULATION; CONVERGENCE OF NUMERICAL METHODS; ITERATIVE METHODS; LEARNING ALGORITHMS; LYAPUNOV METHODS; MATHEMATICAL MODELS; PROBLEM SOLVING; RANDOM PROCESSES; THEOREM PROVING;

CONVERGENCE ANALYSIS; STOCHASTIC APPROXIMATION;

DYNAMIC PROGRAMMING;

EID: 0037225359 PISSN: 03630129 EISSN: None Source Type: Journal
DOI: 10.1137/S0363012998346621 Document Type: Article

Times cited: (47)

References (26)

1
- 0036287773
- Learning algorithms for Markov decision processes with average cost
- J. Abounadi, D. P. Bertsekas, and V. S. Borkar (2001), Learning algorithms for Markov decision processes with average cost, SIAM J. Control Optim., 40, pp. 681-698.
- (2001) SIAM J. Control Optim. , vol.40 , pp. 681-698
- Abounadi, J.¹ Bertsekas, D.P.² Borkar, V.S.³

2
- 0003636164
- Prentice-Hall, Englewood Cliffs, NJ
- D. P. Bertsekas and J. N. Tsitsiklis (1989), Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, Englewood Cliffs, NJ.
- (1989) Parallel and Distributed Computation: Numerical Methods
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

3
- 0003161907
- An analysis of stochastic shortest path problems
- D. P. Bertsekas and J. N. Tsitsiklis (1991), An analysis of stochastic shortest path problems, Math. Oper. Res., 16, pp. 580-595.
- (1991) Math. Oper. Res. , vol.16 , pp. 580-595
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

4
- 0003487482
- Athena Scientific, Belmont, MA
- D. P. Bertsekas and J. N. Tsitsiklis (1996), Neuro-Dynamic Programming, Athena Scientific, Belmont, MA.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

5
- 0003565783
- Athena Scientific, Belmont, MA
- D. P. Bertsekas (2001), Dynamic Programming and Optimal Control, 2nd ed., Athena Scientific, Belmont, MA.
- (2001) Dynamic Programming and Optimal Control, 2nd Ed.
- Bertsekas, D.P.¹

6
- 0003778897
- Springer-Verlag, New York
- A. Benveniste, M. Metivier, and P. Priouret (1990), Adaptive Algorithms and Stochastic Approximations, Springer-Verlag, New York.
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Metivier, M.² Priouret, P.³

7
- 0003407041
- John Wiley, New York
- P. Billingsley (1968), Convergence of Probability Measures, John Wiley, New York.
- (1968) Convergence of Probability Measures
- Billingsley, P.¹

8
- 0027656581
- White noise representations in stochastic realization theory
- V. S. Borkar (1993), White noise representations in stochastic realization theory, SIAM J. Control Optim., 31, pp. 1093-1102.
- (1993) SIAM J. Control Optim. , vol.31 , pp. 1093-1102
- Borkar, V.S.¹

9
- 0003500973
- Springer-Verlag, New York
- V. S. Borkar (1995), Probability Theory: An Advanced Course, Springer-Verlag, New York.
- (1995) Probability Theory: An Advanced Course
- Borkar, V.S.¹

10
- 0032075427
- Asynchronous stochastic approximations
- Correction note in ibid, 38 (2000), pp. 662-663
- V. S. Borkar (1998), Asynchronous stochastic approximations, SIAM J. Control Optim., 36, pp. 840-851. Correction note in ibid, 38 (2000), pp. 662-663.
- (1998) SIAM J. Control Optim. , vol.36 , pp. 840-851
- Borkar, V.S.¹

11
- 0033876515
- The O.D.E. method for convergence of stochastic approximation and reinforcement learning
- V. S. Borkar and S. P. Meyn (2000), The O.D.E. method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim., 38, pp. 447-469.
- (2000) SIAM J. Control Optim. , vol.38 , pp. 447-469
- Borkar, V.S.¹ Meyn, S.P.²

12
- 0031123471
- An analog parallel scheme for fixed point computation-Part I: Theory
- V. S. Borkar and K. Soumyanath (1997), An analog parallel scheme for fixed point computation-Part I: Theory, IEEE Trans. Circuits Systems I Fund. Theory Appl., 44, pp. 351-355.
- (1997) IEEE Trans. Circuits Systems I Fund. Theory Appl. , vol.44 , pp. 351-355
- Borkar, V.S.¹ Soumyanath, K.²

13
- 0016458868
- Learning under computational constraints from weakly dependent samples
- S. Csibi (1975), Learning under computational constraints from weakly dependent samples, Prob. Control Inform. Theory, 4, pp. 3-21.
- (1975) Prob. Control Inform. Theory , vol.4 , pp. 3-21
- Csibi, S.¹

14
- 0026923443
- Rate of convergence of recursive estimators
- L. Gerencsér (1992), Rate of convergence of recursive estimators, SIAM J. Control Optim., 30, pp. 1200-1227.
- (1992) SIAM J. Control Optim. , vol.30 , pp. 1200-1227
- Gerencsér, L.¹

15
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- T. Jaakkola, M. I. Jordan, and S. P. Singh (1994), On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 6, pp. 1185-1201.
- (1994) Neural Computation , vol.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

16
- 0003717129
- Birkhäuser Boston, Cambridge, MA
- Y. Kifer (1986), Ergodic Theory of Random Transformations, Birkhäuser Boston, Cambridge, MA.
- (1986) Ergodic Theory of Random Transformations
- Kifer, Y.¹

17
- 0003452601
- Springer-Verlag, New York
- H. J. Kushner and D. S. Clark (1978), Stochastic Approximation Methods for Constrained and Unconstrained Systems, Springer-Verlag, New York.
- (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems
- Kushner, H.J.¹ Clark, D.S.²

18
- 0004066022
- Springer-Verlag, New York
- H. J. Kushner and G. Yin (1997), Stochastic Approximation Algorithms and Applications, Springer-Verlag, New York.
- (1997) Stochastic Approximation Algorithms and Applications
- Kushner, H.J.¹ Yin, G.²

19
- 0017526570
- Analysis of recursive stochastic algorithms
- L. Ljung (1977), Analysis of recursive stochastic algorithms, IEEE Trans. Automat. Control, 22, pp. 551-575.
- (1977) IEEE Trans. Automat. Control , vol.22 , pp. 551-575
- Ljung, L.¹

20
- 0033279107
- An analog scheme for fixed point computation-Part II: Applications
- K. Soumyanath and V. S. Borkar (1999), An analog scheme for fixed point computation-Part II: Applications, IEEE Trans. Circuits Systems I Fund. Theory Appl., 46, pp. 442-451.
- (1999) IEEE Trans. Circuits Systems I Fund. Theory Appl. , vol.46 , pp. 442-451
- Soumyanath, K.¹ Borkar, V.S.²

21
- 0025430267
- Partially asynchronous parallel algorithms for network flow and other problems
- P. Tseng, D. P. Bertsekas, and J. N. Tsitsiklis (1990), Partially asynchronous parallel algorithms for network flow and other problems, SIAM J. Control Optim., 28, pp. 678-710.
- (1990) SIAM J. Control Optim. , vol.28 , pp. 678-710
- Tseng, P.¹ Bertsekas, D.P.² Tsitsiklis, J.N.³

22
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- J. N. Tsitsiklis (1994), Asynchronous stochastic approximation and Q-learning, Machine Learning, 16, pp. 185-202.
- (1994) Machine Learning , vol.16 , pp. 185-202
- Tsitsiklis, J.N.¹

23
- 0004049893
- Learning from delayed rewards
- Ph.D. thesis, Cambridge University, Cambridge, England
- C. J. C. H. Watkins (1989), Learning from delayed rewards, Ph.D. thesis, Cambridge University, Cambridge, England.
- (1989)
- Watkins, C.J.C.H.¹

24
- 34249833101
- Q-learning
- C. J. C. H. Watkins and P. Dayan (1992), Q-learning, Machine Learning, 8, pp. 279-292.
- (1992) Machine Learning , vol.8 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

25
- 84968514083
- Smoothing derivatives of functions and applications
- F. W. Wilson (1969), Smoothing derivatives of functions and applications, Trans. Amer. Math. Soc., 139, pp. 413-428.
- (1969) Trans. Amer. Math. Soc. , vol.139 , pp. 413-428
- Wilson, F.W.¹

26
- 0003842823
- Mathematical Society of Japan, Tokyo
- T. Yoshizawa (1966), Stability Theory by Lyapunov's Second Method, Mathematical Society of Japan, Tokyo.
- (1966) Stability Theory by Lyapunov's Second Method
- Yoshizawa, T.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.