Volume 41, Issue 1, 2003, Pages 1-22

Stochastic approximation for nonexpansive maps: Application to Q-learning algorithms

Author keywords

Neuro-dynamic programming; Q-learning; Stochastic approximation

Indexed keywords

APPROXIMATION THEORY; COMPUTER SIMULATION; CONVERGENCE OF NUMERICAL METHODS; ITERATIVE METHODS; LEARNING ALGORITHMS; LYAPUNOV METHODS; MATHEMATICAL MODELS; PROBLEM SOLVING; RANDOM PROCESSES; THEOREM PROVING

EID: 0037225359
PISSN: 03630129
EISSN: None
Source Type: Journal
DOI: 10.1137/S0363012998346621
Document Type: Article
Times cited: 45

References (26)
  • [1] J. Abounadi, D. P. Bertsekas, and V. S. Borkar (2001), Learning algorithms for Markov decision processes with average cost, SIAM J. Control Optim., 40, pp. 681-698. (Scopus ID: 0036287773)
  • [3] D. P. Bertsekas and J. N. Tsitsiklis (1991), An analysis of stochastic shortest path problems, Math. Oper. Res., 16, pp. 580-595. (Scopus ID: 0003161907)
  • [8] V. S. Borkar (1993), White noise representations in stochastic realization theory, SIAM J. Control Optim., 31, pp. 1093-1102. (Scopus ID: 0027656581)
  • [10] V. S. Borkar (1998), Asynchronous stochastic approximations, SIAM J. Control Optim., 36, pp. 840-851. Correction note in ibid., 38 (2000), pp. 662-663. (Scopus ID: 0032075427)
  • [11] V. S. Borkar and S. P. Meyn (2000), The O.D.E. method for convergence of stochastic approximation and reinforcement learning, SIAM J. Control Optim., 38, pp. 447-469. (Scopus ID: 0033876515)
  • [13] S. Csibi (1975), Learning under computational constraints from weakly dependent samples, Prob. Control Inform. Theory, 4, pp. 3-21. (Scopus ID: 0016458868)
  • [14] L. Gerencsér (1992), Rate of convergence of recursive estimators, SIAM J. Control Optim., 30, pp. 1200-1227. (Scopus ID: 0026923443)
  • [15] T. Jaakkola, M. I. Jordan, and S. P. Singh (1994), On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 6, pp. 1185-1201. (Scopus ID: 0000439891)
  • [19] L. Ljung (1977), Analysis of recursive stochastic algorithms, IEEE Trans. Automat. Control, 22, pp. 551-575. (Scopus ID: 0017526570)
  • [21] P. Tseng, D. P. Bertsekas, and J. N. Tsitsiklis (1990), Partially asynchronous parallel algorithms for network flow and other problems, SIAM J. Control Optim., 28, pp. 678-710. (Scopus ID: 0025430267)
  • [22] J. N. Tsitsiklis (1994), Asynchronous stochastic approximation and Q-learning, Machine Learning, 16, pp. 185-202. (Scopus ID: 0028497630)
  • [23] C. J. C. H. Watkins (1989), Learning from delayed rewards, Ph.D. thesis, Cambridge University, Cambridge, England. (Scopus ID: 0004049893)
  • [25] F. W. Wilson (1969), Smoothing derivatives of functions and applications, Trans. Amer. Math. Soc., 139, pp. 413-428. (Scopus ID: 84968514083)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.