Volume 38, Issue 1, 1999, Pages 94-123

Actor-critic-type learning algorithms for Markov decision processes

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; ARTIFICIAL INTELLIGENCE; DECISION THEORY; LEARNING ALGORITHMS;

EID: 0343893613     PISSN: 03630129     EISSN: None     Source Type: Journal    
DOI: 10.1137/S036301299731669X     Document Type: Article
Times cited: 227

References (28)
  • 3
    • 0020970738
    • Neuron-like elements that can solve difficult learning control problems
    • A. BARTO, R. SUTTON, AND C. ANDERSON, Neuron-like elements that can solve difficult learning control problems, IEEE Trans. Systems, Man and Cybernetics, 13 (1983), pp. 835-846.
    • (1983) IEEE Trans. Systems, Man and Cybernetics , vol.13 , pp. 835-846
    • Barto, A.1    Sutton, R.2    Anderson, C.3
  • 4
    • 0032022988
    • A new value iteration method for average cost dynamic programming problem
    • D. P. BERTSEKAS, A new value iteration method for average cost dynamic programming problem, SIAM J. Control Optim., 36 (1998), pp. 742-759.
    • (1998) SIAM J. Control Optim. , vol.36 , pp. 742-759
    • Bertsekas, D.P.1
  • 5
    • 0003161907
    • An analysis of stochastic shortest path problem
    • D. P. BERTSEKAS AND J. N. TSITSIKLIS, An analysis of stochastic shortest path problem, Math. Oper. Res., 16 (1991), pp. 580-595.
    • (1991) Math. Oper. Res. , vol.16 , pp. 580-595
    • Bertsekas, D.P.1    Tsitsiklis, J.N.2
  • 7
    • 0009636221
    • Recursive self-tuning control of finite Markov chains
    • V. S. BORKAR, Recursive self-tuning control of finite Markov chains, Applicationes Mathematicae, 24 (1996), pp. 169-188.
    • (1996) Applicationes Mathematicae , vol.24 , pp. 169-188
    • Borkar, V.S.1
  • 8
    • 0031076413
    • Stochastic approximation with two time scales
    • V. S. BORKAR, Stochastic approximation with two time scales, Systems Control Lett., 29 (1996), pp. 291-294.
    • (1996) Systems Control Lett. , vol.29 , pp. 291-294
    • Borkar, V.S.1
  • 9
    • 0032075427
    • Asynchronous stochastic approximations
    • V. S. BORKAR, Asynchronous stochastic approximations, SIAM J. Control Optim., 36 (1998), pp. 840-851.
    • (1998) SIAM J. Control Optim. , vol.36 , pp. 840-851
    • Borkar, V.S.1
  • 10
    • 0031198797
    • Actor-critic algorithm as multi-time scale stochastic approximation
    • V. S. BORKAR AND V. R. KONDA, Actor-critic algorithm as multi-time scale stochastic approximation, Sādhanā, 22 (1997), pp. 525-543.
    • (1997) Sādhanā , vol.22 , pp. 525-543
    • Borkar, V.S.1    Konda, V.R.2
  • 11
    • 85037961073
    • Stability and convergence of stochastic approximation using the ODE method
    • to appear
    • V. S. BORKAR AND S. P. MEYN, Stability and convergence of stochastic approximation using the ODE method, SIAM J. Control Optim., to appear.
    • SIAM J. Control Optim.
    • Borkar, V.S.1    Meyn, S.P.2
  • 13
    • 0024909476
    • Convergent activation dynamics in continuous time networks
    • M. W. HIRSCH, Convergent activation dynamics in continuous time networks, Neural Networks, 2 (1989), pp. 331-349.
    • (1989) Neural Networks , vol.2 , pp. 331-349
    • Hirsch, M.W.1
  • 14
    • 51249164115
    • A tutorial survey of reinforcement learning
    • S. S. KEERTHI AND B. RAVINDRAN, A tutorial survey of reinforcement learning, Sādhanā, 19 (1994), pp. 851-889.
    • (1994) Sādhanā , vol.19 , pp. 851-889
    • Keerthi, S.S.1    Ravindran, B.2
  • 15
    • 0021501125
    • Applications of singular perturbation techniques to control problems
    • P. V. KOKOTOVIC, Applications of singular perturbation techniques to control problems, SIAM Rev., 26 (1984), pp. 501-550.
    • (1984) SIAM Rev. , vol.26 , pp. 501-550
    • Kokotovic, P.V.1
  • 16
    • 85037958376
    • M.S. thesis, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
    • V. R. KONDA, Learning Algorithms for Markov Decision Processes, M.S. thesis, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 1997.
    • (1997) Learning Algorithms for Markov Decision Processes
    • Konda, V.R.1
  • 19
    • 0001000786
    • Non-convergence to unstable points in urn models and stochastic approximations
    • R. PEMANTLE, Non-convergence to unstable points in urn models and stochastic approximations, Ann. Probab., 18 (1990), pp. 698-712.
    • (1990) Ann. Probab. , vol.18 , pp. 698-712
    • Pemantle, R.1
  • 20
    • 84904774796
    • New method of stochastic approximation type
    • B. T. POLYAK, New method of stochastic approximation type, Automat. Remote Control, 51 (1990), pp. 937-946.
    • (1990) Automat. Remote Control , vol.51 , pp. 937-946
    • Polyak, B.T.1
  • 22
    • 0031235784
    • A reinforcement learning neural network for adaptive control of Markov chains
    • G. SANTHARAM AND P. S. SASTRY, A reinforcement learning neural network for adaptive control of Markov chains, IEEE Trans. Systems, Man and Cybernetics, 27 (1997), pp. 588-600.
    • (1997) IEEE Trans. Systems, Man and Cybernetics , vol.27 , pp. 588-600
    • Santharam, G.1    Sastry, P.S.2
  • 23
    • 0001821168
    • Estimation and control in discounted dynamic programming
    • M. SCHÄL, Estimation and control in discounted dynamic programming, Stochastics, 20 (1987), pp. 51-71.
    • (1987) Stochastics , vol.20 , pp. 51-71
    • Schäl, M.1
  • 24
    • 0028497630
    • Asynchronous stochastic approximation and Q-learning
    • J. N. TSITSIKLIS, Asynchronous stochastic approximation and Q-learning, Mach. Learning, 16 (1994), pp. 185-202.
    • (1994) Mach. Learning , vol.16 , pp. 185-202
    • Tsitsiklis, J.N.1
  • 25
    • 0029752470
    • Feature-based methods for large scale dynamic programming
    • J. N. TSITSIKLIS AND B. VAN ROY, Feature-based methods for large scale dynamic programming, Mach. Learning, 22 (1996), pp. 59-94.
    • (1996) Mach. Learning , vol.22 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 27
    • 0342455390
    • A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming
    • New Haven, CT
    • R. WILLIAMS AND L. BAIRD, A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming, in Sixth Yale Workshop on Adaptive and Learning Systems, New Haven, CT, 1990, pp. 96-101.
    • (1990) Sixth Yale Workshop on Adaptive and Learning Systems , pp. 96-101
    • Williams, R.1    Baird, L.2
  • 28
    • 84968514083
    • Smoothing derivatives of functions and applications
    • F. W. WILSON, Smoothing derivatives of functions and applications, Trans. Amer. Math. Soc., 139 (1969), pp. 413-428.
    • (1969) Trans. Amer. Math. Soc. , vol.139 , pp. 413-428
    • Wilson, F.W.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.