Volume 42, Issue 4, 2003, Pages 1143-1166

On actor-critic algorithms

Author keywords

Actor critic algorithms; Markov decision processes; Reinforcement learning; Stochastic approximation

Indexed keywords

ACTOR-CRITIC ALGORITHMS; MARKOV DECISION PROCESS (MDP); REINFORCEMENT LEARNING; STOCHASTIC APPROXIMATION;

EID: 4043069840     PISSN: 03630129     EISSN: None     Source Type: Journal    
DOI: 10.1137/S0363012901385691     Document Type: Article
Times cited: 725
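
For context on the keywords above, the sketch below illustrates a two-timescale actor-critic update of the kind the paper analyzes: a critic performs a TD(0) value update on a faster step size, while an actor takes a policy-gradient step driven by the critic's TD error on a slower step size. This is a minimal illustrative sketch, not the algorithm from the paper; the random tabular MDP, the softmax parameterization, and all step sizes below are assumptions made for the example.

```python
# Minimal two-timescale actor-critic sketch on a random tabular MDP.
# Illustrative only: environment, step sizes, and parameterization are
# assumptions, not taken from the Konda-Tsitsiklis paper.
import numpy as np

n_states, n_actions = 5, 2
rng = np.random.default_rng(0)

# Random MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
V = np.zeros(n_states)                   # critic: state-value estimates

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

gamma = 0.95
alpha_critic, alpha_actor = 0.05, 0.005  # critic runs on the faster timescale

s = 0
for _ in range(100_000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update of the value estimate (fast step size).
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * delta

    # Actor: policy-gradient step; the TD error serves as the advantage signal.
    grad_log = -probs
    grad_log[a] += 1.0              # gradient of log softmax w.r.t. theta[s]
    theta[s] += alpha_actor * delta * grad_log

    s = s_next

print("Critic value estimates:", np.round(V, 2))
```

In the paper's setting the critic instead uses linear function approximation with features tied to the actor's parameterization; the tabular critic here is only the simplest stand-in for that idea.
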

References (24)
  • 1
    • K. B. Athreya and P. Ney, A new approach to the limit theory of recurrent Markov chains, Trans. Amer. Math. Soc., 245 (1978), pp. 493-501.
  • 6
    • V. S. Borkar, Stochastic approximation with two time scales, Systems Control Lett., 29 (1997), pp. 291-294.
  • 7
    • X. R. Cao and H. F. Chen, Perturbation realization, potentials, and sensitivity analysis of Markov processes, IEEE Trans. Automat. Control, 42 (1997), pp. 1382-1393.
  • 8
    • P. W. Glynn, Stochastic approximation for Monte Carlo optimization, in Proceedings of the 1986 Winter Simulation Conference, Washington, DC, 1986, pp. 285-289.
  • 9
    • P. W. Glynn and P. L'Ecuyer, Likelihood ratio gradient estimation for stochastic recursions, Adv. Appl. Probab., 27 (1995), pp. 1019-1053.
  • 10
    • T. Jaakkola, S. P. Singh, and M. I. Jordan, Reinforcement learning algorithms for partially observable Markov decision problems, in Advances in Neural Information Processing Systems 7, G. Tesauro and D. Touretzky, eds., Morgan Kaufmann, San Francisco, CA, 1995, pp. 345-352.
  • 11
    • V. R. Konda, Actor-Critic Algorithms, Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2002.
  • 12
    • V. R. Konda and V. S. Borkar, Actor-critic-type learning algorithms for Markov decision processes, SIAM J. Control Optim., 38 (1999), pp. 94-123.
  • 13
    • V. R. Konda and J. N. Tsitsiklis, Actor-critic algorithms, in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Muller, eds., MIT Press, Cambridge, MA, 2000, pp. 1008-1014.
  • 16
    • P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes, IEEE Trans. Automat. Control, 46 (2001), pp. 191-209.
  • 20
    • R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Muller, eds., MIT Press, Cambridge, MA, 2000, pp. 1057-1063.
  • 21
    • J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. Automat. Control, 42 (1997), pp. 674-690.
  • 22
    • J. N. Tsitsiklis and B. Van Roy, Average cost temporal-difference learning, Automatica J. IFAC, 35 (1999), pp. 1799-1808.
  • 23
    • R. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 8 (1992), pp. 229-256.
  • 24
    • B. T. Polyak, Pseudogradient adaptation and training algorithms, Autom. Remote Control, 34 (1973), pp. 377-397.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.