Volume 42, Issue 4, 2003, Pages 1143-1166

On actor-critic algorithms

Author keywords

Actor critic algorithms; Markov decision processes; Reinforcement learning; Stochastic approximation

Indexed keywords

ACTOR-CRITIC ALGORITHMS; MARKOV DECISION PROCESS (MDP); REINFORCEMENT LEARNING; STOCHASTIC APPROXIMATION;

EID: 4043069840     PISSN: 03630129     EISSN: None     Source Type: Journal    
DOI: 10.1137/S0363012901385691     Document Type: Article
Times cited: 725
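
For context on the keywords above, the sketch below illustrates a two-timescale actor-critic update of the kind the paper analyzes: a critic performs a TD(0) value update on a faster step size, while an actor takes a policy-gradient step driven by the critic's TD error on a slower step size. This is a minimal illustrative sketch, not the algorithm from the paper; the random tabular MDP, the softmax parameterization, and all step sizes below are assumptions made for the example.

```python
# Minimal two-timescale actor-critic sketch on a random tabular MDP.
# Illustrative only: environment, step sizes, and parameterization are
# assumptions, not taken from the Konda-Tsitsiklis paper.
import numpy as np

n_states, n_actions = 5, 2
rng = np.random.default_rng(0)

# Random MDP: P[s, a] is a distribution over next states, R[s, a] a reward.
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.standard_normal((n_states, n_actions))

theta = np.zeros((n_states, n_actions))  # actor: softmax policy parameters
V = np.zeros(n_states)                   # critic: state-value estimates

def policy(s):
    """Softmax policy over actions in state s."""
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

gamma = 0.95
alpha_critic, alpha_actor = 0.05, 0.005  # critic runs on the faster timescale

s = 0
for _ in range(100_000):
    probs = policy(s)
    a = rng.choice(n_actions, p=probs)
    s_next = rng.choice(n_states, p=P[s, a])
    r = R[s, a]

    # Critic: TD(0) update of the value estimate (fast step size).
    delta = r + gamma * V[s_next] - V[s]
    V[s] += alpha_critic * delta

    # Actor: policy-gradient step; the TD error serves as the advantage signal.
    grad_log = -probs
    grad_log[a] += 1.0              # gradient of log softmax w.r.t. theta[s]
    theta[s] += alpha_actor * delta * grad_log

    s = s_next

print("Critic value estimates:", np.round(V, 2))
```

In the paper's setting the critic instead uses linear function approximation with features tied to the actor's parameterization; the tabular critic here is only the simplest stand-in for that idea.
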

References (24)
  • 1
    • K. B. Athreya and P. Ney, A new approach to the limit theory of recurrent Markov chains, Trans. Amer. Math. Soc., 245 (1978), pp. 493-501.
  • 6
    • V. S. Borkar, Stochastic approximation with two time scales, Systems Control Lett., 29 (1997), pp. 291-294.
  • 7
    • X. R. Cao and H. F. Chen, Perturbation realization, potentials, and sensitivity analysis of Markov processes, IEEE Trans. Automat. Control, 42 (1997), pp. 1382-1393.
  • 8
    • P. W. Glynn, Stochastic approximation for Monte Carlo optimization, in Proceedings of the 1986 Winter Simulation Conference, Washington, DC, 1986, pp. 285-289.
  • 9
    • P. W. Glynn and P. L'Ecuyer, Likelihood ratio gradient estimation for stochastic recursions, Adv. Appl. Probab., 27 (1995), pp. 1019-1053.
  • 10
    • T. Jaakkola, S. P. Singh, and M. I. Jordan, Reinforcement learning algorithms for partially observable Markov decision problems, in Advances in Neural Information Processing Systems 7, G. Tesauro and D. Touretzky, eds., Morgan Kaufmann, San Francisco, CA, 1995, pp. 345-352.
  • 11
    • V. R. Konda, Actor-Critic Algorithms, Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 2002.
  • 12
    • V. R. Konda and V. S. Borkar, Actor-critic-type learning algorithms for Markov decision processes, SIAM J. Control Optim., 38 (1999), pp. 94-123.
  • 13
    • V. R. Konda and J. N. Tsitsiklis, Actor-critic algorithms, in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Muller, eds., MIT Press, Cambridge, MA, 2000, pp. 1008-1014.
  • 16
    • P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes, IEEE Trans. Automat. Control, 46 (2001), pp. 191-209.
  • 20
    • R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in Advances in Neural Information Processing Systems 12, S. A. Solla, T. K. Leen, and K.-R. Muller, eds., MIT Press, Cambridge, MA, 2000, pp. 1057-1063.
  • 21
    • J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. Automat. Control, 42 (1997), pp. 674-690.
  • 22
    • J. N. Tsitsiklis and B. Van Roy, Average cost temporal-difference learning, Automatica J. IFAC, 35 (1999), pp. 1799-1808.
  • 23
    • R. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 8 (1992), pp. 229-256.
  • 24
    • B. T. Polyak, Pseudogradient adaptation and training algorithms, Autom. Remote Control, 34 (1973), pp. 377-397.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.