SCOPUS Information Retrieval Platform

SIAM Journal on Control and Optimization

Volume 38, Issue 1, 1999, Pages 94-123

Actor-critic-type learning algorithms for Markov decision processes

Authors (2): Konda, Vijaymohan R. (a); Borkar, Vivek S. (b)

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

b TATA INSTITUTE OF FUNDAMENTAL RESEARCH (India)

Author keywords

[No Author keywords available]

Indexed keywords

APPROXIMATION THEORY; ARTIFICIAL INTELLIGENCE; DECISION THEORY; LEARNING ALGORITHMS;

ACTOR-CRITIC ALGORITHMS; MARKOV DECISION PROCESSES (MDP); REINFORCEMENT LEARNING;

MARKOV PROCESSES;

EID: 0343893613 PISSN: 03630129 EISSN: None Source Type: Journal
DOI: 10.1137/S036301299731669X Document Type: Article

Times cited : (242)

References (28)

1
- 0004030716
- preprint LIDS-P-2438, Lab for Info. and Decision Sciences, MIT, Cambridge, MA
- J. ABOUNADI, D. P. BERTSEKAS, AND V. S. BORKAR, Stochastic Approximation for Non-expansive Maps: Applications to Q-Learning Algorithms, preprint LIDS-P-2438, Lab for Info. and Decision Sciences, MIT, Cambridge, MA, 1998.
- (1998) Stochastic Approximation for Non-expansive Maps: Applications to Q-Learning Algorithms
- Abounadi, J.¹ Bertsekas, D.P.² Borkar, V.S.³

2
- 0003874616
- preprint LIDS-P-2434, Lab for Info. and Decision Sciences, MIT, Cambridge, MA
- J. ABOUNADI, D. P. BERTSEKAS, AND V. S. BORKAR, Learning Algorithms for Markov Decision Processes with Average Cost, preprint LIDS-P-2434, Lab for Info. and Decision Sciences, MIT, Cambridge, MA, 1998.
- (1998) Learning Algorithms for Markov Decision Processes with Average Cost
- Abounadi, J.¹ Bertsekas, D.P.² Borkar, V.S.³

3
- 0020970738
- Neuron-like elements that can solve difficult learning control problems
- A. BARTO, R. SUTTON, AND C. ANDERSON, Neuron-like elements that can solve difficult learning control problems, IEEE Trans. Systems, Man and Cybernetics, 13 (1983), pp. 835-846.
- (1983) IEEE Trans. Systems, Man and Cybernetics , vol.13 , pp. 835-846
- Barto, A.¹ Sutton, R.² Anderson, C.³

4
- 0032022988
- A new value iteration method for average cost dynamic programming problem
- D. P. BERTSEKAS, A new value iteration method for average cost dynamic programming problem, SIAM J. Control Optim., 36 (1998), pp. 742-759.
- (1998) SIAM J. Control Optim. , vol.36 , pp. 742-759
- Bertsekas, D.P.¹

5
- 0003161907
- An analysis of stochastic shortest path problem
- D. P. BERTSEKAS AND J. N. TSITSIKLIS, An analysis of stochastic shortest path problem, Math. Oper. Res., 16 (1991), pp. 580-595.
- (1991) Math. Oper. Res. , vol.16 , pp. 580-595
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

6
- 0003487482
- Belmont, MA
- D. P. BERTSEKAS AND J. N. TSITSIKLIS, Neuro-Dynamic Programming, Athena Scientific, Belmont, MA, 1996.
- (1996) Neuro-Dynamic Programming, Athena Scientific
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

7
- 0009636221
- Recursive self-tuning control of finite Markov chains
- V. S. BORKAR, Recursive self-tuning control of finite Markov chains, Applicationes Mathematicae, 24 (1996), pp. 169-188.
- (1996) Applicationes Mathematicae , vol.24 , pp. 169-188
- Borkar, V.S.¹

8
- 0031076413
- Stochastic approximation with two time scales
- V. S. BORKAR, Stochastic approximation with two time scales, Systems Control Lett., 29 (1996), pp. 291-294.
- (1996) Systems Control Lett. , vol.29 , pp. 291-294
- Borkar, V.S.¹

9
- 0032075427
- Asynchronous stochastic approximations
- V. S. BORKAR, Asynchronous stochastic approximations, SIAM J. Control Optim., 36 (1998), pp. 840-851.
- (1998) SIAM J. Control Optim. , vol.36 , pp. 840-851
- Borkar, V.S.¹

10
- 0031198797
- Actor-critic algorithm as multi-time scale stochastic approximation
- V. S. BORKAR AND V. R. KONDA, Actor-critic algorithm as multi-time scale stochastic approximation, Sādhanā, 22 (1997), pp. 525-543.
- (1997) Sādhanā , vol.22 , pp. 525-543
- Borkar, V.S.¹ Konda, V.R.²

11
- 85037961073
- Stability and convergence of stochastic approximation using the ODE method
- to appear
- V. S. BORKAR AND S. P. MEYN, Stability and convergence of stochastic approximation using the ODE method, SIAM J. Control Optim., to appear.
- SIAM J. Control Optim.
- Borkar, V.S.¹ Meyn, S.P.²

12
- 0031123471
- A new analog parallel scheme for fixed point computation part I: Theory
- V. S. BORKAR AND K. SOUMYANATH, A new analog parallel scheme for fixed point computation part I: Theory. IEEE Trans. Circuits Systems I Fund. Theory Appl., 44 (1997), pp. 351-355.
- (1997) IEEE Trans. Circuits Systems I Fund. Theory Appl. , vol.44 , pp. 351-355
- Borkar, V.S.¹ Soumyanath, K.²

13
- 0024909476
- Convergent activation dynamics in continuous time networks
- M. W. HIRSCH, Convergent activation dynamics in continuous time networks, Neural Networks, 2 (1989), pp. 331-349.
- (1989) Neural Networks , vol.2 , pp. 331-349
- Hirsch, M.W.¹

14
- 51249164115
- A tutorial survey of reinforcement learning
- S. S. KEERTHI AND B. RAVINDRAN, A tutorial survey of reinforcement learning, Sādhanā, 19 (1994), pp. 851-889.
- (1994) Sādhanā , vol.19 , pp. 851-889
- Keerthi, S.S.¹ Ravindran, B.²

15
- 0021501125
- Applications of singular perturbation techniques to control problems
- P. V. KOKOTOVIC, Applications of singular perturbation techniques to control problems, SIAM Rev., 26 (1984), pp. 501-550.
- (1984) SIAM Rev. , vol.26 , pp. 501-550
- Kokotovic, P.V.¹

16
- 85037958376
- M.S. thesis, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India
- V. R. KONDA, Learning Algorithms for Markov Decision Processes, M.S. thesis, Department of Computer Science and Automation, Indian Institute of Science, Bangalore, India, 1997.
- (1997) Learning Algorithms for Markov Decision Processes
- Konda, V.R.¹

17
- 0003452601
- Springer-Verlag, New York
- H. J. KUSHNER AND D. S. CLARK, Stochastic Approximation for Constrained and Unconstrained Systems, Springer-Verlag, New York, 1978.
- (1978) Stochastic Approximation for Constrained and Unconstrained Systems
- Kushner, H.J.¹ Clark, D.S.²

18
- 0004239351
- North-Holland, Amsterdam
- J. NEVEU, Discrete Parameter Martingales, North-Holland, Amsterdam, 1975.
- (1975) Discrete Parameter Martingales
- Neveu, J.¹

19
- 0001000786
- Non-convergence to unstable points in urn models and stochastic approximations
- R. PEMANTLE, Non-convergence to unstable points in urn models and stochastic approximations, Ann. Probab., 18 (1990), pp. 698-712.
- (1990) Ann. Probab. , vol.18 , pp. 698-712
- Pemantle, R.¹

20
- 84904774796
- New method of stochastic approximation type
- B. T. POLYAK, New method of stochastic approximation type, Automat. Remote Control, 51 (1990), pp. 937-946.
- (1990) Automat. Remote Control , vol.51 , pp. 937-946
- Polyak, B.T.¹

21
- 0003998452
- John Wiley, New York
- M. PUTERMAN, Markov Decision Processes, John Wiley, New York, 1994.
- (1994) Markov Decision Processes
- Puterman, M.¹

22
- 0031235784
- A reinforcement learning neural network for adaptive control of Markov chains
- G. SANTHARAM AND P. S. SASTRY, A reinforcement learning neural network for adaptive control of Markov chains, IEEE Trans. Systems, Man and Cybernetics, 27 (1997), pp. 588-600.
- (1997) IEEE Trans. Systems, Man and Cybernetics , vol.27 , pp. 588-600
- Santharam, G.¹ Sastry, P.S.²

23
- 0001821168
- Estimation and control in discounted dynamic programming
- M. SCHÄL, Estimation and control in discounted dynamic programming, Stochastics, 20 (1987), pp. 51-71.
- (1987) Stochastics , vol.20 , pp. 51-71
- Schäl, M.¹

24
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- J. N. TSITSIKLIS, Asynchronous stochastic approximation and Q-learning, Mach. Learning, 16 (1994), pp. 185-202.
- (1994) Mach. Learning , vol.16 , pp. 185-202
- Tsitsiklis, J.N.¹

25
- 0029752470
- Feature-based methods for large scale dynamic programming
- J. N. TSITSIKLIS AND B. VAN ROY, Feature-based methods for large scale dynamic programming, Mach. Learning, 22 (1996), pp. 59-94.
- (1996) Mach. Learning , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

26
- 34249833101
- Q-learning
- C. WATKINS AND P. DAYAN, Q-learning, Mach. Learning, 8 (1992), pp. 279-292.
- (1992) Mach. Learning , vol.8 , pp. 279-292
- Watkins, C.¹ Dayan, P.²

27
- 0342455390
- A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming
- New Haven, CT
- R. WILLIAMS AND L. BAIRD, A mathematical analysis of actor-critic architectures for learning optimal controls through incremental dynamic programming, in Sixth Yale Workshop on Adaptive and Learning Systems, New Haven, CT, 1990, pp. 96-101.
- (1990) Sixth Yale Workshop on Adaptive and Learning Systems , pp. 96-101
- Williams, R.¹ Baird, L.²

28
- 84968514083
- Smoothing derivatives of functions and applications
- F. W. WILSON, Smoothing derivatives of functions and applications, Trans. Amer. Math. Soc., 139 (1969), pp. 413-428.
- (1969) Trans. Amer. Math. Soc. , vol.139 , pp. 413-428
- Wilson, F.W.¹

* This information was extracted from Elsevier's SCOPUS database through analysis by KISTI.