SCOPUS Information Retrieval Platform - Article View

SIAM Journal on Control and Optimization

Volume 40, Issue 3, 2002, Pages 681-698

Learning algorithms for Markov decision processes with average cost

(3) Abounadi, Jinane a; Bertsekas, Dimitri a; Borkar, V. S. a

a MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

Average cost control; Controlled Markov chains; Dynamic programming; Q learning; Simulation based algorithms; Stochastic approximation

Indexed keywords

COMPUTER SIMULATION; COSTS; DECISION THEORY; DYNAMIC PROGRAMMING; LEARNING ALGORITHMS; OPTIMAL CONTROL SYSTEMS;

AVERAGE COST CONTROL;

MARKOV PROCESSES;

EID: 0036287773 PISSN: 03630129 EISSN: None Source Type: Journal
DOI: 10.1137/S0363012999361974 Document Type: Article

Times cited : (195)

References (29)

1
- 0004030716
- Report LIDS-P-2433, Laboratory for Information and Decision Systems, MIT, Cambridge, MA
- (1998) Stochastic Approximation for Nonexpansive Maps: Application to Q-Learning
- Abounadi, J.¹ Bertsekas, D.P.² Borkar, V.S.³

2
- 0003778897
- Springer-Verlag, Berlin, Heidelberg
- (1990) Adaptive Algorithms and Stochastic Approximations
- Benveniste, A.¹ Metivier, M.² Priouret, P.³

3
- 0020138998
- Distributed dynamic programming
- (1982) IEEE Trans. Automat. Control , vol.27 , pp. 610-616
- Bertsekas, D.P.¹

4
- 0003565783
- Athena Scientific, Belmont, MA
- (1995) Dynamic Programming and Optimal Control , vol.2
- Bertsekas, D.P.¹

5
- 0032022988
- A new value iteration method for the average cost dynamic programming problem
- (1998) SIAM J. Control Optim. , vol.36 , pp. 742-759
- Bertsekas, D.P.¹

6
- 0344672463
- Rollout algorithms for stochastic scheduling problems
- (1999) J. Heuristics , vol.5 , pp. 89-108
- Bertsekas, D.P.¹ Castanon, D.A.²

7
- 0003487482
- Athena Scientific, Belmont, MA
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

8
- 0031076413
- Stochastic approximation with two time scales
- (1996) Systems Control Lett. , vol.29 , pp. 291-294
- Borkar, V.S.¹

9
- 0009636221
- Recursive self-tuning control of finite Markov chains
- (1996) Appl. Math. (Warsaw) , vol.24 , pp. 169-188
- Borkar, V.S.¹

10
- 0032075427
- Asynchronous stochastic approximations
- (1998) SIAM J. Control Optim. , vol.36 , pp. 840-851
- Borkar, V.S.¹

11
- 0034550848
- A learning algorithm for discrete time stochastic control
- (2000) Probab. Engrg. Inform. Sci. , vol.14 , pp. 243-248
- Borkar, V.S.¹

12
- 0009617930
- On the number of samples required for Q-learning
- University of Illinois at Urbana-Champaign, Urbana-Champaign, IL
- (2000) Proceedings of the 38th Allerton Conference
- Borkar, V.S.¹

13
- 0033876515
- The ODE method for convergence of stochastic approximation and reinforcement learning
- (2000) SIAM J. Control Optim. , vol.38 , pp. 447-469
- Borkar, V.S.¹ Meyn, S.P.²

14
- 0031123471
- A new analog parallel scheme for fixed point computation, part I: Theory
- (1997) IEEE Trans. Circuits Systems I Fund. Theory Appl. , vol.44 , pp. 351-355
- Borkar, V.S.¹ Soumyanath, K.²

15
- 0000439891
- On the convergence of stochastic iterative dynamic programming algorithms
- (1994) Neural Computation , vol.6 , pp. 1185-1201
- Jaakkola, T.¹ Jordan, M.I.² Singh, S.P.³

16
- 0025402469
- Adaptive control of Markov chains with local updates
- (1990) Systems Control Lett. , vol.14 , pp. 209-218
- Jalali, A.¹ Ferguson, M.²

17
- 0343893613
- Actor-critic-type learning algorithms for Markov decision processes
- (2000) SIAM J. Control Optim. , vol.38 , pp. 94-123
- Konda, V.R.¹ Borkar, V.S.²

18
- 0003452601
- Springer-Verlag, New York
- (1978) Stochastic Approximation Methods for Constrained and Unconstrained Systems
- Kushner, H.J.¹ Clark, D.²

19
- 0004066022
- Springer-Verlag, New York
- (2000) Stochastic Approximation Algorithms and Applications
- Kushner, H.J.¹ Yin, G.²

20
- 0029752592
- Average reward reinforcement learning: Foundations, algorithms and empirical results
- (1996) Machine Learning , vol.22 , pp. 1-38
- Mahadevan, S.¹

21
- 0003998452
- John Wiley and Sons, New York
- (1994) Markov Decision Processes
- Puterman, M.L.¹

22
- 85152626183
- A reinforcement learning method for maximizing undiscounted rewards
- Morgan Kaufmann, San Mateo
- (1993) Proceedings of the 10th International Conference on Machine Learning , pp. 298-305
- Schwartz, A.¹

23
- 0028574683
- Reinforcement learning algorithms for average payoff Markovian decision processes
- MIT Press, Cambridge, MA
- (1994) Proceedings of the 12th National Conference on Artificial Intelligence , pp. 202-207
- Singh, S.P.¹

24
- 0003428111
- John Wiley and Sons, New York
- (1986) Stochastic Modeling and Analysis: A Computational Approach
- Tijms, H.C.¹

25
- 0028497630
- Asynchronous stochastic approximation and Q-learning
- (1994) Machine Learning , vol.16 , pp. 185-202
- Tsitsiklis, J.N.¹

26
- 0029752470
- Feature-based methods for large scale dynamic programming
- (1996) Machine Learning , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

27
- 0004049893
- Ph.D. thesis, Cambridge University, Cambridge, U.K.
- (1989) Learning from Delayed Rewards
- Watkins, C.¹

28
- 34249833101
- Q-learning
- (1992) Machine Learning , vol.8 , pp. 279-292
- Watkins, C.¹ Dayan, P.²

29
- 84968514083
- Smoothing derivatives of functions and applications
- (1969) Trans. Amer. Math. Soc. , vol.139 , pp. 413-428
- Wilson, F.W.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.