SCOPUS Information Retrieval Platform

Discrete Event Dynamic Systems: Theory and Applications

Volume 13, Issue 1-2, 2003, Pages 9-39

From perturbation analysis to Markov decision processes and reinforcement learning

(1) Cao, Xi Ren a

a HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY (Hong Kong)

Author keywords

Gradient based policy iteration; Perturbation realization; Poisson equations; Potentials; Q learning; TD(λ)

Indexed keywords

CORRELATION METHODS; DECISION MAKING; DYNAMIC PROGRAMMING; GRADIENT METHODS; LEARNING SYSTEMS; MARKOV PROCESSES; OPTIMIZATION; PARAMETER ESTIMATION; PERTURBATION TECHNIQUES; POISSON EQUATION; SENSITIVITY ANALYSIS; STATE SPACE METHODS;

GRADIENT BASED POLICY ITERATION; MARKOV DECISION PROCESSES; NEURODYNAMIC PROGRAMMING; PERTURBATION ANALYSIS; PERTURBATION REALIZATION; Q-LEARNING; REINFORCEMENT LEARNING;

DECISION SUPPORT SYSTEMS;

EID: 0037289322 PISSN: 09246703 EISSN: None Source Type: Journal
DOI: 10.1023/A:1022188803039 Document Type: Article

Times cited : (46)

References (43)

1
- 0003874616
- Learning algorithms for Markov decision processes with average cost
- Report LIDS-P-2434, Lab. for Info. and Decision Systems, October
- Abounadi, J., Bertsekas, D., and Borkar, V. Learning algorithms for Markov decision processes with average cost. Report LIDS-P-2434, Lab. for Info. and Decision Systems, October 1998; to appear in SIAM J. on Control and Optimization.
- (1998) SIAM J. on Control and Optimization
- Abounadi, J.¹ Bertsekas, D.² Borkar, V.³

2
- 0003989208
- Chapman Hall/CRC
- Altman, E. 1999. Constrained Markov Decision Processes, Chapman Hall/CRC.
- (1999) Constrained Markov Decision Processes
- Altman, E.¹

3
- 0020970738
- Neuron-like elements that can solve difficult learning control problems
- Barto, A., Sutton, R., and Anderson, C. 1983. Neuron-like elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13: 835-846.
- (1983) IEEE Transactions on Systems, Man, and Cybernetics , vol.13 , pp. 835-846
- Barto, A.¹ Sutton, R.² Anderson, C.³

4
- 0013535965
- Infinite-horizon policy-gradient estimation
- Baxter, J., and Bartlett, P. L. 2001. Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15: 319-350.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.L.²

5
- 0013495368
- Experiments with infinite-horizon policy-gradient estimation
- Baxter, J., Bartlett, P. L., and Weaver, L. 2001. Experiments with infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15: 351-381.
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 351-381
- Baxter, J.¹ Bartlett, P.L.² Weaver, L.³

6
- 0003565783
- Athena Scientific, Belmont, Massachusetts
- Bertsekas, D. P. 1995. Dynamic Programming and Optimal Control. Vols. I, II. Athena Scientific, Belmont, Massachusetts.
- (1995) Dynamic Programming and Optimal Control , vol.1-2
- Bertsekas, D.P.¹

7
- 0003239419
- Nonnegative matrices in the mathematical sciences
- Philadelphia
- Berman, A., and Plemmons, R. J. 1994. Nonnegative matrices in the mathematical sciences. SIAM, Philadelphia.
- (1994) SIAM
- Berman, A.¹ Plemmons, R.J.²

8
- 0003487482
- Athena Scientific, Belmont, Massachusetts
- Bertsekas, D. P., and Tsitsiklis, J. N. 1996. Neuro-Dynamic Programming. Athena Scientific, Belmont, Massachusetts.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

9
- 0003983929
- Springer-Verlag, New York
- Cao, X. R. 1994. Realization Probabilities: The Dynamics of Queueing Systems. Springer-Verlag, New York.
- (1994) Realization Probabilities: The Dynamics of Queueing Systems
- Cao, X.R.¹

10
- 0032027940
- The relation among potentials, perturbation analysis, Markov decision processes, and other topics
- Cao, X. R. 1998. The relation among potentials, perturbation analysis, Markov decision processes, and other topics. Journal of Discrete Event Dynamic Systems 8: 71-87.
- (1998) Journal of Discrete Event Dynamic Systems , vol.8 , pp. 71-87
- Cao, X.R.¹

11
- 0033247533
- Single sample path-based optimization of Markov chains
- Cao, X. R. 1999. Single sample path-based optimization of Markov chains. Journal of Optimization Theory and Applications 100: 527-548.
- (1999) Journal of Optimization Theory and Applications , vol.100 , pp. 527-548
- Cao, X.R.¹

12
- 0033884215
- A unified approach to Markov decision problems and performance sensitivity analysis
- Cao, X. R. 2000. A unified approach to Markov decision problems and performance sensitivity analysis. Automatica 36: 771-774.
- (2000) Automatica , vol.36 , pp. 771-774
- Cao, X.R.¹

13
- 0031258478
- Potentials, perturbation realization, and sensitivity analysis of Markov processes
- Cao, X. R., and Chen, H. F. 1997. Potentials, perturbation realization, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control 42: 1382-1393.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 1382-1393
- Cao, X.R.¹ Chen, H.F.²

14
- 0036604532
- A time aggregation approach to Markov decision processes
- Cao, X. R., Ren, Z. Y., Bhatnagar, S., Fu, M., and Marcus, S. 2002. A time aggregation approach to Markov decision processes. Automatica 38: 929-943.
- (2002) Automatica , vol.38 , pp. 929-943
- Cao, X.R.¹ Ren, Z.Y.² Bhatnagar, S.³ Fu, M.⁴ Marcus, S.⁵

15
- 0036992818
- Gradient-based policy iteration: An example
- Cao, X. R., and Fang, H. T. Gradient-based policy iteration: an example. To appear in 2002 IEEE Conference on Decision and Control.
- 2002 IEEE Conference on Decision and Control
- Cao, X.R.¹ Fang, H.T.²

16
- 0032122986
- Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization
- Cao, X. R., and Wan, Y. W. 1998. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Transactions on Control Systems Technology 6: 482-494.
- (1998) IEEE Transactions on Control Systems Technology , vol.6 , pp. 482-494
- Cao, X.R.¹ Wan, Y.W.²

17
- 0003864139
- Kluwer Academic Publishers
- Cassandras, C. G., and Lafortune, S. 1999. Introduction to Discrete Event Systems. Kluwer Academic Publishers.
- (1999) Introduction to Discrete Event Systems
- Cassandras, C.G.¹ Lafortune, S.²

18
- 0002322904
- Convergence of recursive optimization algorithms using infinitesimal perturbation analysis estimates
- Chong, E. K. P., and Ramadge, P. J. 1992. Convergence of recursive optimization algorithms using infinitesimal perturbation analysis estimates. Journal of Discrete Event Dynamic Systems 1: 339-372.
- (1992) Journal of Discrete Event Dynamic Systems , vol.1 , pp. 339-372
- Chong, E.K.P.¹ Ramadge, P.J.²

19
- 0003745958
- Prentice Hall, Englewood Cliffs, NJ
- Çinlar, E. 1975. Introduction to Stochastic Processes. Prentice Hall, Englewood Cliffs, NJ.
- (1975) Introduction to Stochastic Processes
- Çinlar, E.¹

20
- 0013501772
- Single sample path-based recursive algorithms for Markov decision processes
- submitted
- Fang, H. T., and Cao, X. R. Single sample path-based recursive algorithms for Markov decision processes. IEEE Trans. on Automatic Control, submitted.
- IEEE Trans. on Automatic Control
- Fang, H.T.¹ Cao, X.R.²

21
- 0041648459
- Feinberg, E. A., and Adam Shwartz (ed.); Kluwer
- Feinberg, E. A., and Adam Shwartz (ed.) 2002. Handbook of Markov Decision Processes. Kluwer, 2002.
- (2002) Handbook of Markov Decision Processes

22
- 0025417457
- Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis
- Fu, M. C. 1990. Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis. Journal of Optimization Theory and Applications 65: 149-160.
- (1990) Journal of Optimization Theory and Applications , vol.65 , pp. 149-160
- Fu, M.C.¹

23
- 0003400137
- Kluwer Academic Publishers, Boston
- Fu, M. C. and Hu, J. Q. 1997. Conditional Monte Carlo: Gradient Estimation and Optimization Applications. Kluwer Academic Publishers, Boston.
- (1997) Conditional Monte Carlo: Gradient Estimation and Optimization Applications
- Fu, M.C.¹ Hu, J.Q.²

24
- 0003526604
- Kluwer Academic Publishers, Boston
- Glasserman, P. 1991. Gradient Estimation Via Perturbation Analysis. Kluwer Academic Publishers, Boston.
- (1991) Gradient Estimation Via Perturbation Analysis
- Glasserman, P.¹

25
- 0030522182
- A Lyapunov bound for solutions of Poisson's equation
- Glynn, P. W., and Meyn, S. P. 1996. A Lyapunov bound for solutions of Poisson's equation. Ann. Probab. 24: 916-931.
- (1996) Ann. Probab. , vol.24 , pp. 916-931
- Glynn, P.W.¹ Meyn, S.P.²

26
- 0000405299
- Smoothed perturbation analysis of discrete event systems
- Gong, W. B., and Ho, Y. C. 1987. Smoothed perturbation analysis of discrete event systems. IEEE Transactions on Automatic Control 32: 858-866.
- (1987) IEEE Transactions on Automatic Control , vol.32 , pp. 858-866
- Gong, W.B.¹ Ho, Y.C.²

27
- 0003585978
- Kluwer Academic Publishers, Boston
- Ho, Y. C., and Cao, X. R. 1991. Perturbation Analysis of Discrete-Event Dynamic Systems. Kluwer Academic Publishers, Boston.
- (1991) Perturbation Analysis of Discrete-Event Dynamic Systems
- Ho, Y.C.¹ Cao, X.R.²

28
- 0020802518
- Perturbation analysis and optimization of queueing networks
- Ho, Y. C., and Cao, X. R. 1983. Perturbation analysis and optimization of queueing networks. Journal of Optimization Theory and Applications 40(4): 559-582.
- (1983) Journal of Optimization Theory and Applications , vol.40 , Issue.4 , pp. 559-582
- Ho, Y.C.¹ Cao, X.R.²

29
- 0032073263
- Planning and acting in partially observable stochastic domains
- Kaelbling, L. P., Littman, M. L., and Cassandra, A. R. 1998. Planning and acting in partially observable stochastic domains. Artificial Intelligence 101.
- (1998) Artificial Intelligence , vol.101
- Kaelbling, L.P.¹ Littman, M.L.² Cassandra, A.R.³

30
- 0003979966
- Van Nostrand, New York
- Kemeny, J. G., and Snell, J. L. 1960. Finite Markov Chains. Van Nostrand, New York.
- (1960) Finite Markov Chains
- Kemeny, J.G.¹ Snell, J.L.²

31
- 0343893613
- Actor-critic like learning algorithms for Markov decision processes
- Konda, V. R., and Borkar, V. S. 1999. Actor-critic like learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization 38: 94-123.
- (1999) SIAM Journal on Control and Optimization , vol.38 , pp. 94-123
- Konda, V.R.¹ Borkar, V.S.²

32
- 0000142953
- Actor-critic algorithms
- February
- Konda, V. R., and Tsitsiklis, J. N. 2001. Actor-critic Algorithms. Submitted to SIAM Journal on Control and Optimisation, February.
- (2001) SIAM Journal on Control and Optimisation
- Konda, V.R.¹ Tsitsiklis, J.N.²

33
- 0035249254
- Simulation-based optimization of Markov reward processes
- Marbach, P., and Tsitsiklis, J. N. 2001. Simulation-based optimization of Markov reward processes. IEEE Transactions on Automatic Control 46: 191-209.
- (2001) IEEE Transactions on Automatic Control , vol.46 , pp. 191-209
- Marbach, P.¹ Tsitsiklis, J.N.²

34
- 0031344030
- The policy improvement algorithm for Markov decision processes with general state space
- Meyn, S. P. 1997. The policy improvement algorithm for Markov decision processes with general state space. IEEE Transactions on Automatic Control 42: 1663-1680.
- (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 1663-1680
- Meyn, S.P.¹

35
- 0003637131
- Springer-Verlag, London
- Meyn, S. P., and Tweedie, R. L. 1993. Markov Chains and Stochastic Stability. Springer-Verlag, London.
- (1993) Markov Chains and Stochastic Stability
- Meyn, S.P.¹ Tweedie, R.L.²

36
- 85102627959
- Wiley, New York
- Puterman, M. L. 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

37
- 0028574683
- Reinforcement learning algorithms for average-payoff Markovian decision processes
- Singh, S. P. 1994. Reinforcement learning algorithms for average-payoff Markovian decision processes. Proceedings of the Twelfth National Conference on Artificial Intelligence 202-207.
- (1994) Proceedings of the Twelfth National Conference on Artificial Intelligence , pp. 202-207
- Singh, S.P.¹

38
- 0001898381
- Practical reinforcement learning in continuous spaces
- Smart, W. D., and Kaelbling, L. P. 2000. Practical reinforcement learning in continuous spaces. Proceedings of the Seventeenth International Conference on Machine Learning.
- (2000) Proceedings of the Seventeenth International Conference on Machine Learning
- Smart, W.D.¹ Kaelbling, L.P.²

39
- 33847202724
- Learning to predict by the methods of temporal differences
- Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44.
- (1988) Machine Learning , vol.3 , pp. 9-44
- Sutton, R.S.¹

40
- 0004102479
- MIT Press, Cambridge, MA
- Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

41
- 0033221519
- Average cost temporal-difference learning
- Tsitsiklis, J. N., and Van Roy, B. 1999. Average cost temporal-difference learning. Automatica 35: 1799-1808.
- (1999) Automatica , vol.35 , pp. 1799-1808
- Tsitsiklis, J.N.¹ Van Roy, B.²

42
- 0032073173
- Centralized and decentralized asynchronous optimization of stochastic discrete event systems
- Vazquez-Abad, F. J., Cassandras, C. G., and Julka, V. 1998. Centralized and decentralized asynchronous optimization of stochastic discrete event systems. IEEE Transactions on Automatic Control 43: 631-655.
- (1998) IEEE Transactions on Automatic Control , vol.43 , pp. 631-655
- Vazquez-Abad, F.J.¹ Cassandras, C.G.² Julka, V.³

43
- 0026255225
- Performance gradient estimation for very large finite Markov chains
- Zhang, B., and Ho, Y. C. 1991. Performance gradient estimation for very large finite Markov chains. IEEE Transactions on Automatic Control 36: 1218-1227.
- (1991) IEEE Transactions on Automatic Control , vol.36 , pp. 1218-1227
- Zhang, B.¹ Ho, Y.C.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.