Volume 13, Issue 1-2, 2003, Pages 9-39

From perturbation analysis to Markov decision processes and reinforcement learning

Author keywords

Gradient based policy iteration; Perturbation realization; Poisson equations; Potentials; Q learning; TD(λ)

Indexed keywords

CORRELATION METHODS; DECISION MAKING; DYNAMIC PROGRAMMING; GRADIENT METHODS; LEARNING SYSTEMS; MARKOV PROCESSES; OPTIMIZATION; PARAMETER ESTIMATION; PERTURBATION TECHNIQUES; POISSON EQUATION; SENSITIVITY ANALYSIS; STATE SPACE METHODS;

EID: 0037289322     PISSN: 09246703     EISSN: None     Source Type: Journal    
DOI: 10.1023/A:1022188803039     Document Type: Article
Times cited: 46

References (43)
  • 1
    • 0003874616
    • Learning algorithms for Markov decision processes with average cost
    • Report LIDS-P-2434, Lab. for Info. and Decision Systems, October
    • Abounadi, J., Bertsekas, D., and Borkar, V. Learning algorithms for Markov decision processes with average cost. Report LIDS-P-2434, Lab. for Info. and Decision Systems, October 1998; to appear in SIAM J. on Control and Optimization.
    • (1998) SIAM J. on Control and Optimization
    • Abounadi, J.1    Bertsekas, D.2    Borkar, V.3
  • 7
    • 0003239419
    • Nonnegative matrices in the mathematical sciences
    • Philadelphia
    • Berman, A., and Plemmons, R. J. 1994. Nonnegative matrices in the mathematical sciences. SIAM, Philadelphia.
    • (1994) SIAM
    • Berman, A.1    Plemmons, R.J.2
  • 10
    • 0032027940
    • The relation among potentials, perturbation analysis, Markov decision processes, and other topics
    • Cao, X. R. 1998. The relation among potentials, perturbation analysis, Markov decision processes, and other topics. Journal of Discrete Event Dynamic Systems 8: 71-87.
    • (1998) Journal of Discrete Event Dynamic Systems , vol.8 , pp. 71-87
    • Cao, X.R.1
  • 11
    • 0033247533
    • Single sample path-based optimization of Markov chains
    • Cao, X. R. 1999. Single sample path-based optimization of Markov chains. Journal of Optimization Theory and Applications 100: 527-548.
    • (1999) Journal of Optimization Theory and Applications , vol.100 , pp. 527-548
    • Cao, X.R.1
  • 12
    • 0033884215
    • A unified approach to Markov decision problems and performance sensitivity analysis
    • Cao, X. R. 2000. A unified approach to Markov decision problems and performance sensitivity analysis. Automatica 36: 771-774.
    • (2000) Automatica , vol.36 , pp. 771-774
    • Cao, X.R.1
  • 13
    • 0031258478
    • Potentials, perturbation realization, and sensitivity analysis of Markov processes
    • Cao, X. R., and Chen, H. F. 1997. Potentials, perturbation realization, and sensitivity analysis of Markov processes. IEEE Transactions on Automatic Control 42: 1382-1393.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 1382-1393
    • Cao, X.R.1    Chen, H.F.2
  • 14
    • 0036604532
    • A time aggregation approach to Markov decision processes
    • Cao, X. R., Ren, Z. Y., Bhatnagar, S., Fu, M., and Marcus, S. 2002. A time aggregation approach to Markov decision processes. Automatica 38: 929-943.
    • (2002) Automatica , vol.38 , pp. 929-943
    • Cao, X.R.1    Ren, Z.Y.2    Bhatnagar, S.3    Fu, M.4    Marcus, S.5
  • 16
    • 0032122986
    • Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization
    • Cao, X. R., and Wan, Y. W. 1998. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Transactions on Control Systems Technology 6: 482-494.
    • (1998) IEEE Transactions on Control Systems Technology , vol.6 , pp. 482-494
    • Cao, X.R.1    Wan, Y.W.2
  • 18
    • 0002322904
    • Convergence of recursive optimization algorithms using infinitesimal perturbation analysis estimates
    • Chong, E. K. P., and Ramadge, P. J. 1992. Convergence of recursive optimization algorithms using infinitesimal perturbation analysis estimates. Journal of Discrete Event Dynamic Systems 1: 339-372.
    • (1992) Journal of Discrete Event Dynamic Systems , vol.1 , pp. 339-372
    • Chong, E.K.P.1    Ramadge, P.J.2
  • 20
    • 0013501772
    • Single sample path-based recursive algorithms for Markov decision processes
    • submitted
    • Fang, H. T., and Cao, X. R. Single sample path-based recursive algorithms for Markov decision processes. IEEE Trans. on Automatic Control, submitted.
    • IEEE Trans. on Automatic Control
    • Fang, H.T.1    Cao, X.R.2
  • 21
    • 0041648459
    • Feinberg, E. A., and Shwartz, A. (eds.); Kluwer
    • Feinberg, E. A., and Shwartz, A. (eds.) 2002. Handbook of Markov Decision Processes. Kluwer.
    • (2002) Handbook of Markov Decision Processes
  • 22
    • 0025417457
    • Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis
    • Fu, M. C. 1990. Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis. Journal of Optimization Theory and Applications 65: 149-160.
    • (1990) Journal of Optimization Theory and Applications , vol.65 , pp. 149-160
    • Fu, M.C.1
  • 25
    • 0030522182
    • A Lyapunov bound for solutions of Poisson's equation
    • Glynn, P. W., and Meyn, S. P. 1996. A Lyapunov bound for solutions of Poisson's equation. Ann. Probab. 24: 916-931.
    • (1996) Ann. Probab. , vol.24 , pp. 916-931
    • Glynn, P.W.1    Meyn, S.P.2
  • 28
    • 0020802518
    • Perturbation analysis and optimization of queueing networks
    • Ho, Y. C., and Cao, X. R. 1983. Perturbation analysis and optimization of queueing networks. Journal of Optimization Theory and Applications 40(4): 559-582.
    • (1983) Journal of Optimization Theory and Applications , vol.40 , Issue.4 , pp. 559-582
    • Ho, Y.C.1    Cao, X.R.2
  • 31
    • 0343893613
    • Actor-critic like learning algorithms for Markov decision processes
    • Konda, V. R., and Borkar, V. S. 1999. Actor-critic like learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization 38: 94-123.
    • (1999) SIAM Journal on Control and Optimization , vol.38 , pp. 94-123
    • Konda, V.R.1    Borkar, V.S.2
  • 34
    • 0031344030
    • The policy improvement algorithm for Markov decision processes with general state space
    • Meyn, S. P. 1997. The policy improvement algorithm for Markov decision processes with general state space. IEEE Transactions on Automatic Control 42: 1663-1680.
    • (1997) IEEE Transactions on Automatic Control , vol.42 , pp. 1663-1680
    • Meyn, S.P.1
  • 39
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44.
    • (1988) Machine Learning , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 41
    • 0033221519
    • Average cost temporal-difference learning
    • Tsitsiklis, J. N., and Van Roy, B. 1999. Average cost temporal-difference learning. Automatica 35: 1799-1808.
    • (1999) Automatica , vol.35 , pp. 1799-1808
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 42
    • 0032073173
    • Centralized and decentralized asynchronous optimization of stochastic discrete event systems
    • Vazquez-Abad, F. J., Cassandras, C. G., and Julka, V. 1998. Centralized and decentralized asynchronous optimization of stochastic discrete event systems. IEEE Transactions on Automatic Control 43: 631-655.
    • (1998) IEEE Transactions on Automatic Control , vol.43 , pp. 631-655
    • Vazquez-Abad, F.J.1    Cassandras, C.G.2    Julka, V.3
  • 43
    • 0026255225
    • Performance gradient estimation for very large finite Markov chains
    • Zhang, B., and Ho, Y. C. 1991. Performance gradient estimation for very large finite Markov chains. IEEE Transactions on Automatic Control 36: 1218-1227.
    • (1991) IEEE Transactions on Automatic Control , vol.36 , pp. 1218-1227
    • Zhang, B.1    Ho, Y.C.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.