



Volume 23, Issue 2, 2006, Pages 292-296

Unified NDP method based on TD(0) learning for both average and discounted Markov decision processes

Author keywords

Markov decision processes; Neuro dynamic programming; Performance potentials; TD(0) learning

Indexed keywords

LEARNING OPTIMIZATION PROBLEMS; MARKOV DECISION PROCESSES (MDPS); NEURO DYNAMIC PROGRAMMING (NDP); NEURO POLICY ITERATION ALGORITHM; PERFORMANCE POTENTIALS; REINFORCEMENT LEARNING (RL); TD(0) LEARNING; TEMPORAL DIFFERENCE

EID: 33745951445     PISSN: 10008152     EISSN: None     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 6

References (12)
  • 1
    • 0031258478
    • Perturbation realization, potentials and sensitivity analysis of Markov processes
    • CAO X R, CHEN H F. Perturbation realization, potentials and sensitivity analysis of Markov processes[J]. IEEE Trans on Automatic Control, 1997, 42(10): 1382-1393.
    • (1997) IEEE Trans on Automatic Control , vol.42 , Issue.10 , pp. 1382-1393
    • Cao, X.R.; Chen, H.F.
  • 2
    • 0032027940
    • The relations among potentials, perturbation analysis, and Markov decision processes
    • CAO X R. The relations among potentials, perturbation analysis, and Markov decision processes[J]. Discrete Event Dynamic Systems: Theory and Applications, 1998, 8(1): 71-78.
    • (1998) Discrete Event Dynamic Systems: Theory and Applications , vol.8 , Issue.1 , pp. 71-78
    • Cao, X.R.
  • 5
    • 0142196586
    • Performance optimization of continuous-time Markov control processes based on performance potentials
    • TANG H, XI H S, YIN B Q. Performance optimization of continuous-time Markov control processes based on performance potentials[J]. Int J of Systems Science, 2003, 34(1): 63-71.
    • (2003) Int J of Systems Science , vol.34 , Issue.1 , pp. 63-71
    • Tang, H.; Xi, H.S.; Yin, B.Q.
  • 6
    • 0033247533
    • Single sample path-based optimization of Markov chains
    • CAO X R. Single sample path-based optimization of Markov chains[J]. J of Optimization Theory and Applications, 1999, 100(3): 527-548.
    • (1999) J of Optimization Theory and Applications , vol.100 , Issue.3 , pp. 527-548
    • Cao, X.R.
  • 7
    • 0037289322
    • From perturbation analysis to Markov decision processes and reinforcement learning
    • CAO X R. From perturbation analysis to Markov decision processes and reinforcement learning[J]. Discrete Event Dynamic Systems: Theory and Applications, 2003, 13(1/2): 9-39.
    • (2003) Discrete Event Dynamic Systems: Theory and Applications , vol.13 , Issue.1-2 , pp. 9-39
    • Cao, X.R.
  • 9
    • 23444449149
    • Performance potential-based neuro-dynamic programming for SMDPs
    • TANG H, YUAN J B, LU Y, et al. Performance potential-based neuro-dynamic programming for SMDPs[J]. Acta Automatica Sinica, 2005, 31(4): 642-645.
    • (2005) Acta Automatica Sinica , vol.31 , Issue.4 , pp. 642-645
    • Tang, H.; Yuan, J.B.; Lu, Y.
  • 10
    • 0036997986
    • On-line optimization algorithm for Markov control processes based on a single sample path
    • TANG Hao, XI Hongsheng, YIN Baoqun. On-line optimization algorithm for Markov control processes based on a single sample path[J]. Control Theory and Applications, 2002, 19(6): 863-871.
    • (2002) Control Theory and Applications , vol.19 , Issue.6 , pp. 863-871
    • Tang, H.; Xi, H.; Yin, B.
  • 11
    • 0038631988
    • Semi-Markov decision problems and performance sensitivity analysis
    • CAO X R. Semi-Markov decision problems and performance sensitivity analysis[J]. IEEE Trans on Automatic Control, 2003, 48(5): 758-769.
    • (2003) IEEE Trans on Automatic Control , vol.48 , Issue.5 , pp. 758-769
    • Cao, X.R.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.