1. Barto, A., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning, special issue on reinforcement learning. Discret. Event Dyn. Syst. Theory Appl. 13: 41-77.
2. Baxter, J., and Bartlett, P. L. 2001. Infinite-horizon policy-gradient estimation. J. Artif. Intell. Res. 15: 319-350.
3. Baxter, J., Bartlett, P. L., and Weaver, L. 2001. Experiments with infinite-horizon policy-gradient estimation. J. Artif. Intell. Res. 15: 351-381.
6. Cao, X. R. 1998. The relation among potentials, perturbation analysis, Markov decision processes, and other topics. J. Discret. Event Dyn. Syst. 8: 71-87.
7. Cao, X. R. 1999. Single sample path based optimization of Markov chains. J. Optim. Theory Appl. 100(3): 527-548.
8. Cao, X. R. 2000. A unified approach to Markov decision problems and performance sensitivity analysis. Automatica 36: 771-774.
9. Cao, X. R. 2004a. The potential structure of sample paths and performance sensitivities of Markov systems. IEEE Trans. Automat. Contr. 49: 2129-2142.
10. Cao, X. R. 2004b. A basic formula for on-line policy gradient algorithms. IEEE Trans. Automat. Contr., to appear.
12. Cao, X. R., and Chen, H. F. 1997. Perturbation realization, potentials and sensitivity analysis of Markov processes. IEEE Trans. Automat. Contr. 42: 1382-1393.
13. Cao, X. R., and Guo, X. 2004. A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: Multichain cases. Automatica 40: 1749-1759.
14. Cao, X. R., and Wan, Y. W. 1998. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Trans. Control Syst. Technol. 6: 482-494.
15. Cao, X. R., Yuan, X. M., and Qiu, L. 1996. A single sample path-based performance sensitivity formula for Markov chains. IEEE Trans. Automat. Contr. 41: 1814-1817.
16. Cao, X. R., Ren, Z. Y., Bhatnagar, S., Fu, M., and Marcus, S. 2002. A time aggregation approach to Markov decision processes. Automatica 38: 929-943.
17. Chong, E. K. P., and Ramadge, P. J. 1994. Stochastic optimization of regenerative systems using infinitesimal perturbation analysis. IEEE Trans. Automat. Contr. 39: 1400-1410.
18. Cooper, W. L., Henderson, S. G., and Lewis, M. E. 2003. Convergence of simulation-based policy iteration. Probab. Eng. Inf. Sci. 17: 213-234.
20. Fang, H. T., and Cao, X. R. 2004. Potential-based on-line policy iteration algorithms for Markov decision processes. IEEE Trans. Automat. Contr. 49: 493-505.
21. Ho, Y. C., and Cao, X. R. 1983. Perturbation analysis and optimization of queueing networks. J. Optim. Theory Appl. 40(4): 559-582.
23. Ho, Y. C., Zhao, Q. C., and Pepyne, D. L. 2003. The no free lunch theorem, complexity and computer security. IEEE Trans. Automat. Contr. 48: 783-793.
24. Marbach, P., and Tsitsiklis, J. N. 2001. Simulation-based optimization of Markov reward processes. IEEE Trans. Automat. Contr. 46: 191-209.
25. Meuleau, N., Peshkin, L., Kim, K.-E., and Kaelbling, L. P. 1999. Learning finite-state controllers for partially observable environments. Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence.
27. Suri, R., and Leung, Y. T. 1989. Single run optimization of discrete event simulations - An empirical study using the M/M/1 queue. IIE Trans. 21: 35-49.