1. Barto, A., and Mahadevan, S. 2003. Recent advances in hierarchical reinforcement learning, special issue on reinforcement learning. Discret. Event Dyn. Syst. Theory Appl. 13: 41-77.
2. Baxter, J., and Bartlett, P. L. 2001. Infinite-horizon policy-gradient estimation. J. Artif. Intell. Res. 15: 319-350.
3. Baxter, J., Bartlett, P. L., and Weaver, L. 2001. Experiments with infinite-horizon policy-gradient estimation. J. Artif. Intell. Res. 15: 351-381.
6. Cao, X. R. 1998. The relation among potentials, perturbation analysis, Markov decision processes, and other topics. J. Discret. Event Dyn. Syst. 8: 71-87.
7. Cao, X. R. 1999. Single sample path based optimization of Markov chains. J. Optim. Theory Appl. 100(3): 527-548.
8. Cao, X. R. 2000. A unified approach to Markov decision problems and performance sensitivity analysis. Automatica 36: 771-774.
9. Cao, X. R. 2004a. The potential structure of sample paths and performance sensitivities of Markov systems. IEEE Trans. Automat. Contr. 49: 2129-2142.
10. Cao, X. R. 2004b. A basic formula for on-line policy gradient algorithms. IEEE Trans. Automat. Contr., to appear.
12. Cao, X. R., and Chen, H. F. 1997. Perturbation realization, potentials and sensitivity analysis of Markov processes. IEEE Trans. Automat. Contr. 42: 1382-1393.
13. Cao, X. R., and Guo, X. 2004. A unified approach to Markov decision problems and performance sensitivity analysis with discounted and average criteria: Multichain cases. Automatica 40: 1749-1759.
14. Cao, X. R., and Wan, Y. W. 1998. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Trans. Control Syst. Technol. 6: 482-494.
15. Cao, X. R., Yuan, X. M., and Qiu, L. 1996. A single sample path-based performance sensitivity formula for Markov chains. IEEE Trans. Automat. Contr. 41: 1814-1817.
16. Cao, X. R., Ren, Z. Y., Bhatnagar, S., Fu, M., and Marcus, S. 2002. A time aggregation approach to Markov decision processes. Automatica 38: 929-943.
17. Chong, E. K. P., and Ramadge, P. J. 1994. Stochastic optimization of regenerative systems using infinitesimal perturbation analysis. IEEE Trans. Automat. Contr. 39: 1400-1410.
18. Cooper, W. L., Henderson, S. G., and Lewis, M. E. 2003. Convergence of simulation-based policy iteration. Probab. Eng. Inf. Sci. 17: 213-234.
20. Fang, H. T., and Cao, X. R. 2004. Potential-based on-line policy iteration algorithms for Markov decision processes. IEEE Trans. Automat. Contr. 49: 493-505.
21. Ho, Y. C., and Cao, X. R. 1983. Perturbation analysis and optimization of queueing networks. J. Optim. Theory Appl. 40(4): 559-582.
23. Ho, Y. C., Zhao, Q. C., and Pepyne, D. L. 2003. The no free lunch theorem, complexity and computer security. IEEE Trans. Automat. Contr. 48: 783-793.
24. Marbach, P., and Tsitsiklis, J. N. 2001. Simulation-based optimization of Markov reward processes. IEEE Trans. Automat. Contr. 46: 191-209.
25. Meuleau, N., Peshkin, L., Kim, K.-E., and Kaelbling, L. P. 1999. Learning finite-state controllers for partially observable environments. Proceedings of the Fifteenth International Conference on Uncertainty in Artificial Intelligence.
27. Suri, R., and Leung, Y. T. 1989. Single run optimization of discrete event simulations - An empirical study using the M/M/1 queue. IIE Trans. 21: 35-49.