-
1
-
-
0003874616
-
Learning algorithms for Markov decision processes with average cost
-
Report LIDS-P-2434, Lab. for Info. and Decision Systems, October
-
Abounadi, J., Bertsekas, D., and Borkar, V. Learning algorithms for Markov decision processes with average cost. Report LIDS-P-2434, Lab. for Info. and Decision Systems, October 1998; to appear in SIAM J. on Control and Optimization.
-
(1998)
SIAM J. on Control and Optimization
-
-
Abounadi, J.1
Bertsekas, D.2
Borkar, V.3
-
3
-
-
0020970738
-
Neuron-like elements that can solve difficult learning control problems
-
Barto, A., Sutton, R., and Anderson, C. 1983. Neuron-like elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics 13: 835-846.
-
(1983)
IEEE Transactions on Systems, Man, and Cybernetics
, vol.13
, pp. 835-846
-
-
Barto, A.1
Sutton, R.2
Anderson, C.3
-
5
-
-
0013495368
-
Experiments with infinite-horizon policy-gradient estimation
-
Baxter, J., Bartlett, P. L., and Weaver, L. 2001. Experiments with infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15: 351-381.
-
(2001)
Journal of Artificial Intelligence Research
, vol.15
, pp. 351-381
-
-
Baxter, J.1
Bartlett, P.L.2
Weaver, L.3
-
7
-
-
0003239419
-
Nonnegative matrices in the mathematical sciences
-
Philadelphia
-
Berman, A., and Plemmons, R. J. 1994. Nonnegative matrices in the mathematical sciences. SIAM, Philadelphia.
-
(1994)
SIAM
-
-
Berman, A.1
Plemmons, R.J.2
-
10
-
-
0032027940
-
The relation among potentials, perturbation analysis, Markov decision processes, and other topics
-
Cao, X. R. 1998. The relation among potentials, perturbation analysis, Markov decision processes, and other topics. Journal of Discrete Event Dynamic Systems 8: 71-87.
-
(1998)
Journal of Discrete Event Dynamic Systems
, vol.8
, pp. 71-87
-
-
Cao, X.R.1
-
11
-
-
0033247533
-
Single sample path-based optimization of Markov chains
-
Cao, X. R. 1999. Single sample path-based optimization of Markov chains. Journal of Optimization Theory and Applications 100: 527-548.
-
(1999)
Journal of Optimization Theory and Applications
, vol.100
, pp. 527-548
-
-
Cao, X.R.1
-
12
-
-
0033884215
-
A unified approach to Markov decision problems and performance sensitivity analysis
-
Cao, X. R. 2000. A unified approach to Markov decision problems and performance sensitivity analysis. Automatica 36: 771-774.
-
(2000)
Automatica
, vol.36
, pp. 771-774
-
-
Cao, X.R.1
-
13
-
-
0031258478
-
Potentials, perturbation realization, and sensitivity analysis of Markov processes
-
Cao, X. R., and Chen, H. F. 1997. Potentials, perturbation realization, and sensitivity analysis of Markov processes. IEEE Transactions on AC 42: 1382-1393.
-
(1997)
IEEE Transactions on AC
, vol.42
, pp. 1382-1393
-
-
Cao, X.R.1
Chen, H.F.2
-
14
-
-
0036604532
-
A time aggregation approach to Markov decision processes
-
Cao, X. R., Ren, Z. Y., Bhatnagar, S., Fu, M., and Marcus, S. 2002. A time aggregation approach to Markov decision processes. Automatica 38: 929-943.
-
(2002)
Automatica
, vol.38
, pp. 929-943
-
-
Cao, X.R.1
Ren, Z.Y.2
Bhatnagar, S.3
Fu, M.4
Marcus, S.5
-
16
-
-
0032122986
-
Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization
-
Cao, X. R., and Wan, Y. W. 1998. Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization. IEEE Transactions on Control Systems Technology 6: 482-494.
-
(1998)
IEEE Transactions on Control Systems Technology
, vol.6
, pp. 482-494
-
-
Cao, X.R.1
Wan, Y.W.2
-
18
-
-
0002322904
-
Convergence of recursive optimization algorithms using infinitesimal perturbation analysis estimates
-
Chong, E. K. P., and Ramadge, P. J. 1992. Convergence of recursive optimization algorithms using infinitesimal perturbation analysis estimates. Journal of Discrete Event Dynamic Systems 1: 339-372.
-
(1992)
Journal of Discrete Event Dynamic Systems
, vol.1
, pp. 339-372
-
-
Chong, E.K.P.1
Ramadge, P.J.2
-
20
-
-
0013501772
-
Single sample path-based recursive algorithms for Markov decision processes
-
submitted
-
Fang, H. T., and Cao, X. R. Single sample path-based recursive algorithms for Markov decision processes. IEEE Trans. on Automatic Control, submitted.
-
IEEE Trans. on Automatic Control
-
-
Fang, H.T.1
Cao, X.R.2
-
21
-
-
0041648459
-
-
Feinberg, E. A., and Adam Shwartz (ed.); Kluwer
-
Feinberg, E. A., and Shwartz, A. (eds.) 2002. Handbook of Markov Decision Processes. Kluwer.
-
(2002)
Handbook of Markov Decision Processes
-
-
-
22
-
-
0025417457
-
Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis
-
Fu, M. C. 1990. Convergence of a stochastic approximation algorithm for the GI/G/1 queue using infinitesimal perturbation analysis. Journal of Optimization Theory and Applications 65: 149-160.
-
(1990)
Journal of Optimization Theory and Applications
, vol.65
, pp. 149-160
-
-
Fu, M.C.1
-
25
-
-
0030522182
-
A Lyapunov bound for solutions of Poisson's equation
-
Glynn, P. W., and Meyn, S. P. 1996. A Lyapunov bound for solutions of Poisson's equation. Ann. Probab. 24: 916-931.
-
(1996)
Ann. Probab.
, vol.24
, pp. 916-931
-
-
Glynn, P.W.1
Meyn, S.P.2
-
28
-
-
0020802518
-
Perturbation analysis and optimization of queueing networks
-
Ho, Y. C., and Cao, X. R. 1983. Perturbation analysis and optimization of queueing networks. Journal of Optimization Theory and Applications 40(4): 559-582.
-
(1983)
Journal of Optimization Theory and Applications
, vol.40
, Issue.4
, pp. 559-582
-
-
Ho, Y.C.1
Cao, X.R.2
-
31
-
-
0343893613
-
Actor-critic like learning algorithms for Markov decision processes
-
Konda, V. R., and Borkar, V. S. 1999. Actor-critic like learning algorithms for Markov decision processes. SIAM Journal on Control and Optimization 38: 94-123.
-
(1999)
SIAM Journal on Control and Optimization
, vol.38
, pp. 94-123
-
-
Konda, V.R.1
Borkar, V.S.2
-
34
-
-
0031344030
-
The policy improvement algorithm for Markov decision processes with general state space
-
Meyn, S. P. 1997. The policy improvement algorithm for Markov decision processes with general state space. IEEE Transactions on Automatic Control 42: 1663-1680.
-
(1997)
IEEE Transactions on Automatic Control
, vol.42
, pp. 1663-1680
-
-
Meyn, S.P.1
-
39
-
-
33847202724
-
Learning to predict by the methods of temporal differences
-
Sutton, R. S. 1988. Learning to predict by the methods of temporal differences. Machine Learning 3: 9-44.
-
(1988)
Machine Learning
, vol.3
, pp. 9-44
-
-
Sutton, R.S.1
-
41
-
-
0033221519
-
Average cost temporal-difference learning
-
Tsitsiklis, J. N., and Van Roy, B. 1999. Average cost temporal-difference learning. Automatica 35: 1799-1808.
-
(1999)
Automatica
, vol.35
, pp. 1799-1808
-
-
Tsitsiklis, J.N.1
Van Roy, B.2
-
42
-
-
0032073173
-
Centralized and decentralized asynchronous optimization of stochastic discrete event systems
-
Vazquez-Abad, F. J., Cassandras, C. G., and Julka, V. 1998. Centralized and decentralized asynchronous optimization of stochastic discrete event systems. IEEE Transactions on Automatic Control 43: 631-655.
-
(1998)
IEEE Transactions on Automatic Control
, vol.43
, pp. 631-655
-
-
Vazquez-Abad, F.J.1
Cassandras, C.G.2
Julka, V.3
-
43
-
-
0026255225
-
Performance gradient estimation for very large finite Markov chains
-
Zhang, B., and Ho, Y. C. 1991. Performance gradient estimation for very large finite Markov chains. IEEE Transactions on Automatic Control 36: 1218-1227.
-
(1991)
IEEE Transactions on Automatic Control
, vol.36
, pp. 1218-1227
-
-
Zhang, B.1
Ho, Y.C.2