1. A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I. Marcus, Discrete-time controlled Markov processes with average cost criterion: A survey, SIAM Journal on Control and Optimization, vol. 31, pp. 282-344, 1993.
3. J. Baxter, P. L. Bartlett, and L. Weaver, Experiments with infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, vol. 15, pp. 351-381, 2001.
4. D. P. Bertsekas, Dynamic Programming and Optimal Control, Volumes I and II, Athena Scientific, Belmont, MA, 1995.
6. L. Breiman, Probability, Addison-Wesley, Reading, MA, 1968; Springer-Verlag, New York, 1994.
7. P. Bremaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues, Springer-Verlag, New York, 1998.
8. X. R. Cao, Convergence of parameter sensitivity estimates in a stochastic experiment, IEEE Trans. Automatic Control, vol. AC-30, pp. 834-843, 1985.
9. X. R. Cao, Sensitivity estimates based on one realization of a stochastic system, Journal of Statistical Computation and Simulation, vol. 27, pp. 211-232, 1987.
11. X. R. Cao, X. M. Yuan, and L. Qiu, A single sample path-based performance sensitivity formula for Markov chains, IEEE Trans. Automatic Control, vol. 41, pp. 1814-1817, 1996.
12. X. R. Cao, The relation among potentials, perturbation analysis, Markov decision processes, and other topics, Journal of Discrete Event Dynamic Systems, vol. 8, pp. 71-87, 1998.
13. X. R. Cao, Single sample path based optimization of Markov chains, Journal of Optimization Theory and Applications, vol. 100, no. 3, pp. 527-548, 1999.
15. X. R. Cao and H. F. Chen, Perturbation realization, potentials and sensitivity analysis of Markov processes, IEEE Trans. Automatic Control, vol. 42, pp. 1382-1393, 1997.
16. X. R. Cao, A unified approach to Markov decision problems and performance sensitivity analysis, Automatica, vol. 36, pp. 771-774, 2000.
17. X. R. Cao, Constructing performance sensitivities for Markov systems with potentials as building blocks, Proc. of the 42nd IEEE Conference on Decision and Control, Maui, Hawaii, 2003.
20. X. R. Cao and Y. W. Wan, Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization, IEEE Trans. Control Systems Technology, vol. 6, pp. 482-494, 1998.
22. E. K. P. Chong and P. J. Ramadge, Stochastic optimization of regenerative systems using infinitesimal perturbation analysis, IEEE Trans. Automatic Control, vol. 39, pp. 1400-1410, 1994.
24. H.-T. Fang and X. R. Cao, Potential-based on-line policy iteration algorithms for Markov decision processes, IEEE Trans. Automatic Control, vol. 49, no. 4, pp. 493-505, 2004.
25. P. W. Glynn, Likelihood ratio gradient estimation: An overview, in A. Thesen, H. Grant, and K. D. Kelton (eds.), Proc. of the 1987 Winter Simulation Conference, pp. 366-375, Society for Computer Simulation, San Diego, CA, 1988.
26. P. W. Glynn, Optimization of stochastic systems via simulation, in A. Thesen, H. Grant, and K. D. Kelton (eds.), Proc. of the 1987 Winter Simulation Conference, pp. 90-105, Society for Computer Simulation, San Diego, CA, 1988.
27. Y. C. Ho and X. R. Cao, Perturbation analysis and optimization of queueing networks, Journal of Optimization Theory and Applications, vol. 40, no. 4, pp. 559-582, 1983.
29. L. H. Lee, E. T. K. Lau, and Y. C. Ho, Explanation of goal softening in ordinal optimization, IEEE Trans. Automatic Control, vol. 44, pp. 94-99, 1999.
30. T. Jaakkola, S. P. Singh, and M. I. Jordan, Reinforcement learning algorithm for partially observable Markov decision problems, in Advances in Neural Information Processing Systems, vol. 7, pp. 345-352, Morgan Kaufmann, San Francisco, CA, 1995.
32. V. R. Konda and V. S. Borkar, Actor-critic-type learning algorithms for Markov decision processes, SIAM Journal on Control and Optimization, vol. 38, pp. 94-123, 1999.
33. P. Marbach and J. N. Tsitsiklis, Simulation-based optimization of Markov reward processes, IEEE Trans. Automatic Control, vol. 46, pp. 191-209, 2001.
34. P. Marbach and J. N. Tsitsiklis, Approximate gradient methods in policy-space optimization of Markov reward processes, Journal of Discrete Event Dynamic Systems, vol. 13, no. 1, pp. 111-148, 2003.
36. E. L. Plambeck, B. R. Fu, S. M. Robinson, and R. Suri, Sample-path optimization of convex stochastic performance functions, Math. Program. B, vol. 75, pp. 137-176, 1996.
38. M. I. Reiman and A. Weiss, Sensitivity analysis via likelihood ratio, Operations Research, vol. 37, pp. 830-844, 1989.
40. R. Y. Rubinstein, Monte Carlo Optimization, Simulation, and Sensitivity Analysis of Queueing Networks, Wiley, New York, 1986.
41. R. Suri and Y. T. Leung, Single run optimization of discrete event simulations: An empirical study using the M/M/1 queue, IIE Transactions, vol. 21, pp. 35-49, 1989.
42. R. S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, vol. 3, pp. 9-44, 1988.
44. J. N. Tsitsiklis and V. R. Konda, Actor-critic algorithms, Tech. Rep., Laboratory for Information and Decision Systems, MIT, Cambridge, MA, 2001, preprint.
45. J. N. Tsitsiklis and B. Van Roy, Feature-based methods for large-scale dynamic programming, Machine Learning, vol. 22, pp. 59-94, 1994.
46. J. N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Trans. Automatic Control, vol. 42, pp. 674-690, 1997.
47. J. N. Tsitsiklis and B. Van Roy, Average cost temporal-difference learning, Automatica, vol. 35, pp. 1799-1808, 1999.
48. C. Watkins, Learning from Delayed Rewards, Ph.D. Thesis, Cambridge University, Cambridge, UK, 1989.