[1] A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I. Marcus, "Discrete-time controlled Markov processes with average cost criterion: A survey," SIAM J. Control Optim., vol. 31, pp. 282-344, 1993.
[5] X.-R. Cao, "The relation among potentials, perturbation analysis, Markov decision processes, and other topics," J. Discrete Event Dyna. Syst., vol. 8, pp. 71-87, 1998.
[6] ____, "Single sample path based optimization of Markov chains," J. Optim. Theory Applicat., vol. 100, no. 3, pp. 527-548, 1999.
[7] X.-R. Cao and H. F. Chen, "Perturbation realization, potentials and sensitivity analysis of Markov processes," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Sept. 1997.
[9] X.-R. Cao, Z. Y. Ren, S. Bhatnagar, M. Fu, and S. Marcus, "A time aggregation approach to Markov decision processes," Automatica, vol. 38, pp. 929-943, 2002.
[10] X.-R. Cao and Y. W. Wan, "Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization," IEEE Trans. Contr. Syst. Technol., vol. 6, pp. 482-494, July 1998.
[12] W. L. Cooper, S. H. Henderson, and M. E. Lewis, "Convergence of simulation-based policy iteration," Probab. Eng. Inform. Sci., vol. 17, pp. 213-234, 2003.
[13] E. K. P. Chong and P. J. Ramadge, "Stochastic optimization of regenerative systems using infinitesimal perturbation analysis," IEEE Trans. Automat. Contr., vol. 39, pp. 1400-1410, Oct. 1994.
[14] M. C. Fu and J. Hu, "Smoothed perturbation analysis derivative estimation for Markov chains," Oper. Res. Lett., vol. 15, pp. 241-251, 1994.
[17] Y. C. Ho and X.-R. Cao, "Perturbation analysis and optimization of queueing networks," J. Optim. Theory Applicat., vol. 40, no. 4, pp. 559-582, 1983.
[19] L. H. Lee, E. T. K. Lau, and Y. C. Ho, "Explanation of goal softening in ordinal optimization," IEEE Trans. Automat. Contr., vol. 44, pp. 94-99, Jan. 1999.
[20] T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems," in Advances in Neural Information Processing Systems, vol. 7. San Francisco, CA: Morgan Kaufmann, 1995, pp. 345-352.
[21] V. R. Konda and V. S. Borkar, "Actor-critic-type learning algorithms for Markov decision processes," SIAM J. Control Optim., vol. 38, pp. 94-123, 1999.
[23] P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. Automat. Contr., vol. 46, pp. 191-209, Feb. 2001.
[25] S. P. Meyn, "The policy improvement algorithm for Markov decision processes with general state space," IEEE Trans. Automat. Contr., vol. 42, pp. 1663-1680, Oct. 1997.
[26] E. L. Plambeck, B. R. Fu, S. M. Robinson, and R. Suri, "Sample-path optimization of convex stochastic performance functions," Math. Program. B, vol. 75, pp. 137-176, 1996.
[28] M. I. Reiman and A. Weiss, "Sensitivity analysis via likelihood ratio," Oper. Res., vol. 37, pp. 830-844, 1989.
[30] R. Suri and Y. T. Leung, "Single run optimization of discrete event simulations - An empirical study using the M/M/1 queue," IIE Trans., vol. 21, pp. 35-49, 1989.
[31] R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learn., vol. 3, pp. 9-44, 1988.
[33] J. N. Tsitsiklis and V. R. Konda, "Actor-critic algorithms," Lab. Inform. Decision Systems, Mass. Inst. Technol., Cambridge, MA, Tech. Rep., 2001.
[34] J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large-scale dynamic programming," Machine Learn., vol. 22, pp. 59-94, 1994.
[35] ____, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Contr., vol. 42, pp. 674-690, May 1997.
[36] ____, "Average cost temporal-difference learning," Automatica, vol. 35, pp. 1799-1808, 1999.
[37] C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Cambridge Univ., Cambridge, U.K., 1989.