SCOPUS Information Search Platform

IEEE Transactions on Automatic Control

Volume 49, Issue 4, 2004, Pages 493-505

Potential-based online policy iteration algorithms for Markov decision processes

(2) Fang, Hai Tao a; Cao, Xi Ren b

a CHINESE ACADEMY OF SCIENCES (China)

b HONG KONG UNIVERSITY OF SCIENCE AND TECHNOLOGY (Hong Kong)

Author keywords

Markov decision process; Potential; Recursive optimization

Indexed keywords

DECISION SUPPORT SYSTEMS; ITERATIVE METHODS; MARKOV PROCESSES; OPTIMAL SYSTEMS;

MARKOV DECISION PROCESS; ONLINE POLICY ITERATION ALGORITHMS;

ALGORITHMS;

EID: 2442614974 PISSN: 00189286 EISSN: None Source Type: Journal
DOI: 10.1109/TAC.2004.825647 Document Type: Article

Times cited : (46)

References (38)

1
- 0027557742
- Discrete-time controlled Markov processes with average cost criterion: A survey
- A. Arapostathis, V. S. Borkar, E. Fernandez-Gaucherand, M. K. Ghosh, and S. I. Marcus, "Discrete-time controlled Markov processes with average cost criterion: A survey," SIAM J. Control Optim., vol. 31, pp. 282-344, 1993.
- (1993) SIAM J. Control Optim. , vol.31 , pp. 282-344
- Arapostathis, A.¹ Borkar, V.S.² Fernandez-Gaucherand, E.³ Ghosh, M.K.⁴ Marcus, S.I.⁵

2
- 0003565783
- Belmont, MA: Athena Scientific
- D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995, vol. I and II.
- (1995) Dynamic Programming and Optimal Control , vol.1-2
- Bertsekas, D.P.¹

3
- 0003487482
- Belmont, MA: Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

4
- 0003618624
- New York: Springer-Verlag
- P. Brémaud, Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues. New York: Springer-Verlag, 1998.
- (1998) Markov Chains: Gibbs Fields, Monte Carlo Simulation, and Queues
- Brémaud, P.¹

5
- 0032027940
- The relation among potentials, perturbation analysis, Markov decision processes, and other topics
- X.-R. Cao, "The relation among potentials, perturbation analysis, Markov decision processes, and other topics," J. Discrete Event Dyna. Syst., vol. 8, pp. 71-87, 1998.
- (1998) J. Discrete Event Dyna. Syst. , vol.8 , pp. 71-87
- Cao, X.-R.¹

6
- 0033247533
- Single sample path based optimization of Markov chains
- ____, "Single sample path based optimization of Markov chains," J. Optim.: Theory Applicat. vol. 100, no. 3, pp. 527-548, 1999.
- (1999) J. Optim.: Theory Applicat. , vol.100 , Issue.3 , pp. 527-548
- Cao, X.-R.¹

7
- 0031258478
- Perturbation realization, potentials and sensitivity analysis of Markov processes
- Sept.
- X.-R. Cao and H. F. Chen, "Perturbation realization, potentials and sensitivity analysis of Markov processes," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Sept. 1997.
- (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1382-1393
- Cao, X.-R.¹ Chen, H.F.²

8
- 0036992818
- Gradient-based policy iteration: An example
- X.-R. Cao and H.-T. Fang, "Gradient-based policy iteration: An example," presented at the 2002 IEEE Conf. Decision Control.
- 2002 IEEE Conf. Decision Control
- Cao, X.-R.¹ Fang, H.-T.²

9
- 0036604532
- A time aggregation approach to Markov decision processes
- X.-R. Cao, Z. Y. Ren, S. Bhatnagar, F. Fu, and S. Marcus, "A time aggregation approach to Markov decision processes," Automatica, vol. 38, pp. 929-943, 2002.
- (2002) Automatica , vol.38 , pp. 929-943
- Cao, X.-R.¹ Ren, Z.Y.² Bhatnagar, S.³ Fu, F.⁴ Marcus, S.⁵

10
- 0032122986
- Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization
- July
- X.-R. Cao and Y. W. Wan, "Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization," IEEE Trans. Contr. Syst. Tech, vol. 6, pp. 482-494, July 1998.
- (1998) IEEE Trans. Contr. Syst. Tech , vol.6 , pp. 482-494
- Cao, X.-R.¹ Wan, Y.W.²

11
- 0003864139
- Norwell, MA: Kluwer
- C. G. Cassandras and S. Lafortune, Introduction to Discrete Event Systems. Norwell, MA: Kluwer, 1999.
- (1999) Introduction to Discrete Event Systems
- Cassandras, C.G.¹ Lafortune, S.²

12
- 0038380746
- Convergence of simulation-based policy iteration
- W. L. Cooper, S. H. Henderson, and M. E. Lewis, "Convergence of simulation-based policy iteration," Probab. Eng. Inform. Sci., vol. 17, pp. 213-234, 2003.
- (2003) Probab. Eng. Inform. Sci. , vol.17 , pp. 213-234
- Cooper, W.L.¹ Henderson, S.H.² Lewis, M.E.³

13
- 0028466316
- Stochastic optimization of regenerative systems using infinitesimal perturbation analysis
- Oct.
- E. K. P. Chong and P. J. Ramadge, "Stochastic optimization of regenerative systems using infinitesimal perturbation analysis," IEEE Trans. Automat. Contr., vol. 39, pp. 1400-1410, Oct. 1994.
- (1994) IEEE Trans. Automat. Contr. , vol.39 , pp. 1400-1410
- Chong, E.K.P.¹ Ramadge, P.J.²

14
- 0028444151
- Smoothed perturbation analysis derivative estimation for Markov chains
- M. C. Fu and J. Hu, "Smoothed perturbation analysis derivative estimation for Markov chains," Oper. Res. Lett., vol. 15, pp. 241-251, 1994.
- (1994) Oper. Res. Lett. , vol.15 , pp. 241-251
- Fu, M.C.¹ Hu, J.²

15
- 0023543886
- Likelihood ratio gradient estimation: An overview
- P. W. Glynn, "Likelihood ratio gradient estimation: An overview," in Proc. 1987 Winter Simulation Conf., 1987, pp. 366-375.
- Proc. 1987 Winter Simulation Conf., 1987 , pp. 366-375
- Glynn, P.W.¹

16
- 0024932244
- Optimization of stochastic systems via simulation
- ____, "Optimization of stochastic systems via simulation," in Proc. 1989 Winter Simulation Conf., 1989, pp. 90-105.
- Proc. 1989 Winter Simulation Conf., 1989 , pp. 90-105
- Glynn, P.W.¹

17
- 0020802518
- Perturbation analysis and optimization of queueing networks
- Y. C. Ho and X.-R. Cao, "Perturbation analysis and optimization of queueing networks," J. Optim. Theory Applicat., vol. 40, no. 4, pp. 559-582, 1983.
- (1983) J. Optim. Theory Applicat. , vol.40 , Issue.4 , pp. 559-582
- Ho, Y.C.¹ Cao, X.-R.²

18
- 0003585978
- Norwell, MA: Kluwer
- Y. C. Ho and X. R. Cao, Perturbation Analysis of Discrete-Event Dynamic Systems. Norwell, MA: Kluwer, 1991.
- (1991) Perturbation Analysis of Discrete-Event Dynamic Systems
- Ho, Y.C.¹ Cao, X.R.²

19
- 0032653557
- Explanation of goal softening in ordinal optimization
- Jan.
- L. H. Lee, E. T. K. Lau, and Y. C. Ho, "Explanation of goal softening in ordinal optimization," IEEE Trans. Automat. Contr., vol. 44, pp. 94-99, Jan. 1999.
- (1999) IEEE Trans. Automat. Contr. , vol.44 , pp. 94-99
- Lee, L.H.¹ Lau, E.T.K.² Ho, Y.C.³

20
- 85153938292
- Reinforcement learning algorithm for partially observable Markov decision problems
- San Francisco, CA: Morgan Kaufmann
- T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems," in Advances in Neural Information Processing Systems. San Francisco, CA: Morgan Kaufmann, 1995, vol. 7, pp. 345-352.
- (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 345-352
- Jaakkola, T.¹ Singh, S.P.² Jordan, M.I.³

21
- 0343893613
- Actor-critic-type learning algorithms for Markov decision processes
- V. R. Konda and V. S. Borkar, "Actor-critic-type learning algorithms for Markov decision processes," SIAM J. Control Optim., vol. 38, pp. 94-123, 1999.
- (1999) SIAM J. Control Optim. , vol.38 , pp. 94-123
- Konda, V.R.¹ Borkar, V.S.²

22
- 21144477219
- Re-entrant lines
- P. R. Kumar, "Re-entrant lines," Queueing Syst.: Theory Applicat. vol. 1, pp. 87-110, 1993.
- (1993) Queueing Syst.: Theory Applicat. , vol.1 , pp. 87-110
- Kumar, P.R.¹

23
- 0035249254
- Simulation-based optimization of Markov reward processes
- Feb.
- P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. Automat. Contr., vol. 46, pp. 191-209, Feb. 2001.
- (2001) IEEE Trans. Automat. Contr. , vol.46 , pp. 191-209
- Marbach, P.¹ Tsitsiklis, J.N.²

24
- 0003637131
- London, U.K.: Springer-Verlag
- S. P. Meyn and R. L. Tweedie, Markov Chains and Stochastic Stability. London, U.K.: Springer-Verlag, 1993.
- (1993) Markov Chains and Stochastic Stability
- Meyn, S.P.¹ Tweedie, R.L.²

25
- 0031344030
- The policy improvement algorithm for Markov decision processes with general state space
- Oct.
- S. P. Meyn, "The policy improvement algorithm for Markov decision processes with general state space," IEEE Trans. Automat. Contr., vol. 42, pp. 1663-1680, Oct. 1997.
- (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1663-1680
- Meyn, S.P.¹

26
- 0001621211
- Sample-path optimization of convex stochastic performance functions
- E. L. Plambeck, B. R. Fu, S. M. Robinson, and R. Suri, "Sample-path optimization of convex stochastic performance functions," Math. Program. B, vol. 75, pp. 137-176, 1996.
- (1996) Math. Program. B , vol.75 , pp. 137-176
- Plambeck, E.L.¹ Fu, B.R.² Robinson, S.M.³ Suri, R.⁴

27
- 85102627959
- New York: Wiley
- M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. New York: Wiley, 1994.
- (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman, M.L.¹

28
- 0024735795
- Sensitivity analysis via likelihood ratio
- M. I. Reiman and A. Weiss, "Sensitivity analysis via likelihood ratio," Oper. Res., vol. 37, pp. 830-844, 1989.
- (1989) Oper. Res. , vol.37 , pp. 830-844
- Reiman, M.I.¹ Weiss, A.²

29
- 0003418592
- New York: Wiley
- R. V. Rubinstein, Monte Carlo Optimization, Simulation, and Sensitivity Analysis of Queueing Networks. New York: Wiley, 1986.
- (1986) Monte Carlo Optimization, Simulation, and Sensitivity Analysis of Queueing Networks
- Rubinstein, R.V.¹

30
- 0024621270
- Single run optimization of discrete event simulations: An empirical study using M/M/1 queue
- R. Suri and Y. T. Leung, "Single run optimization of discrete event simulations: An empirical study using M/M/1 queue," IIE Trans., vol. 21, pp. 35-49, 1989.
- (1989) IIE Trans. , vol.21 , pp. 35-49
- Suri, R.¹ Leung, Y.T.²

31
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learn., vol. 3, pp. 9-44, 1988.
- (1988) Machine Learn. , vol.3 , pp. 9-44
- Sutton, R.S.¹

32
- 0004102479
- Cambridge, MA: MIT Press
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

33
- 0042758707
- Actor-critic algorithms
- Cambridge, MA, Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol.
- J. N. Tsitsiklis and V. R. Konda "Actor-critic algorithms," Cambridge, MA, Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., 2001.
- (2001)
- Tsitsiklis, J.N.¹ Konda, V.R.²

34
- 0029752470
- Feature-based methods for large-scale dynamic programming
- J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large-scale dynamic programming," Machine Learn., vol. 22, pp. 59-94, 1994.
- (1994) Machine Learn. , vol.22 , pp. 59-94
- Tsitsiklis, J.N.¹ Van Roy, B.²

35
- 0031143730
- An analysis of temporal-difference learning with function approximation
- May
- ____, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Contr., vol. 42, pp. 674-690, May 1997.
- (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

36
- 0033221519
- Average cost temporal-difference learning
- ____, "Average cost temporal-difference learning," Automatica, vol. 35, pp. 1799-1808, 1999.
- (1999) Automatica , vol.35 , pp. 1799-1808
- Tsitsiklis, J.N.¹ Van Roy, B.²

37
- 0004049893
- Learning from delayed rewards
- Ph.D. dissertation, Cambridge Univ., Cambridge, U.K.
- C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Cambridge Univ., Cambridge, U.K., 1989.
- (1989)
- Watkins, C.¹

38
- 34249833101
- Q-learning
- C. Watkins and P. Dayan, "Q-learning," Machine Learn., vol. 8, pp. 279-292, 1992.
- (1992) Machine Learn. , vol.8 , pp. 279-292
- Watkins, C.¹ Dayan, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.