Volume 49, Issue 4, 2004, Pages 493-505

Potential-based online policy iteration algorithms for Markov decision processes

Author keywords

Markov decision process; Potential; Recursive optimization

Indexed keywords

DECISION SUPPORT SYSTEMS; ITERATIVE METHODS; MARKOV PROCESSES; OPTIMAL SYSTEMS

EID: 2442614974     PISSN: 00189286     EISSN: None     Source Type: Journal    
DOI: 10.1109/TAC.2004.825647     Document Type: Article
Times cited: 46

References (38)
  • 5
    • 0032027940
    • The relation among potentials, perturbation analysis, Markov decision processes, and other topics
    • X.-R. Cao, "The relation among potentials, perturbation analysis, Markov decision processes, and other topics," J. Discrete Event Dyna. Syst., vol. 8, pp. 71-87, 1998.
    • (1998) J. Discrete Event Dyna. Syst. , vol.8 , pp. 71-87
    • Cao, X.-R.1
  • 6
    • 0033247533
    • Single sample path based optimization of Markov chains
    • ____, "Single sample path based optimization of Markov chains," J. Optim. Theory Applicat., vol. 100, no. 3, pp. 527-548, 1999.
    • (1999) J. Optim. Theory Applicat. , vol.100 , Issue.3 , pp. 527-548
    • Cao, X.-R.1
  • 7
    • 0031258478
    • Perturbation realization, potentials and sensitivity analysis of Markov processes
    • Sept.
    • X.-R. Cao and H. F. Chen, "Perturbation realization, potentials and sensitivity analysis of Markov processes," IEEE Trans. Automat. Contr., vol. 42, pp. 1382-1393, Sept. 1997.
    • (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1382-1393
    • Cao, X.-R.1    Chen, H.F.2
  • 9
    • 0036604532
    • A time aggregation approach to Markov decision processes
    • X.-R. Cao, Z. Y. Ren, S. Bhatnagar, F. Fu, and S. Marcus, "A time aggregation approach to Markov decision processes," Automatica, vol. 38, pp. 929-943, 2002.
    • (2002) Automatica , vol.38 , pp. 929-943
    • Cao, X.-R.1    Ren, Z.Y.2    Bhatnagar, S.3    Fu, F.4    Marcus, S.5
  • 10
    • 0032122986
    • Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization
    • July
    • X.-R. Cao and Y. W. Wan, "Algorithms for sensitivity analysis of Markov systems through potentials and perturbation realization," IEEE Trans. Contr. Syst. Tech, vol. 6, pp. 482-494, July 1998.
    • (1998) IEEE Trans. Contr. Syst. Tech , vol.6 , pp. 482-494
    • Cao, X.-R.1    Wan, Y.W.2
  • 13
    • 0028466316
    • Stochastic optimization of regenerative systems using infinitesimal perturbation analysis
    • Oct.
    • E. K. P. Chong and P. J. Ramadge, "Stochastic optimization of regenerative systems using infinitesimal perturbation analysis," IEEE Trans. Automat. Contr., vol. 39, pp. 1400-1410, Oct. 1994.
    • (1994) IEEE Trans. Automat. Contr. , vol.39 , pp. 1400-1410
    • Chong, E.K.P.1    Ramadge, P.J.2
  • 14
    • 0028444151
    • Smoothed perturbation analysis derivative estimation for Markov chains
    • M. C. Fu and J. Hu, "Smoothed perturbation analysis derivative estimation for Markov chains," Oper. Res. Lett., vol. 15, pp. 241-251, 1994.
    • (1994) Oper. Res. Lett. , vol.15 , pp. 241-251
    • Fu, M.C.1    Hu, J.2
  • 17
    • 0020802518
    • Perturbation analysis and optimization of queueing networks
    • Y. C. Ho and X.-R. Cao, "Perturbation analysis and optimization of queueing networks," J. Optim. Theory Applicat., vol. 40, no. 4, pp. 559-582, 1983.
    • (1983) J. Optim. Theory Applicat. , vol.40 , Issue.4 , pp. 559-582
    • Ho, Y.C.1    Cao, X.-R.2
  • 19
    • 0032653557
    • Explanation of goal softening in ordinal optimization
    • Jan.
    • L. H. Lee, E. T. K. Lau, and Y. C. Ho, "Explanation of goal softening in ordinal optimization," IEEE Trans. Automat. Contr., vol. 44, pp. 94-99, Jan. 1999.
    • (1999) IEEE Trans. Automat. Contr. , vol.44 , pp. 94-99
    • Lee, L.H.1    Lau, E.T.K.2    Ho, Y.C.3
  • 20
    • 85153938292
    • Reinforcement learning algorithm for partially observable Markov decision problems
    • San Francisco, CA: Morgan Kaufmann
    • T. Jaakkola, S. P. Singh, and M. I. Jordan, "Reinforcement learning algorithm for partially observable Markov decision problems," in Advances in Neural Information Processing Systems. San Francisco, CA: Morgan Kaufmann, 1995, vol. 7, pp. 345-352.
    • (1995) Advances in Neural Information Processing Systems , vol.7 , pp. 345-352
    • Jaakkola, T.1    Singh, S.P.2    Jordan, M.I.3
  • 21
    • 0343893613
    • Actor-critic-type learning algorithms for Markov decision processes
    • V. R. Konda and V. S. Borkar, "Actor-critic-type learning algorithms for Markov decision processes," SIAM J. Control Optim., vol. 38, pp. 94-123, 1999.
    • (1999) SIAM J. Control Optim. , vol.38 , pp. 94-123
    • Konda, V.R.1    Borkar, V.S.2
  • 23
    • 0035249254
    • Simulation-based optimization of Markov reward processes
    • Feb.
    • P. Marbach and J. N. Tsitsiklis, "Simulation-based optimization of Markov reward processes," IEEE Trans. Automat. Contr., vol. 46, pp. 191-209, Feb. 2001.
    • (2001) IEEE Trans. Automat. Contr. , vol.46 , pp. 191-209
    • Marbach, P.1    Tsitsiklis, J.N.2
  • 25
    • 0031344030
    • The policy improvement algorithm for Markov decision processes with general state space
    • Oct.
    • S. P. Meyn, "The policy improvement algorithm for Markov decision processes with general state space," IEEE Trans. Automat. Contr., vol. 42, pp. 1663-1680, Oct. 1997.
    • (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 1663-1680
    • Meyn, S.P.1
  • 26
    • 0001621211
    • Sample-path optimization of convex stochastic performance functions
    • E. L. Plambeck, B. R. Fu, S. M. Robinson, and R. Suri, "Sample-path optimization of convex stochastic performance functions," Math. Program. B, vol. 75, pp. 137-176, 1996.
    • (1996) Math. Program. B , vol.75 , pp. 137-176
    • Plambeck, E.L.1    Fu, B.R.2    Robinson, S.M.3    Suri, R.4
  • 28
    • 0024735795
    • Sensitivity analysis via likelihood ratio
    • M. I. Reiman and A. Weiss, "Sensitivity analysis via likelihood ratio," Oper. Res., vol. 37, pp. 830-844, 1989.
    • (1989) Oper. Res. , vol.37 , pp. 830-844
    • Reiman, M.I.1    Weiss, A.2
  • 30
    • 0024621270
    • Single run optimization of discrete event simulations-An empirical study using M/M/1 queue
    • R. Suri and Y. T. Leung, "Single run optimization of discrete event simulations-An empirical study using M/M/1 queue," IIE Trans., vol. 21, pp. 35-49, 1989.
    • (1989) IIE Trans. , vol.21 , pp. 35-49
    • Suri, R.1    Leung, Y.T.2
  • 31
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Machine Learn., vol. 3, pp. 9-44, 1988.
    • (1988) Machine Learn. , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 33
    • 0042758707
    • Actor-critic algorithms
    • Cambridge, MA, Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol.
    • J. N. Tsitsiklis and V. R. Konda, "Actor-critic algorithms," Cambridge, MA, Tech. Rep., Lab. Inform. Decision Systems, Mass. Inst. Technol., 2001.
    • (2001)
    • Tsitsiklis, J.N.1    Konda, V.R.2
  • 34
    • 0029752470
    • Feature-based methods for large-scale dynamic programming
    • J. N. Tsitsiklis and B. Van Roy, "Feature-based methods for large-scale dynamic programming," Machine Learn., vol. 22, pp. 59-94, 1994.
    • (1994) Machine Learn. , vol.22 , pp. 59-94
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 35
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • May
    • ____, "An analysis of temporal-difference learning with function approximation," IEEE Trans. Automat. Contr., vol. 42, pp. 674-690, May 1997.
    • (1997) IEEE Trans. Automat. Contr. , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 36
    • 0033221519
    • Average cost temporal-difference learning
    • ____, "Average cost temporal-difference learning," Automatica, vol. 35, pp. 1799-1808, 1999.
    • (1999) Automatica , vol.35 , pp. 1799-1808
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 37
    • 0004049893
    • Learning from delayed rewards
    • Ph.D. dissertation, Cambridge Univ., Cambridge, U.K.
    • C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Cambridge Univ., Cambridge, U.K., 1989.
    • (1989)
    • Watkins, C.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.