



Volume 54, Issue 3, 2005, Pages 207-213

An actor-critic algorithm for constrained Markov decision processes

Author keywords

Actor critic algorithms; Constrained Markov decision processes; Envelope theorem; Reinforcement learning; Stochastic approximation

Indexed keywords

ALGORITHMS; APPROXIMATION THEORY; DECISION THEORY; DYNAMIC PROGRAMMING; LEARNING SYSTEMS; THEOREM PROVING

EID: 13244278201     PISSN: 01676911     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.sysconle.2004.08.007     Document Type: Article
Times cited: 211

References (17)
  • 2
    • 2342463476
    • Applications of Markov decision processes in communication networks
    • E.A. Feinberg A. Shwartz Kluwer Academic Publishers Dordrecht
    • E. Altman Applications of Markov decision processes in communication networks E.A. Feinberg A. Shwartz Handbook of Markov Decision Processes 2001 Kluwer Academic Publishers Dordrecht 489 536
    • (2001) Handbook of Markov Decision Processes , pp. 489-536
    • Altman, E.1
  • 6
    • 0031076413
    • Stochastic approximation with two time scales
    • V.S. Borkar Stochastic approximation with two time scales Systems Control Lett. 29 1997 291 294
    • (1997) Systems Control Lett. , vol.29 , pp. 291-294
    • Borkar, V.S.1
  • 7
    • 13244262450
    • Convex analytic methods in Markov decision processes
    • E.A. Feinberg A. Shwartz Kluwer Academic Publishers Dordrecht
    • V.S. Borkar Convex analytic methods in Markov decision processes E.A. Feinberg A. Shwartz Handbook of Markov Decision Processes 2001 Kluwer Academic Publishers Dordrecht 347 375
    • (2001) Handbook of Markov Decision Processes , pp. 347-375
    • Borkar, V.S.1
  • 8
    • 0343893613
    • Actor-critic-type learning algorithms for Markov decision processes
    • V.R. Konda, and V.S. Borkar Actor-critic-type learning algorithms for Markov decision processes SIAM J. Control Optim. 38 1999 94 123
    • (1999) SIAM J. Control Optim. , vol.38 , pp. 94-123
    • Konda, V.R.1    Borkar, V.S.2
  • 10
    • 79960013704
    • A geometric approach to multi-criterion reinforcement learning
    • S. Mannor, and N. Shimkin A geometric approach to multi-criterion reinforcement learning J. Mach. Learn. Res. 5 2004 325 360
    • (2004) J. Mach. Learn. Res. , vol.5 , pp. 325-360
    • Mannor, S.1    Shimkin, N.2
  • 12
    • 0036212678
    • Envelope theorems for arbitrary choice sets
    • P. Milgrom, and I. Segal Envelope theorems for arbitrary choice sets Econometrica 70 2002 583 601
    • (2002) Econometrica , vol.70 , pp. 583-601
    • Milgrom, P.1    Segal, I.2
  • 15
    • 0031143730
    • An analysis of temporal-difference learning with function approximation
    • J.N. Tsitsiklis, and B. Van Roy An analysis of temporal-difference learning with function approximation IEEE Trans. Automat. Control 42 1997 674 690
    • (1997) IEEE Trans. Automat. Control , vol.42 , pp. 674-690
    • Tsitsiklis, J.N.1    Van Roy, B.2
  • 16
    • 4544283129
    • Neuro-dynamic programming: Overview and recent trends
    • E.A. Feinberg A. Shwartz Kluwer Academic Publishers Dordrecht
    • B. Van Roy Neuro-dynamic programming: Overview and recent trends E.A. Feinberg A. Shwartz Handbook of Markov Decision Processes 2001 Kluwer Academic Publishers Dordrecht 431 459
    • (2001) Handbook of Markov Decision Processes , pp. 431-459
    • Van Roy, B.1
  • 17
    • 13244262451
    • Self learning control of constrained Markov decision processes - A gradient approach
    • F.J. Vazquez Abad, V. Krishnamurthy, Self learning control of constrained Markov decision processes - a gradient approach, Les Cahiers du GERAD, 2003.
    • (2003) Les Cahiers Du GERAD
    • Abad, F.J.V.1    Krishnamurthy, V.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.