Volume 29, Issue 4, 2015, Pages 473-493

Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems

Author keywords

H∞ control; neural networks; online concurrent reinforcement learning algorithm; two player zero sum games

Indexed keywords

CLOSED LOOP SYSTEMS; CONCURRENCY CONTROL; CONTINUOUS TIME SYSTEMS; E-LEARNING; GAME THEORY; LEARNING SYSTEMS; NEURAL NETWORKS; NONLINEAR CONTROL SYSTEMS; ONLINE SYSTEMS; OPTIMAL CONTROL SYSTEMS; REINFORCEMENT LEARNING

EID: 84927697693     PISSN: 08906327     EISSN: 10991115     Source Type: Journal    
DOI: 10.1002/acs.2485     Document Type: Article
Times cited: 22

References (40)
  • 1
    • 0019559036
    • Feedback and optimal sensitivity: Model reference transformations, multiplicative seminorms, and approximate inverses
    • Zames G. Feedback and optimal sensitivity: model reference transformations, multiplicative seminorms, and approximate inverses. IEEE Transactions on Automatic Control 1981; 26 (2): 301-320.
    • (1981) IEEE Transactions on Automatic Control , vol.26 , Issue.2 , pp. 301-320
    • Zames, G.1
  • 2
    • 0029264110
    • H∞ control via measurement feedback for general nonlinear systems
    • Isidori A, Kang W. H∞ control via measurement feedback for general nonlinear systems. IEEE Transactions on Automatic Control 1995; 40 (3): 446-472.
    • (1995) IEEE Transactions on Automatic Control , vol.40 , Issue.3 , pp. 446-472
    • Isidori, A.1    Kang, W.2
  • 9
    • 0031332446
    • Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation
    • Beard R, Saridis GN, Wen J. Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation. Automatica 1997; 33 (10): 2159-2177.
    • (1997) Automatica , vol.33 , Issue.10 , pp. 2159-2177
    • Beard, R.1    Saridis, G.N.2    Wen, J.3
  • 10
    • 0032202335
    • Successive Galerkin approximation algorithms for nonlinear optimal and robust control
    • Beard R, McLain T. Successive Galerkin approximation algorithms for nonlinear optimal and robust control. International Journal of Control 1998; 71 (5): 717-743.
    • (1998) International Journal of Control , vol.71 , Issue.5 , pp. 717-743
    • Beard, R.1    McLain, T.2
  • 13
    • 48949116222
    • Neurodynamic programming and zero-sum games for constrained control systems
    • Abu-Khalaf M, Lewis FL, Huang J. Neurodynamic programming and zero-sum games for constrained control systems. IEEE Transactions on Neural Networks 2008; 19 (5): 1243-1252.
    • (2008) IEEE Transactions on Neural Networks , vol.19 , Issue.5 , pp. 1243-1252
    • Abu-Khalaf, M.1    Lewis, F.L.2    Huang, J.3
  • 15
    • 70349116541
    • Reinforcement learning and adaptive dynamic programming for feedback control
    • Lewis FL, Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits and Systems Magazine 2009; 9 (3): 32-50.
    • (2009) IEEE Circuits and Systems Magazine , vol.9 , Issue.3 , pp. 32-50
    • Lewis, F.L.1    Vrabie, D.2
  • 17
    • 84864489666
    • Optimal control of nonlinear discrete-time systems based on adaptive dynamic programming approach
    • Wang D, Liu D, Wei Q, Zhao D, Jin N. Optimal control of nonlinear discrete-time systems based on adaptive dynamic programming approach. Automatica 2012; 48 (6): 1825-1832.
    • (2012) Automatica , vol.48 , Issue.6 , pp. 1825-1832
    • Wang, D.1    Liu, D.2    Wei, Q.3    Zhao, D.4    Jin, N.5
  • 20
    • 84876066909
    • Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm
    • Liu D, Li H, Wang D. Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm. Neurocomputing 2013; 110: 92-100.
    • (2013) Neurocomputing , vol.110 , pp. 92-100
    • Liu, D.1    Li, H.2    Wang, D.3
  • 21
    • 13244279592
    • Robust reinforcement learning
    • Morimoto J, Doya K. Robust reinforcement learning. Neural Computation 2005; 17 (2): 335-359.
    • (2005) Neural Computation , vol.17 , Issue.2 , pp. 335-359
    • Morimoto, J.1    Doya, K.2
  • 22
    • 79960443754
    • Adaptive dynamic programming for online solution of a zero-sum differential game
    • Vrabie D, Lewis FL. Adaptive dynamic programming for online solution of a zero-sum differential game. Journal of Control Theory and Applications 2011; 9 (3): 353-360.
    • (2011) Journal of Control Theory and Applications , vol.9 , Issue.3 , pp. 353-360
    • Vrabie, D.1    Lewis, F.L.2
  • 23
    • 77950630017
    • Online actor-critic algorithm to solve the continuous infinite time horizon optimal control problem
    • Vamvoudakis KG, Lewis FL. Online actor-critic algorithm to solve the continuous infinite time horizon optimal control problem. Automatica 2010; 46: 878-888.
    • (2010) Automatica , vol.46 , pp. 878-888
    • Vamvoudakis, K.G.1    Lewis, F.L.2
  • 24
    • 84864463039
    • Online solution of nonlinear two-player zero-sum games using synchronous policy iteration
    • Vamvoudakis KG, Lewis FL. Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. International Journal of Robust and Nonlinear Control 2012; 22: 1460-1483.
    • (2012) International Journal of Robust and Nonlinear Control , vol.22 , pp. 1460-1483
    • Vamvoudakis, K.G.1    Lewis, F.L.2
  • 25
    • 84881373865
    • A policy iteration approach to online optimal control of continuous-time constrained-input systems
    • Modares H, Naghibi Sistani MB, Lewis FL. A policy iteration approach to online optimal control of continuous-time constrained-input systems. ISA Transactions 2013; 52: 611-621.
    • (2013) ISA Transactions , vol.52 , pp. 611-621
    • Modares, H.1    Naghibi Sistani, M.B.2    Lewis, F.L.3
  • 28
    • 67349145396
    • Neural network approach to continuous time direct adaptive optimal control for partially unknown nonlinear systems
    • Vrabie D, Lewis FL. Neural network approach to continuous time direct adaptive optimal control for partially unknown nonlinear systems. Neural Networks 2009; 22: 237-246.
    • (2009) Neural Networks , vol.22 , pp. 237-246
    • Vrabie, D.1    Lewis, F.L.2
  • 34
    • 84893708995
    • Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems
    • Modares H, Lewis FL, Naghibi Sistani MB. Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 2014; 50 (1): 193-202.
    • (2014) Automatica , vol.50 , Issue.1 , pp. 193-202
    • Modares, H.1    Lewis, F.L.2    Naghibi Sistani, M.B.3
  • 36
    • 0002031779
    • Approximate dynamic programming for real-time control and neural modeling
    • White DA, Sofge DA (eds). Van Nostrand Reinhold: New York.
    • Werbos PJ. Approximate dynamic programming for real-time control and neural modeling. In Handbook of Intelligent Control, White DA, Sofge DA (eds). Van Nostrand Reinhold: New York, 1992.
    • (1992) Handbook of Intelligent Control
    • Werbos, P.J.1
  • 39
    • 0025627940
    • Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks
    • Hornik K, Stinchcombe M, White H. Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Networks 1990; 3: 551-560.
    • (1990) Neural Networks , vol.3 , pp. 551-560
    • Hornik, K.1    Stinchcombe, M.2    White, H.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.