



Volume 7, Issue 6, 2016, Pages 967-980

Reinforcement learning and neural networks for multi-agent nonzero-sum games of nonlinear constrained-input systems

Author keywords

Concurrent reinforcement learning; Coupled Hamilton-Jacobi equations; Input constraints; Multi-agent nonzero-sum games; Neural networks

Indexed keywords

ADAPTIVE CONTROL SYSTEMS; CLOSED LOOP SYSTEMS; CONTINUOUS TIME SYSTEMS; GAME THEORY; NEURAL NETWORKS; ONLINE SYSTEMS; OPTIMAL CONTROL SYSTEMS; REINFORCEMENT LEARNING;

EID: 84994504422     PISSN: 18688071     EISSN: 1868808X     Source Type: Journal    
DOI: 10.1007/s13042-014-0300-y     Document Type: Article
Times cited : (40)

References (40)
  • 2
    • 34247618255
    • Newton’s method for solving cross-coupled sign-indefinite algebraic Riccati equations for weakly coupled large-scale systems
    • Mukaidani H (2007) Newton’s method for solving cross-coupled sign-indefinite algebraic Riccati equations for weakly coupled large-scale systems. J Appl Math Comput 188(1):103–115
    • (2007) J Appl Math Comput , vol.188 , Issue.1 , pp. 103-115
    • Mukaidani, H.1
  • 4
    • 34250487269
    • Nonzero-sum differential games
    • Starr A, Ho Y (1969) Nonzero-sum differential games. J Optim Theory Appl 3(3):184–206
    • (1969) J Optim Theory Appl , vol.3 , Issue.3 , pp. 184-206
    • Starr, A.1    Ho, Y.2
  • 6
    • 0000672181
    • Lyapunov iterations for solving coupled algebraic Lyapunov equations of Nash differential games and algebraic Riccati equations of zero-sum games
    • Birkhäuser, Boston
    • Li T, Gajic Z (1994) Lyapunov iterations for solving coupled algebraic Lyapunov equations of Nash differential games and algebraic Riccati equations of zero-sum games. New Trends Dynam Appl. Birkhäuser, Boston, pp 489–494
    • (1994) New Trends Dynam Appl , pp. 489-494
    • Li, T.1    Gajic, Z.2
  • 7
    • 0030086666
    • On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games
    • Freiling G, Jank G, Abou-Kandil H (1996) On global existence of solutions to coupled matrix Riccati equations in closed-loop Nash games. IEEE Trans Autom Control 41(2):264–269
    • (1996) IEEE Trans Autom Control , vol.41 , Issue.2 , pp. 264-269
    • Freiling, G.1    Jank, G.2    Abou-Kandil, H.3
  • 8
    • 79953127250
    • Solving coupled Riccati equations for closed-loop Nash strategy by lack of trust approach
    • Jungers M, De Pieri E, Abou-Kandil H (2007) Solving coupled Riccati equations for closed-loop Nash strategy by lack of trust approach. Int J Tomography Stat 7:49–54
    • (2007) Int J Tomography Stat , vol.7 , pp. 49-54
    • Jungers, M.1    De Pieri, E.2    Abou-Kandil, H.3
  • 9
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • Sutton R (1988) Learning to predict by the methods of temporal differences. Mach Learn 3(1):9–44
    • (1988) Mach Learn , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.1
  • 10
    • 70349116541
    • Reinforcement learning and adaptive dynamic programming for feedback control
    • Lewis FL, Vrabie D (2009) Reinforcement learning and adaptive dynamic programming for feedback control. IEEE Circuits Syst Mag 9(3):32–50
    • (2009) IEEE Circuits Syst Mag , vol.9 , Issue.3 , pp. 32-50
    • Lewis, F.L.1    Vrabie, D.2
  • 11
    • 84883537695
    • Reinforcement learning and feedback control
    • Lewis FL, Vrabie D, Vamvoudakis K (2012) Reinforcement learning and feedback control. IEEE Control Syst 32(6):76–105
    • (2012) IEEE Control Syst , vol.32 , Issue.6 , pp. 76-105
    • Lewis, F.L.1    Vrabie, D.2    Vamvoudakis, K.3
  • 12
    • 0002031779
    • Approximate dynamic programming for real-time control and neural modeling
    • White DA, Sofge DA (eds), Multiscience Press, Brentwood
    • Werbos PJ (1992) Approximate dynamic programming for real-time control and neural modeling. In: White DA, Sofge DA (eds) Handbook of intelligent control. Multiscience Press, Brentwood
    • (1992) Handbook of intelligent control
    • Werbos, P.J.1
  • 15
    • 67349145396
    • Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
    • Vrabie D, Lewis FL (2009) Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems. Neural Netw 22(3):237–246
    • (2009) Neural Netw , vol.22 , Issue.3 , pp. 237-246
    • Vrabie, D.1    Lewis, F.L.2
  • 16
    • 77950630017
    • Online actor-critic algorithm to solve the continuous infinite time horizon optimal control problem
    • Vamvoudakis K, Lewis FL (2010) Online actor-critic algorithm to solve the continuous infinite time horizon optimal control problem. Automatica 46(5):878–888
    • (2010) Automatica , vol.46 , Issue.5 , pp. 878-888
    • Vamvoudakis, K.1    Lewis, F.L.2
  • 17
    • 84871319455
    • A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems
    • Bhasin S, Kamalapurkar R, Johnson M, Vamvoudakis K, Lewis FL, Dixon WE (2012) A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems. Automatica 49(1):82–92
    • (2012) Automatica , vol.49 , Issue.1 , pp. 82-92
    • Bhasin, S.1    Kamalapurkar, R.2    Johnson, M.3    Vamvoudakis, K.4    Lewis, F.L.5    Dixon, W.E.6
  • 18
    • 84885176157
    • Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks
    • Modares H, Lewis FL, Naghibi Sistani MB (2013) Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks. IEEE Trans Neural Netw Learning Syst 24(10):1513–1525
    • (2013) IEEE Trans Neural Netw Learning Syst , vol.24 , Issue.10 , pp. 1513-1525
    • Modares, H.1    Lewis, F.L.2    Naghibi Sistani, M.B.3
  • 19
    • 79960443754
    • Adaptive dynamic programming for online solution of a zero-sum differential game
    • Vrabie D, Lewis FL (2011) Adaptive dynamic programming for online solution of a zero-sum differential game. J Control Theory Appl 9(3):353–360
    • (2011) J Control Theory Appl , vol.9 , Issue.3 , pp. 353-360
    • Vrabie, D.1    Lewis, F.L.2
  • 20
    • 79953155097
    • Online solution of nonlinear two-player zero-sum games using synchronous policy iteration
    • Vamvoudakis K, Lewis FL (2010) Online solution of nonlinear two-player zero-sum games using synchronous policy iteration. In Proc. 49th IEEE CDC, pp 3040-3047
    • (2010) 49th IEEE CDC , pp. 3040-3047
    • Vamvoudakis, K.1    Lewis, F.L.2
  • 22
    • 84860670757
    • Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm
    • Johnson M, Bhasin S, Dixon WE (2011) Nonlinear two-player zero-sum game approximate solution using a policy iteration algorithm. In: Proc. IEEE CDC, pp 142–147
    • (2011) Proc. IEEE CDC , pp. 142-147
    • Johnson, M.1    Bhasin, S.2    Dixon, W.E.3
  • 23
    • 79953133535
    • Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games
    • Vrabie D, Lewis FL (2010) Integral reinforcement learning for online computation of feedback Nash strategies of nonzero-sum differential games. In: Proc. 49th IEEE CDC, pp 3066–3071
    • (2010) Proc. 49th IEEE CDC , pp. 3066-3071
    • Vrabie, D.1    Lewis, F.L.2
  • 24
    • 79960897012
    • Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations
    • Vamvoudakis K, Lewis FL (2011) Multi-player non-zero-sum games: online adaptive learning solution of coupled Hamilton-Jacobi equations. Automatica 47(8):1556–1569
    • (2011) Automatica , vol.47 , Issue.8 , pp. 1556-1569
    • Vamvoudakis, K.1    Lewis, F.L.2
  • 25
    • 84885835001
    • Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP
    • Zhang H, Cui L, Luo Y (2013) Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP. IEEE Trans Cybern 43(1):206–216
    • (2013) IEEE Trans Cybern , vol.43 , Issue.1 , pp. 206-216
    • Zhang, H.1    Cui, L.2    Luo, Y.3
  • 26
    • 14844340822
    • Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
    • Abu-Khalaf M, Lewis FL (2005) Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica 41(5):779–791
    • (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
    • Abu-Khalaf, M.1    Lewis, F.L.2
  • 27
    • 48949116222
    • Neurodynamic programming and zero-sum games for constrained control systems
    • Abu-Khalaf M, Lewis FL, Huang J (2008) Neurodynamic programming and zero-sum games for constrained control systems. IEEE Trans Neural Netw 19(7):1243–1252
    • (2008) IEEE Trans Neural Netw , vol.19 , Issue.7 , pp. 1243-1252
    • Abu-Khalaf, M.1    Lewis, F.L.2    Huang, J.3
  • 30
    • 84893708995
    • Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems
    • Modares H, Lewis FL, Naghibi Sistani MB (2014) Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems. Automatica 50(1):193–202
    • (2014) Automatica , vol.50 , Issue.1 , pp. 193-202
    • Modares, H.1    Lewis, F.L.2    Naghibi Sistani, M.B.3
  • 31
    • 84927697693
    • Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems
    • Yasini S, Karimpour A, Naghibi Sistani MB, Modares H (2014) Online concurrent reinforcement learning algorithm to solve two-player zero-sum games for partially unknown nonlinear continuous-time systems. Int J Adapt Cont Sig Proc. doi:10.1002/acs.2485
    • (2014) Int J Adapt Cont Sig Proc
    • Yasini, S.1    Karimpour, A.2    Naghibi Sistani, M.B.3    Modares, H.4
  • 33
    • 84881324637
    • Optimal control of nonlinear continuous-time systems: design of bounded controllers via generalized nonquadratic functionals
    • Lyshevski SE (1998) Optimal control of nonlinear continuous-time systems: design of bounded controllers via generalized nonquadratic functionals. In Proc. IEEE ACC. pp 205–209
    • (1998) IEEE ACC , pp. 205-209
    • Lyshevski, S.E.1
  • 34
    • 0025627940
    • Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks
    • Hornik K, Stinchcombe M, White H (1990) Universal approximation of an unknown mapping and its derivatives using multilayer feedforward networks. Neural Netw 3(5):551–560
    • (1990) Neural Netw , vol.3 , Issue.5 , pp. 551-560
    • Hornik, K.1    Stinchcombe, M.2    White, H.3
  • 35
    • 40649105766
    • A definition of partial derivative of random functions and its application to RBFNN sensitivity analysis
    • Wang XZ, Li CG, Yeung DS, Song S, Feng H (2008) A definition of partial derivative of random functions and its application to RBFNN sensitivity analysis. Neurocomputing 71(7–9):1515–1526
    • (2008) Neurocomputing , vol.71 , Issue.7-9 , pp. 1515-1526
    • Wang, X.Z.1    Li, C.G.2    Yeung, D.S.3    Song, S.4    Feng, H.5
  • 36
    • 84961289486
    • Online neural network model for non-stationary and imbalanced data stream classification
    • Ghazikhani A, Monsefi R, Sadoghi Yazdi H (2014) Online neural network model for non-stationary and imbalanced data stream classification. Int J Mach Learn Cyber 5(1):51–62. doi:10.1007/s13042-013-0180-6
    • (2014) Int J Mach Learn Cyber , vol.5 , Issue.1 , pp. 51-62
    • Ghazikhani, A.1    Monsefi, R.2    Sadoghi Yazdi, H.3
  • 37
    • 84877744884
    • Parameter selection algorithm with self adaptive growing neural network classifier for diagnosis issues
    • Barakat M, Lefebvre D, Khalil M, Druaux F, Mustapha O (2013) Parameter selection algorithm with self adaptive growing neural network classifier for diagnosis issues. Int J Mach Learn Cyber 4(3):217–233. doi:10.1007/s13042-012-0089-5
    • (2013) Int J Mach Learn Cyber , vol.4 , Issue.3 , pp. 217-233
    • Barakat, M.1    Lefebvre, D.2    Khalil, M.3    Druaux, F.4    Mustapha, O.5
  • 38
    • 62949149213
    • Constrained nonlinear optimal control: A converse HJB approach
    • Nevistic V, Primbs JA (1996) Constrained nonlinear optimal control: A converse HJB approach. California Institute of Technology, Tech. Rep
    • (1996) California Institute of Technology, Tech. Rep
    • Nevistic, V.1    Primbs, J.A.2
  • 39
    • 84933509471
    • Dynamic analysis of discrete-time BAM neural networks with stochastic perturbations and impulses
    • Raja R, Karthik Raja U, Samidurai R, Leelamani A (2014) Dynamic analysis of discrete-time BAM neural networks with stochastic perturbations and impulses. Int J Mach Learn Cyber 5(1):39–50. doi:10.1007/s13042-013-0199-8
    • (2014) Int J Mach Learn Cyber , vol.5 , Issue.1 , pp. 39-50
    • Raja, R.1    Karthik Raja, U.2    Samidurai, R.3    Leelamani, A.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.