



Volume 50, Issue 4, 2014, Pages 1167-1175

Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics

Author keywords

Algebraic Riccati equation; Linear quadratic tracker; Policy iteration; Reinforcement learning

Indexed keywords

ALGEBRA; DIFFERENCE EQUATIONS; DIGITAL CONTROL SYSTEMS; DISCRETE TIME CONTROL SYSTEMS; ITERATIVE METHODS; NAVIGATION; REINFORCEMENT LEARNING; RICCATI EQUATIONS;

EID: 84898853127     PISSN: 00051098     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.automatica.2014.02.015     Document Type: Article
Times cited: 502

References (30)
  • 1
    • 33846781129
    • Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control
    • A. Al-Tamimi, F.L. Lewis, and M. Abu-Khalaf Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control Automatica 43 3 2007 473 481
    • (2007) Automatica , vol.43 , Issue.3 , pp. 473-481
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3
  • 3
    • 6344234575
    • Reinforcement learning in continuous time: Advantage updating
    • Baird, L. C. III (1994). Reinforcement learning in continuous time: advantage updating. In Proc. of ICNN.
    • (1994) Proc. of ICNN
    • Baird III, L.C.1
  • 6
    • 0028584964
    • Adaptive linear quadratic control using policy iteration
    • Bradtke, S. J.; Ydstie, B. E.; Barto, A. G. (1994). Adaptive linear quadratic control using policy iteration. In: Proc. of ACC (pp. 3475-3476).
    • (1994) Proc. of ACC , pp. 3475-3476
    • Bradtke, S.J.1    Ydstie, B.E.2    Barto, A.G.3
  • 7
    • 77957777969
    • Optimal control of affine nonlinear continuous-time systems
    • Dierks, T.; Jagannathan, S. (2010). Optimal control of affine nonlinear continuous-time systems. In Proc. Am. control conf. (pp. 1568-1573).
    • (2010) Proc. Am. Control Conf. , pp. 1568-1573
    • Dierks, T.1    Jagannathan, S.2
  • 8
    • 0015109409
    • An iterative technique for the computation of steady state gains for the discrete optimal regulator
    • G.A. Hewer An iterative technique for the computation of steady state gains for the discrete optimal regulator IEEE Transactions on Automatic Control 16 4 1971 382 384
    • (1971) IEEE Transactions on Automatic Control , vol.16 , Issue.4 , pp. 382-384
    • Hewer, G.A.1
  • 9
    • 84865467087
    • Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics
    • Y. Jiang, and Z.P. Jiang Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics Automatica 48 2012 2699 2704
    • (2012) Automatica , vol.48 , pp. 2699-2704
    • Jiang, Y.1    Jiang, Z.P.2
  • 11
    • 84867400046
    • Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems
    • J.Y. Lee, J.B. Park, and Y.H. Choi Integral Q-learning and explorized policy iteration for adaptive optimal control of continuous-time linear systems Automatica 48 2012 2850 2859
    • (2012) Automatica , vol.48 , pp. 2850-2859
    • Lee, J.Y.1    Park, J.B.2    Choi, Y.H.3
  • 13
    • 79551685808
    • Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data
    • F.L. Lewis, and K. Vamvoudakis Reinforcement learning for partially observable dynamic processes: adaptive dynamic programming using measured output data IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics 41 1 2011 14 23
    • (2011) IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics , vol.41 , Issue.1 , pp. 14-23
    • Lewis, F.L.1    Vamvoudakis, K.2
  • 15
    • 70349116541
    • Reinforcement learning and adaptive dynamic programming for feedback control
    • F.L. Lewis, and D. Vrabie Reinforcement learning and adaptive dynamic programming for feedback control IEEE Circuits and Systems Magazine 9 3 2009 32 50
    • (2009) IEEE Circuits and Systems Magazine , vol.9 , Issue.3 , pp. 32-50
    • Lewis, F.L.1    Vrabie, D.2
  • 17
    • 84883537695
    • Reinforcement learning and feedback control using natural decision methods to design optimal adaptive controllers
    • F.L. Lewis, D. Vrabie, and K.G. Vamvoudakis Reinforcement learning and feedback control using natural decision methods to design optimal adaptive controllers IEEE Control Systems Magazine 32 6 2012 76 105
    • (2012) IEEE Control Systems Magazine , vol.32 , Issue.6 , pp. 76-105
    • Lewis, F.L.1    Vrabie, D.2    Vamvoudakis, K.G.3
  • 20
    • 58349110975
    • Adaptive optimal control for continuous-time linear systems based on policy iteration
    • D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F.L. Lewis Adaptive optimal control for continuous-time linear systems based on policy iteration Automatica 45 2009 477 484
    • (2009) Automatica , vol.45 , pp. 477-484
    • Vrabie, D.1    Pastravanu, O.2    Abu-Khalaf, M.3    Lewis, F.L.4
  • 21
    • 82755160758
    • Finite-horizon neurooptimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach
    • D. Wang, D. Liu, and Q. Wei Finite-horizon neurooptimal tracking control for a class of discrete-time nonlinear systems using adaptive dynamic programming approach Neurocomputing 78 2012 14 22
    • (2012) Neurocomputing , vol.78 , pp. 14-22
    • Wang, D.1    Liu, D.2    Wei, Q.3
  • 23
    • 0024888479
    • Neural networks for control and system identification
    • Werbos, P. J. (1989). Neural networks for control and system identification. In: Proc. of CDC (pp. 260-265).
    • (1989) Proc. of CDC , pp. 260-265
    • Werbos, P.J.1
  • 24
    • 0002011091
    • A menu of designs for reinforcement learning over time
    • MIT Press Cambridge, MA
    • P.J. Werbos A menu of designs for reinforcement learning over time Neural networks for control 1991 MIT Press Cambridge, MA 67 95
    • (1991) Neural Networks for Control , pp. 67-95
    • Werbos, P.J.1
  • 25
    • 0002031779
    • Approximate dynamic programming for real-time control and neural modeling
    • D.A. White, D.A. Sofge, Van Nostrand Reinhold New York
    • P.J. Werbos Approximate dynamic programming for real-time control and neural modeling D.A. White, D.A. Sofge, Handbook of intelligent control 1992 Van Nostrand Reinhold New York
    • (1992) Handbook of Intelligent Control
    • Werbos, P.J.1
  • 27
    • 83655163786
    • Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method
    • H. Zhang, L. Cui, X. Zhang, and X. Luo Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method IEEE Transactions on Neural Networks 22 2011 2226 2236
    • (2011) IEEE Transactions on Neural Networks , vol.22 , pp. 2226-2236
    • Zhang, H.1    Cui, L.2    Zhang, X.3    Luo, X.4
  • 29
    • 78650805234
    • An iterative approximate dynamic programming method to solve for a class of nonlinear zero-sum differential games
    • H. Zhang, Q. Wei, and D. Liu An iterative approximate dynamic programming method to solve for a class of nonlinear zero-sum differential games Automatica 47 1 2011 207 214
    • (2011) Automatica , vol.47 , Issue.1 , pp. 207-214
    • Zhang, H.1    Wei, Q.2    Liu, D.3
  • 30
    • 49049119493
    • A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm
    • H. Zhang, Q. Wei, and Y. Luo A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38 2008 937 942
    • (2008) IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics , vol.38 , pp. 937-942
    • Zhang, H.1    Wei, Q.2    Luo, Y.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.