Volume 26, Issue 1, 2015, Pages 140-151

Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems

Author keywords

Actor critic algorithm; discrete time (DT) nonlinear optimal tracking; input constraints; neural network (NN); reinforcement learning (RL)

Indexed keywords

DIGITAL CONTROL SYSTEMS; DISCRETE TIME CONTROL SYSTEMS; DYNAMIC PROGRAMMING; DYNAMICS; LEARNING ALGORITHMS; NAVIGATION; OPTIMIZATION; SOCIAL NETWORKING (ONLINE)

EID: 84919687575     PISSN: 2162-237X     EISSN: 2162-2388     Source Type: Journal
DOI: 10.1109/TNNLS.2014.2358227     Document Type: Article
Times cited: 297

References (47)
  • 6
    • 70349116541
    • Reinforcement learning and adaptive dynamic programming for feedback control
    • Aug.
    • F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits Syst. Mag., vol. 9, no. 3, pp. 32-50, Aug. 2009.
    • (2009) IEEE Circuits Syst. Mag. , vol.9 , Issue.3 , pp. 32-50
    • Lewis, F.L.1    Vrabie, D.2
  • 7
    • 84883537695
    • Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers
    • Dec.
    • F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers," IEEE Control Syst., vol. 32, no. 6, pp. 76-105, Dec. 2012.
    • (2012) IEEE Control Syst. , vol.32 , Issue.6 , pp. 76-105
    • Lewis, F.L.1    Vrabie, D.2    Vamvoudakis, K.G.3
  • 9
    • 84886388184
    • An adaptive recurrent neural-network controller using a stabilization matrix and predictive inputs to solve a tracking problem under disturbances
    • Jan.
    • M. Fairbank, S. Li, X. Fu, E. Alonso, and D. Wunsch, "An adaptive recurrent neural-network controller using a stabilization matrix and predictive inputs to solve a tracking problem under disturbances," Neural Netw., vol. 49, pp. 74-86, Jan. 2014.
    • (2014) Neural Netw. , vol.49 , pp. 74-86
    • Fairbank, M.1    Li, S.2    Fu, X.3    Alonso, E.4    Wunsch, D.5
  • 10
    • 67349145396
    • Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
    • Apr.
    • D. Vrabie and F. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems," Neural Netw., vol. 22, no. 3, pp. 237-246, Apr. 2009.
    • (2009) Neural Netw. , vol.22 , Issue.3 , pp. 237-246
    • Vrabie, D.1    Lewis, F.2
  • 11
    • 84887472008
    • Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics
    • Dec.
    • D. Liu, X. Yang, and H. Li, "Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics," Neural Comput. Appl., vol. 23, nos. 7-8, pp. 1843-1850, Dec. 2013.
    • (2013) Neural Comput. Appl. , vol.23 , Issue.7-8 , pp. 1843-1850
    • Liu, D.1    Yang, X.2    Li, H.3
  • 12
    • 84871319455
    • A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems
    • Jan.
    • S. Bhasin, R. Kamalapurkar, M. Johnson, K. Vamvoudakis, F. L. Lewis, and W. Dixon, "A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems," Automatica, vol. 49, no. 1, pp. 82-92, Jan. 2013.
    • (2013) Automatica , vol.49 , Issue.1 , pp. 82-92
    • Bhasin, S.1    Kamalapurkar, R.2    Johnson, M.3    Vamvoudakis, K.4    Lewis, F.L.5    Dixon, W.6
  • 13
    • 49049089962
    • Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof
    • Aug.
    • A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943-949, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.38 , Issue.4 , pp. 943-949
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3
  • 14
    • 0031236002
    • Adaptive critic designs
    • Sep.
    • D. V. Prokhorov and D. C. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Sep. 1997.
    • (1997) IEEE Trans. Neural Netw. , vol.8 , Issue.5 , pp. 997-1007
    • Prokhorov, D.V.1    Wunsch, D.C.2
  • 15
    • 50049091526
    • Approximate optimal control for a class of nonlinear discrete-time systems with saturating actuators
    • Y. Luo and H. Zhang, "Approximate optimal control for a class of nonlinear discrete-time systems with saturating actuators," Prog. Natural Sci., vol. 18, no. 8, pp. 1023-1029, 2008.
    • (2008) Prog. Natural Sci. , vol.18 , Issue.8 , pp. 1023-1029
    • Luo, Y.1    Zhang, H.2
  • 16
    • 79960462685
    • Online optimal control of nonlinear discrete-time systems using approximate dynamic programming
    • T. Dierks and S. Jagannathan, "Online optimal control of nonlinear discrete-time systems using approximate dynamic programming," J. Control Theory Appl., vol. 9, no. 3, pp. 361-369, 2011.
    • (2011) J. Control Theory Appl. , vol.9 , Issue.3 , pp. 361-369
    • Dierks, T.1    Jagannathan, S.2
  • 17
    • 0035273403
    • Online learning control by association and reinforcement
    • Mar.
    • J. Si and Y.-T. Wang, "Online learning control by association and reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264-276, Mar. 2001.
    • (2001) IEEE Trans. Neural Netw. , vol.12 , Issue.2 , pp. 264-276
    • Si, J.1    Wang, Y.-T.2
  • 18
    • 0002011091
    • A menu of designs for reinforcement learning over time
    • Cambridge, MA, USA: MIT Press
    • P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control. Cambridge, MA, USA: MIT Press, 1991.
    • (1991) Neural Networks for Control
    • Werbos, P.J.1
  • 19
    • 0002031779
    • Approximate dynamic programming for real-time control and neural modeling
    • D. A. White and D. A. Sofge, Eds. New York, NY, USA: Reinhold
    • P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, D. A. White and D. A. Sofge, Eds. New York, NY, USA: Reinhold, 1992.
    • (1992) Handbook of Intelligent Control
    • Werbos, P.J.1
  • 20
    • 0024888479
    • Neural networks for control and system identification
    • Dec.
    • P. J. Werbos, "Neural networks for control and system identification," in Proc. 28th IEEE CDC, Dec. 1989, pp. 260-265.
    • (1989) Proc. 28th IEEE CDC , pp. 260-265
    • Werbos, P.J.1
  • 22
    • 84902352795
    • Data-driven neuro-optimal temperature control of water-gas shift reaction using stable iterative adaptive dynamic programming
    • Nov.
    • Q. Wei and D. Liu, "Data-driven neuro-optimal temperature control of water-gas shift reaction using stable iterative adaptive dynamic programming," IEEE Trans. Ind. Electron., vol. 61, no. 11, pp. 6399-6408, Nov. 2014.
    • (2014) IEEE Trans. Ind. Electron. , vol.61 , Issue.11 , pp. 6399-6408
    • Wei, Q.1    Liu, D.2
  • 23
    • 84912130786
    • A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems
    • to be published
    • Q. Wei and D. Liu, "A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems," IEEE Trans. Autom. Sci. Eng., to be published.
    • IEEE Trans. Autom. Sci. Eng.
    • Wei, Q.1    Liu, D.2
  • 24
    • 84897594646
    • Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems
    • Mar.
    • D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621-634, Mar. 2014.
    • (2014) IEEE Trans. Neural Netw. Learn. Syst. , vol.25 , Issue.3 , pp. 621-634
    • Liu, D.1    Wei, Q.2
  • 25
    • 84881555023
    • Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems
    • Apr.
    • D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779-789, Apr. 2013.
    • (2013) IEEE Trans. Cybern. , vol.43 , Issue.2 , pp. 779-789
    • Liu, D.1    Wei, Q.2
  • 26
    • 84906781179
    • Adaptive dynamic programming for a class of complex-valued nonlinear systems
    • Sep.
    • R. Song, W. Xiao, H. Zhang, and C. Sun, "Adaptive dynamic programming for a class of complex-valued nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 9, pp. 1733-1739, Sep. 2014.
    • (2014) IEEE Trans. Neural Netw. Learn. Syst. , vol.25 , Issue.9 , pp. 1733-1739
    • Song, R.1    Xiao, W.2    Zhang, H.3    Sun, C.4
  • 27
    • 84904706555
    • Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics
    • Aug.
    • D. Liu, H. Li, and D. Wang, "Online synchronous approximate optimal learning algorithm for multi-player non-zero-sum games with unknown dynamics," IEEE Trans. Syst., Man, Cybern., Syst., vol. 44, no. 8, pp. 1015-1027, Aug. 2014.
    • (2014) IEEE Trans. Syst., Man, Cybern., Syst. , vol.44 , Issue.8 , pp. 1015-1027
    • Liu, D.1    Li, H.2    Wang, D.3
  • 28
    • 84893640946
    • Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach
    • Feb.
    • D. Liu, D. Wang, and H. Li, "Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 418-428, Feb. 2014.
    • (2014) IEEE Trans. Neural Netw. Learn. Syst. , vol.25 , Issue.2 , pp. 418-428
    • Liu, D.1    Wang, D.2    Li, H.3
  • 29
    • 14844340822
    • Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
    • M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
    • (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
    • Abu-Khalaf, M.1    Lewis, F.L.2
  • 30
    • 84885176157
    • Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks
    • Oct.
    • H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, "Adaptive optimal control of unknown constrained-input systems using policy iteration and neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 10, pp. 1513-1525, Oct. 2013.
    • (2013) IEEE Trans. Neural Netw. Learn. Syst. , vol.24 , Issue.10 , pp. 1513-1525
    • Modares, H.1    Lewis, F.L.2    Naghibi-Sistani, M.-B.3
  • 31
    • 84893708995
    • Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems
    • H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, "Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems," Automatica, vol. 50, no. 1, pp. 193-202, 2014.
    • (2014) Automatica , vol.50 , Issue.1 , pp. 193-202
    • Modares, H.1    Lewis, F.L.2    Naghibi-Sistani, M.-B.3
  • 32
    • 49049119493
    • A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm
    • Aug.
    • H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 937-942, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.38 , Issue.4 , pp. 937-942
    • Zhang, H.1    Wei, Q.2    Luo, Y.3
  • 33
    • 77950853735
    • Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics
    • Dec.
    • T. Dierks and S. Jagannathan, "Optimal tracking control of affine nonlinear discrete-time systems with unknown internal dynamics," in Proc. 48th IEEE CDC, Dec. 2009, pp. 6750-6755.
    • (2009) Proc. 48th IEEE CDC , pp. 6750-6755
    • Dierks, T.1    Jagannathan, S.2
  • 34
    • 84888030007
    • Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm
    • Feb.
    • Y. Huang and D. Liu, "Neural-network-based optimal tracking control scheme for a class of unknown discrete-time nonlinear systems using iterative ADP algorithm," Neurocomputing, vol. 125, pp. 46-56, Feb. 2014.
    • (2014) Neurocomputing , vol.125 , pp. 46-56
    • Huang, Y.1    Liu, D.2
  • 35
    • 83655163786
    • Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method
    • Dec.
    • H. Zhang, L. Cui, X. Zhang, and X. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2226-2236, Dec. 2011.
    • (2011) IEEE Trans. Neural Netw. , vol.22 , Issue.12 , pp. 2226-2236
    • Zhang, H.1    Cui, L.2    Zhang, X.3    Luo, X.4
  • 36
    • 84912136508
    • Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification
    • to be published
    • Q. Wei and D. Liu, "Adaptive dynamic programming for optimal tracking control of unknown nonlinear systems with application to coal gasification," IEEE Trans. Autom. Sci. Eng., to be published.
    • IEEE Trans. Autom. Sci. Eng.
    • Wei, Q.1    Liu, D.2
  • 37
    • 84898853127
    • Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics
    • B. Kiumarsi, F. L. Lewis, H. Modares, A. Karimpour, and M.-B. Naghibi-Sistani, "Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics," Automatica, vol. 50, no. 4, pp. 1167-1175, 2014.
    • (2014) Automatica , vol.50 , Issue.4 , pp. 1167-1175
    • Kiumarsi, B.1    Lewis, F.L.2    Modares, H.3    Karimpour, A.4    Naghibi-Sistani, M.-B.5
  • 38
    • 84902308118
    • Optimal tracking control for linear discrete-time systems using reinforcement learning
    • Florence, Italy, Dec.
    • B. Kiumarsi-Khomartash, F. L. Lewis, M.-B. Naghibi-Sistani, and A. Karimpour, "Optimal tracking control for linear discrete-time systems using reinforcement learning," in Proc. IEEE 52nd Annu. CDC, Florence, Italy, Dec. 2013, pp. 3845-3850.
    • (2013) Proc. IEEE 52nd Annu. CDC , pp. 3845-3850
    • Kiumarsi-Khomartash, B.1    Lewis, F.L.2    Naghibi-Sistani, M.-B.3    Karimpour, A.4
  • 40
    • 84881324637
    • Optimal control of nonlinear continuous-time systems: Design of bounded controllers via generalized nonquadratic functionals
    • Jun.
    • S. E. Lyshevski, "Optimal control of nonlinear continuous-time systems: Design of bounded controllers via generalized nonquadratic functionals," in Proc. IEEE ACC, Jun. 1998, pp. 205-209.
    • (1998) Proc. IEEE ACC , pp. 205-209
    • Lyshevski, S.E.1
  • 41
    • 0033629916
    • Reinforcement learning in continuous time and space
    • K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.
    • (2000) Neural Comput. , vol.12 , Issue.1 , pp. 219-245
    • Doya, K.1
  • 42
    • 84919605868
    • Ph.D. dissertation Dept. Comput. Sci., City Univ. London, London, U.K.
    • M. Fairbank, "Value-gradient learning," Ph.D. dissertation, Dept. Comput. Sci., City Univ. London, London, U.K., 2014.
    • (2014) Value-gradient Learning
    • Fairbank, M.1
  • 44
    • 77950630017
    • Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem
    • K. G. Vamvoudakis and F. L. Lewis, "Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878-888, 2010.
    • (2010) Automatica , vol.46 , Issue.5 , pp. 878-888
    • Vamvoudakis, K.G.1    Lewis, F.L.2
  • 46
    • 85151728371
    • Residual algorithms: Reinforcement learning with function approximation
    • L. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. 12th Int. Conf. Mach. Learn., 1995, pp. 30-37.
    • (1995) Proc. 12th Int. Conf. Mach. Learn. , pp. 30-37
    • Baird, L.1
  • 47
    • 84893557286
    • Stability of direct heuristic dynamic programming for nonlinear tracking control using PID neural network
    • Dallas, TX, USA, Aug.
    • X. Luo and J. Si, "Stability of direct heuristic dynamic programming for nonlinear tracking control using PID neural network," in Proc. IJCNN, Dallas, TX, USA, Aug. 2013, pp. 1-7.
    • (2013) Proc. IJCNN , pp. 1-7
    • Luo, X.1    Si, J.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.