Volume 45, Issue 1, 2015, Pages 65-76

Off-policy reinforcement learning for H∞ control design

Author keywords

H∞ control design; Hamilton-Jacobi-Isaacs equation; Neural network; Off-policy learning; Reinforcement learning

Indexed keywords

AIRCRAFT CONTROL; DESIGN; LEAST SQUARES APPROXIMATIONS; NEURAL NETWORKS; NONLINEAR EQUATIONS; PARTIAL DIFFERENTIAL EQUATIONS;

EID: 84919730591; PISSN: 2168-2267; EISSN: None; Source Type: Journal
DOI: 10.1109/TCYB.2014.2319577; Document Type: Article
Times cited: 338

References (64)
  • [6] V. Yadav, R. Padhi, and S. Balakrishnan, "Robust/optimal temperature profile control of a high-speed aerospace vehicle using neural networks," IEEE Trans. Neural Netw., vol. 18, no. 4, pp. 1115-1128, Jul. 2007. (EID: 34547133970)
  • [7] H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 937-942, Aug. 2008. (EID: 49049119493)
  • [8] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, "Adaptive optimal control for continuous-time linear systems based on policy iteration," Automatica, vol. 45, no. 2, pp. 477-484, 2009. (EID: 58349110975)
  • [9] D. Vrabie and F. L. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems," Neural Netw., vol. 22, no. 3, pp. 237-246, 2009. (EID: 67349145396)
  • [10] H. Zhang, Y. Luo, and D. Liu, "Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints," IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490-1503, Sep. 2009. (EID: 70349253929)
  • [11] H. Zhang, L. Cui, X. Zhang, and Y. Luo, "Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method," IEEE Trans. Neural Netw., vol. 22, no. 12, pp. 2226-2236, Dec. 2011. (EID: 83655163786)
  • [12] B. Luo and H.-N. Wu, "Online policy iteration algorithm for optimal control of linear hyperbolic PDE systems," J. Process Control, vol. 22, no. 7, pp. 1161-1170, 2012. (EID: 84864324494)
  • [14] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, "Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming," IEEE Trans. Autom. Sci. Eng., vol. 9, no. 3, pp. 628-634, Jul. 2012. (EID: 84863467146)
  • [15] B. Luo and H.-N. Wu, "Approximate optimal control design for nonlinear one-dimensional parabolic PDE systems using empirical eigenfunctions and neural network," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 42, no. 6, pp. 1538-1549, Dec. 2012. (EID: 84869489097)
  • [16] H.-N. Wu and B. Luo, "Heuristic dynamic programming algorithm for optimal control design of linear continuous-time hyperbolic PDE systems," Ind. Eng. Chem. Res., vol. 51, no. 27, pp. 9310-9319, 2012. (EID: 84863856475)
  • [17] D. Liu and Q. Wei, "Finite-approximation-error-based optimal control approach for discrete-time nonlinear systems," IEEE Trans. Cybern., vol. 43, no. 2, pp. 779-789, Apr. 2013. (EID: 84881555023)
  • [18] Q. Wei and D. Liu, "A novel iterative θ-adaptive dynamic programming for discrete-time nonlinear systems," IEEE Trans. Autom. Sci. Eng., 2013, to be published. (EID: 84961643244)
  • [19] D. Liu, D. Wang, and H. Li, "Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 418-428, Feb. 2014. (EID: 84893640946)
  • [21] H. Modares, F. L. Lewis, and M.-B. Naghibi-Sistani, "Integral reinforcement learning and experience replay for adaptive optimal control of partially-unknown constrained-input continuous-time systems," Automatica, vol. 50, no. 1, pp. 193-202, 2014. (EID: 84893708995)
  • [22] D. Liu and Q. Wei, "Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems," IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 3, pp. 621-634, Mar. 2014. (EID: 84897594646)
  • [23] B. Luo, H.-N. Wu, and H.-X. Li, "Data-based suboptimal neuro-control design with reinforcement learning for dissipative spatially distributed processes," Ind. Eng. Chem. Res., 2014, to be published. (EID: 84988290534)
  • [28] A. Isidori and W. Kang, "H∞ control via measurement feedback for general nonlinear systems," IEEE Trans. Autom. Control, vol. 40, no. 3, pp. 466-472, Mar. 1995. (EID: 0029264110)
  • [29] A. Isidori and A. Astolfi, "… H∞-control via measurement feedback in nonlinear systems," IEEE Trans. Autom. Control, vol. 37, no. 9, pp. 1283-1293, Sep. 1992. (EID: 0026927363)
  • [30] R. W. Beard, "Successive Galerkin approximation algorithms for nonlinear optimal and robust control," Int. J. Control, vol. 71, no. 5, pp. 717-743, 1998. (EID: 0032202335)
  • [32] K. G. Vamvoudakis and F. L. Lewis, "Online solution of nonlinear two-player zero-sum games using synchronous policy iteration," Int. J. Robust Nonlinear Control, vol. 22, no. 13, pp. 1460-1483, 2012. (EID: 84864463039)
  • [34] D. Liu, H. Li, and D. Wang, "Neural-network-based zero-sum game for discrete-time nonlinear systems via iterative adaptive dynamic programming algorithm," Neurocomputing, vol. 110, no. 13, pp. 92-100, 2013. (EID: 84876066909)
  • [35] N. Sakamoto and A. V. D. Schaft, "Analytical approximation methods for the stabilizing solution of the Hamilton-Jacobi equation," IEEE Trans. Autom. Control, vol. 53, no. 10, pp. 2335-2350, Nov. 2008. (EID: 56549083444)
  • [36] G. N. Saridis and C.-S. G. Lee, "An approximation theory of optimal control for trainable manipulators," IEEE Trans. Syst., Man, Cybern., vol. 9, no. 3, pp. 152-159, Mar. 1979. (EID: 0018441647)
  • [37] R. W. Beard, G. N. Saridis, and J. T. Wen, "Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation," Automatica, vol. 33, no. 12, pp. 2159-2177, 1997. (EID: 0031332446)
  • [38] R. W. Beard, G. Saridis, and J. Wen, "Approximate solutions to the time-invariant Hamilton-Jacobi-Bellman equation," J. Optim. Theory Appl., vol. 96, no. 3, pp. 589-626, 1998. (EID: 0032387028)
  • [39] S. Mehraeen, T. Dierks, S. Jagannathan, and M. L. Crow, "Zero-sum two-player game theoretic formulation of affine nonlinear discrete-time systems using neural networks," IEEE Trans. Cybern., vol. 43, no. 6, pp. 1641-1655, Dec. 2013. (EID: 84890058601)
  • [40] M. Abu-Khalaf, F. L. Lewis, and J. Huang, "Neurodynamic programming and zero-sum games for constrained control systems," IEEE Trans. Neural Netw., vol. 19, no. 7, pp. 1243-1252, Jul. 2008. (EID: 48949116222)
  • [42] H. Zhang, Q. Wei, and D. Liu, "An iterative adaptive dynamic programming method for solving a class of nonlinear zero-sum differential games," Automatica, vol. 47, no. 1, pp. 207-214, 2011. (EID: 78650805234)
  • [43] K. G. Vamvoudakis and F. L. Lewis, "Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878-888, 2010. (EID: 77950630017)
  • [44] B. Luo and H.-N. Wu, "… H∞ state feedback control with Galerkin's method," Int. J. Robust Nonlinear Control, vol. 23, no. 9, pp. 991-1012, 2013. (EID: 84876816423)
  • [46] H.-N. Wu and B. Luo, "… H∞ state feedback control," Inf. Sci., vol. 222, pp. 472-485, Feb. 2013. (EID: 84870062175)
  • [47] D. Vrabie and F. Lewis, "Adaptive dynamic programming for online solution of a zero-sum differential game," J. Control Theory Appl., vol. 9, no. 3, pp. 353-360, 2011. (EID: 79960443754)
  • [49] H. Zhang, L. Cui, and Y. Luo, "Near-optimal control for nonzero-sum differential games of continuous-time nonlinear systems using single-network ADP," IEEE Trans. Cybern., vol. 43, no. 1, pp. 206-216, Feb. 2013. (EID: 84885835001)
  • 50
  • [53] G. Peter Lepage, "A new algorithm for adaptive multidimensional integration," J. Comput. Phys., vol. 27, no. 2, pp. 192-203, 1978. (EID: 0011636441)
  • [56] D. Kleinman, "On an iterative technique for Riccati equation computations," IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114-115, Feb. 1968. (EID: 84914965022)
  • [57] M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005. (EID: 14844340822)
  • [58] Z. Chen and S. Jagannathan, "Generalized Hamilton-Jacobi-Bellman formulation-based neural network control of affine nonlinear discrete-time systems," IEEE Trans. Neural Netw., vol. 19, no. 1, pp. 90-106, Jan. 2008. (EID: 39549085591)
  • [59] K. G. Vamvoudakis and F. L. Lewis, "Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations," Automatica, vol. 47, no. 8, pp. 1556-1569, 2011. (EID: 79960897012)
  • [60] D. Precup, R. S. Sutton, and S. Dasgupta, "Off-policy temporal-difference learning with function approximation," in Proc. 18th ICML, 2001, pp. 417-424. (EID: 4644328593)
  • [61] H. R. Maei, C. Szepesvári, S. Bhatnagar, and R. S. Sutton, "Toward off-policy learning control with function approximation," in Proc. 27th ICML, 2010, pp. 719-726. (EID: 77956541799)


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.