Volume 32, Issue 6, 2012, Pages 76-105

Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE CONTROLLERS; CONTINUOUS-TIME DYNAMICAL SYSTEMS; FEEDBACK CONTROLLER; INDIRECT ADAPTIVE CONTROLLERS; OPTIMAL CONTROL POLICY; OPTIMAL CONTROLLER; PERFORMANCE FUNCTIONS; SYSTEM IDENTIFICATION TECHNIQUES;

EID: 84883537695     PISSN: 1066033X     EISSN: None     Source Type: Journal
DOI: 10.1109/MCS.2012.2214134     Document Type: Review
Times cited: 955

References (60)
  • [4] Z.-H. Li and M. Krstic, "Optimal design of adaptive tracking controllers for nonlinear systems," Automatica, vol. 33, no. 8, pp. 1459-1473, 1997.
  • [7] P. J. Werbos, "A menu of designs for reinforcement learning over time," in Neural Networks for Control, W. T. Miller, R. S. Sutton, and P. J. Werbos, Eds. Cambridge, MA: MIT Press, 1991, pp. 67-95.
  • [8] F. L. Lewis, G. Lendaris, and D. Liu, "Special issue on adaptive dynamic programming and reinforcement learning for feedback control," IEEE Trans. Syst., Man, Cybern. B, vol. 38, no. 4, pp. 896-897, Aug. 2008.
  • [12] W. Schultz, "Neural coding of basic reward terms of animal learning theory, game theory, microeconomics and behavioral ecology," Current Opinion Neurobiol., vol. 14, no. 2, pp. 139-147, 2004, doi: 10.1016/j.conb.2004.03.017.
  • [13] K. Doya, H. Kimura, and M. Kawato, "Neural mechanisms for learning and control," IEEE Control Syst. Mag., vol. 21, no. 4, pp. 42-54, Aug. 2001.
  • [14] P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, D. A. White and D. A. Sofge, Eds. New York: Van Nostrand Reinhold, 1992.
  • [15] D. Vrabie and F. L. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially-unknown nonlinear systems," Neural Netw., vol. 22, no. 3, pp. 237-246, Apr. 2009.
  • [16] A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuron-like adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, no. 5, pp. 834-846, Sep./Oct. 1983.
  • [18] R. E. Bellman, Dynamic Programming. Princeton, NJ: Princeton Univ. Press, 1957.
  • [20] D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Sep. 1997.
  • [22] S. N. Balakrishnan, J. Ding, and F. L. Lewis, "Issues on stability of ADP feedback controllers for dynamical systems," IEEE Trans. Syst., Man, Cybern. B, vol. 38, no. 4, pp. 913-917, Aug. 2008.
  • [23] F. Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.
  • [24] F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits Syst. Mag., vol. 9, no. 3, pp. 32-50, 2009.
  • [25] D. Han and S. N. Balakrishnan, "State-constrained agile missile control with adaptive-critic-based neural networks," IEEE Trans. Control Syst. Technol., vol. 10, no. 4, pp. 481-489, Jul. 2002, doi: 10.1109/TCST.2002.1014669.
  • [28] D. Prokhorov, R. A. Santiago, and D. C. Wunsch, II, "Adaptive critic designs: A case study for neurocontrol," Neural Netw., vol. 8, no. 9, pp. 1367-1372, 1995, doi: 10.1016/0893-6080(95)00042-9.
  • [30] R. Enns and J. Si, "Helicopter flight control reconfiguration for main rotor actuator failures," AIAA J. Guidance, Control, Dynamics, vol. 26, no. 4, pp. 572-584, 2003.
  • [31] C. Lu, J. Si, and X. Xie, "Direct heuristic dynamic programming method for power system stability enhancement," IEEE Trans. Syst., Man, Cybern. B, vol. 38, no. 4, pp. 1008-1013, 2008.
  • [34] A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern. B, vol. 38, no. 4, pp. 943-949, Aug. 2008.
  • [37] D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, "Adaptive optimal control for continuous-time linear systems based on policy iteration," Automatica, vol. 45, no. 2, pp. 477-484, 2009.
  • [39] R. M. Wheeler and K. S. Narendra, "Decentralized learning in finite Markov chains," IEEE Trans. Autom. Control, vol. 31, no. 6, pp. 519-526, June 1986.
  • [42] C. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Cambridge University, Cambridge, U.K., 1989.
  • [43] C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3-4, pp. 279-292, 1992.
  • [44] G. A. Hewer, "An iterative technique for the computation of the steady state gains for the discrete optimal regulator," IEEE Trans. Autom. Control, vol. 16, no. 4, pp. 382-384, Aug. 1971.
  • [47] A. J. van der Schaft, "L2-gain analysis of nonlinear systems and nonlinear state feedback H∞ control," IEEE Trans. Autom. Control, vol. 37, no. 6, pp. 770-784, 1992.
  • [50] F. L. Lewis and K. G. Vamvoudakis, "Reinforcement learning for partially observable dynamic processes: Adaptive dynamic programming using measured output data," IEEE Trans. Syst., Man, Cybern. B, vol. 41, no. 1, pp. 14-25, Feb. 2011.
  • [51] M. Abu-Khalaf, F. L. Lewis, and J. Huang, "Policy iterations on the Hamilton-Jacobi-Isaacs equation for H∞ state feedback control with input saturation," IEEE Trans. Autom. Control, vol. 51, no. 12, pp. 1989-1995, Dec. 2006, doi: 10.1109/TAC.2006.884959.
  • [53] K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.
  • [55] D. L. Kleinman, "On an iterative technique for Riccati equation computations," IEEE Trans. Autom. Control, vol. AC-13, no. 1, pp. 114-115, Feb. 1968.
  • [57] K. G. Vamvoudakis and F. L. Lewis, "Online actor-critic algorithm to solve the continuous-time infinite horizon optimal control problem," Automatica, vol. 46, no. 5, pp. 878-888, 2010.
  • [58] D. Vrabie and F. L. Lewis, "Adaptive dynamic programming for online solution of a zero-sum differential game," J. Control Theory Appl., vol. 9, no. 3, pp. 353-360, 2011.
  • [59] K. G. Vamvoudakis and F. L. Lewis, "Multi-player non-zero sum games: Online adaptive learning solution of coupled Hamilton-Jacobi equations," Automatica, vol. 47, no. 8, pp. 1556-1569, 2011.
  • [60] J. H. Kim and F. L. Lewis, "Model-free H-infinity control design for unknown linear discrete-time systems via Q-learning with LMI," Automatica, vol. 46, no. 8, pp. 1320-1326, Aug. 2010.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.