Volume 42, Issue 2, 2012, Pages 377-390

Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators

Author keywords

Adaptive critic; dynamic programming (DP); Lyapunov method; neural networks (NNs); online approximators (OLAs); online learning; reinforcement learning

Indexed keywords

ACTION NETWORK; ADAPTIVE CRITIC; APPROXIMATORS; BALANCING SYSTEM; BOUNDED DISTURBANCES; CONTROLLER DESIGNS; CRITIC NETWORK; HEURISTIC DYNAMIC PROGRAMMING; LEARNING CONTROLLERS; LYAPUNOV; LYAPUNOV THEORIES; MULTI-OUTPUT; MULTIINPUT; NONLINEAR DISCRETE-TIME SYSTEMS; ONLINE LEARNING; OPTIMAL SIGNALS; OUTPUT-FEEDBACK; RADIAL BASIS FUNCTIONS; RECURSIVE EQUATIONS; SEPARATION PRINCIPLE; SYSTEM STATE; TWO-LINK; UNIFORM ULTIMATE BOUNDEDNESS;

EID: 84859001250     PISSN: 10834419     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSMCB.2011.2166384     Document Type: Article
Times cited: 183

References (31)
  • 2
    • 0039319294
    • Suboptimal design of intentionally nonlinear controllers
    • Oct
    • Z. V. Rekasius, "Suboptimal design of intentionally nonlinear controllers," IEEE Trans. Autom. Control, vol. AC-9, no. 4, pp. 380-386, Oct. 1964.
    • (1964) IEEE Trans. Autom. Control , vol.AC-9 , Issue.4 , pp. 380-386
    • Rekasius, Z.V.1
  • 3
    • 0002526302
    • Construction of suboptimal control sequences
    • R. J. Leake and R. Liu, "Construction of suboptimal control sequences," SIAM J. Control Optim., vol. 5, no. 1, pp. 54-63, 1967.
    • (1967) SIAM J. Control Optim. , vol.5 , Issue.1 , pp. 54-63
    • Leake, R.J.1    Liu, R.2
  • 6
    • 0035273403
    • On-line learning control by association and reinforcement
    • Mar
    • J. Si and Y. T. Wang, "On-line learning control by association and reinforcement," IEEE Trans. Neural Netw., vol. 12, no. 2, pp. 264-276, Mar. 2001.
    • (2001) IEEE Trans. Neural Netw. , vol.12 , Issue.2 , pp. 264-276
    • Si, J.1    Wang, Y.T.2
  • 9
    • 0020970738
    • Neuron like adaptive elements that can solve difficult learning control problems
    • Sep./Oct
    • A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuron like adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. SMC-13, no. 5, pp. 834-847, Sep./Oct. 1983.
    • (1983) IEEE Trans. Syst., Man, Cybern. , vol.SMC-13 , Issue.5 , pp. 834-847
    • Barto, A.G.1    Sutton, R.S.2    Anderson, C.W.3
  • 10
    • 33847202724
    • Learning to predict by the methods of temporal difference
    • Aug
    • R. S. Sutton, "Learning to predict by the methods of temporal difference," Mach. Learn., vol. 3, no. 1, pp. 9-44, Aug. 1988.
    • (1988) Mach. Learn. , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.S.1
  • 11
    • 34249833101
    • Q-learning
    • May
    • C. J. C. H. Watkins and P. Dayan, "Q-learning," Mach. Learn., vol. 8, no. 3/4, pp. 279-292, May 1992.
    • (1992) Mach. Learn. , vol.8 , Issue.3-4 , pp. 279-292
    • Watkins, C.J.C.H.1    Dayan, P.2
  • 13
    • 0030675610
    • Efficient reinforcement learning: Model-based acrobot control
    • Albuquerque, NM
    • G. Boone, "Efficient reinforcement learning: Model-based acrobot control," in Proc. IEEE Int. Conf. Robot. Autom., Albuquerque, NM, 1997, pp. 229-234.
    • (1997) Proc. IEEE Int. Conf. Robot. Autom. , pp. 229-234
    • Boone, G.1
  • 14
    • 13644265156
    • Reinforcement learning-based output feedback control of nonlinear systems with input constraints
    • Feb
    • P. He and S. Jagannathan, "Reinforcement learning-based output feedback control of nonlinear systems with input constraints," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 1, pp. 150-154, Feb. 2005.
    • (2005) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.35 , Issue.1 , pp. 150-154
    • He, P.1    Jagannathan, S.2
  • 15
    • 70349615619
    • Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error
    • Dec
    • L. Yang, J. Si, K. S. Tsakalis, and A. A. Rodriguez, "Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 39, no. 6, pp. 1617-1622, Dec. 2009.
    • (2009) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.39 , Issue.6 , pp. 1617-1622
    • Yang, L.1    Si, J.2    Tsakalis, K.S.3    Rodriguez, A.A.4
  • 18
    • 0029403793
    • Stochastic choice of basis functions in adaptive function approximation and the functional-link net
    • Nov
    • B. Igelnik and Y. H. Pao, "Stochastic choice of basis functions in adaptive function approximation and the functional-link net," IEEE Trans. Neural Networks, vol. 6, no. 6, pp. 1320-1329, Nov. 1995.
    • (1995) IEEE Trans. Neural Networks , vol.6 , Issue.6 , pp. 1320-1329
    • Igelnik, B.1    Pao, Y.H.2
  • 19
    • 0031236002
    • Adaptive critic designs
    • Sep
    • D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Sep. 1997.
    • (1997) IEEE Trans. Neural Netw. , vol.8 , Issue.5 , pp. 997-1007
    • Prokhorov, D.1    Wunsch, D.2
  • 21
    • 0023169119
    • Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research
    • Jan
    • P. J. Werbos, "Building and understanding adaptive systems: A statistical/numerical approach to factory automation and brain research," IEEE Trans. Syst., Man, Cybern., vol. SMC-17, no. 1, pp. 7-20, Jan. 1987.
    • (1987) IEEE Trans. Syst., Man, Cybern. , vol.SMC-17 , Issue.1 , pp. 7-20
    • Werbos, P.J.1
  • 22
    • 0002557583
    • Advanced forecasting methods for global crisis warning and models of intelligence
    • P. J. Werbos, "Advanced forecasting methods for global crisis warning and models of intelligence," Gen. Syst. Yearbook, vol. 22, pp. 25-38, 1977.
    • (1977) Gen. Syst. Yearbook , vol.22 , pp. 25-38
    • Werbos, P.J.1
  • 23
    • 0031281590
    • Learning through reinforcement and replicator dynamics
    • Nov
    • T. Borgers and R. Sarin, "Learning through reinforcement and replicator dynamics," J. Economic Theory, vol. 77, no. 1, pp. 1-17, Nov. 1997.
    • (1997) J. Economic Theory , vol.77 , Issue.1 , pp. 1-17
    • Borgers, T.1    Sarin, R.2
  • 24
    • 49049091364
    • Control of nonaffine nonlinear discrete-time systems using reinforcement learning-based linearly parameterized neural networks
    • Aug
    • Q. Yang, J. B. Vance, and S. Jagannathan, "Control of nonaffine nonlinear discrete-time systems using reinforcement learning-based linearly parameterized neural networks," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 994-1001, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.38 , Issue.4 , pp. 994-1001
    • Yang, Q.1    Vance, J.B.2    Jagannathan, S.3
  • 25
    • 0032785795
    • Discrete-time CMAC control of a feedback linearizable nonlinear systems under a persistence of excitation
    • Jan
    • S. Jagannathan, "Discrete-time CMAC control of a feedback linearizable nonlinear systems under a persistence of excitation," IEEE Trans. Neural Netw., vol. 10, no. 1, pp. 128-137, Jan. 1999.
    • (1999) IEEE Trans. Neural Netw. , vol.10 , Issue.1 , pp. 128-137
    • Jagannathan, S.1
  • 27
    • 79960462685
    • Online optimal control of nonlinear discrete-time systems using approximate dynamic programming
    • T. Dierks and S. Jagannathan, "Online optimal control of nonlinear discrete-time systems using approximate dynamic programming," J. Control Theory Appl., vol. 9, no. 3, pp. 361-369, 2011.
    • (2011) J. Control Theory Appl. , vol.9 , Issue.3 , pp. 361-369
    • Dierks, T.1    Jagannathan, S.2
  • 31
    • 49049089962
    • Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof
    • Aug
    • A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38, no. 4, pp. 943-949, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern. B, Cybern. , vol.38 , Issue.4 , pp. 943-949
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.