Volume , Issue , 2011, Pages 142-147

Nonlinear two-player zero-sum game approximate solution using a Policy Iteration algorithm

Author keywords

[No Author keywords available]

Indexed keywords

CONTINUOUS TIME SYSTEMS; GAME THEORY; GRADIENT METHODS; ROBUST CONTROL;

EID: 84860670757     PISSN: 07431546     EISSN: 25762370     Source Type: Conference Proceeding    
DOI: 10.1109/CDC.2011.6160778     Document Type: Conference Paper
Times cited: 37

References (40)
  • 4
    • 14844340822
    • Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
    • M. Abu-Khalaf and F. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
    • (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
    • Abu-Khalaf, M.1    Lewis, F.2
  • 6
    • 0020970738
    • Neuron-like adaptive elements that can solve difficult learning control problems
    • A. Barto, R. Sutton, and C. Anderson, "Neuron-like adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst. Man Cybern., vol. 13, no. 5, pp. 834-846, 1983.
    • (1983) IEEE Trans. Syst. Man Cybern. , vol.13 , Issue.5 , pp. 834-846
    • Barto, A.1    Sutton, R.2    Anderson, C.3
  • 8
    • 0026852362
    • Reinforcement learning is direct adaptive optimal control
    • R. Sutton, A. Barto, and R. Williams, "Reinforcement learning is direct adaptive optimal control," IEEE Contr. Syst. Mag., vol. 12, no. 2, pp. 19-22, 1992.
    • (1992) IEEE Contr. Syst. Mag. , vol.12 , Issue.2 , pp. 19-22
    • Sutton, R.1    Barto, A.2    Williams, R.3
  • 9
    • 0033285710
    • Adaptive critic neural network for feedforward compensation
    • J. Campos and F. Lewis, "Adaptive critic neural network for feedforward compensation," in Proc. Am. Control Conf., vol. 4, 1999.
    • (1999) Proc. Am. Control Conf. , vol.4
    • Campos, J.1    Lewis, F.2
  • 10
    • 33847648898
    • Adaptive critic designs for discrete-time zero-sum games with application to H∞ control
    • A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Adaptive critic designs for discrete-time zero-sum games with application to H∞ control," IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 37, pp. 240-247, 2007.
    • (2007) IEEE Trans. Syst. Man Cybern. Part B Cybern. , vol.37 , pp. 240-247
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3
  • 11
    • 0002031779
    • Approximate dynamic programming for real-time control and neural modeling
    • D. A. White and D. A. Sofge, Eds. New York: Van Nostrand Reinhold
    • P. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches, D. A. White and D. A. Sofge, Eds. New York: Van Nostrand Reinhold, 1992.
    • (1992) Handbook of Intelligent Control: Neural, Fuzzy, and Adaptive Approaches
    • Werbos, P.1
  • 14
    • 49049089962
    • Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof
    • A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 38, pp. 943-949, 2008.
    • (2008) IEEE Trans. Syst. Man Cybern. Part B Cybern. , vol.38 , pp. 943-949
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3
  • 15
    • 33846781129
    • Model-free Q-learning designs for linear discrete-time zero-sum games with application to H∞ control
    • A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Model-free Q-learning designs for linear discrete-time zero-sum games with application to H∞ control," Automatica, vol. 43, pp. 473-481, 2007.
    • (2007) Automatica , vol.43 , pp. 473-481
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3
  • 16
    • 0030196717
    • Adaptive-critic-based neural networks for aircraft optimal control
    • S. Balakrishnan, "Adaptive-critic-based neural networks for aircraft optimal control," J. Guid. Contr. Dynam., vol. 19, no. 4, pp. 893-898, 1996.
    • (1996) J. Guid. Contr. Dynam. , vol.19 , Issue.4 , pp. 893-898
    • Balakrishnan, S.1
  • 17
    • 0033685661
    • Adaptive critic design for intelligent steering and speed control of a 2-axle vehicle
    • G. Lendaris, L. Schultz, and T. Shannon, "Adaptive critic design for intelligent steering and speed control of a 2-axle vehicle," in Int. Joint Conf. Neural Netw., 2000, pp. 73-78.
    • (2000) Int. Joint Conf. Neural Netw. , pp. 73-78
    • Lendaris, G.1    Schultz, L.2    Shannon, T.3
  • 19
    • 0036641793
    • State-constrained agile missile control with adaptive-critic-based neural networks
    • D. Han and S. Balakrishnan, "State-constrained agile missile control with adaptive-critic-based neural networks," IEEE Trans. Control Syst. Technol., vol. 10, no. 4, pp. 481-489, 2002.
    • (2002) IEEE Trans. Control Syst. Technol. , vol.10 , Issue.4 , pp. 481-489
    • Han, D.1    Balakrishnan, S.2
  • 20
    • 34047138362
    • Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints
    • P. He and S. Jagannathan, "Reinforcement learning neural-network-based controller for nonlinear discrete-time systems with input constraints," IEEE Trans. Syst. Man Cybern. Part B Cybern., vol. 37, no. 2, pp. 425-436, 2007.
    • (2007) IEEE Trans. Syst. Man Cybern. Part B Cybern. , vol.37 , Issue.2 , pp. 425-436
    • He, P.1    Jagannathan, S.2
  • 21
    • 0004370245
    • Wright Lab, Wright-Patterson Air Force Base, OH, Tech. Rep.
    • L. Baird, "Advantage updating," Wright Lab, Wright-Patterson Air Force Base, OH, Tech. Rep., 1993.
    • (1993) Advantage Updating
    • Baird, L.1
  • 22
    • 0033629916
    • Reinforcement learning in continuous time and space
    • K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.
    • (2000) Neural Comput. , vol.12 , Issue.1 , pp. 219-245
    • Doya, K.1
  • 24
    • 0031332446
    • Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation
    • R. Beard, G. Saridis, and J. Wen, "Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation," Automatica, vol. 33, pp. 2159-2178, 1997.
    • (1997) Automatica , vol.33 , pp. 2159-2178
    • Beard, R.1    Saridis, G.2    Wen, J.3
  • 25
    • 67349145396
    • Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems
    • D. Vrabie and F. Lewis, "Neural network approach to continuous-time direct adaptive optimal control for partially unknown nonlinear systems," Neural Networks, vol. 22, no. 3, pp. 237-246, 2009.
    • (2009) Neural Networks , vol.22 , Issue.3 , pp. 237-246
    • Vrabie, D.1    Lewis, F.2
  • 26
  • 27
    • 79953155097
    • Online neural network solution of nonlinear two-player zero-sum games using synchronous policy iteration
    • K. Vamvoudakis and F. Lewis, "Online neural network solution of nonlinear two-player zero-sum games using synchronous policy iteration," in Proc. IEEE Conf. Decis. Control, 2010.
    • (2010) Proc. IEEE Conf. Decis. Control
    • Vamvoudakis, K.1    Lewis, F.2
  • 28
    • 79953151751
    • A model-free robust policy iteration algorithm for optimal control of nonlinear systems
    • S. Bhasin, M. Johnson, and W. E. Dixon, "A model-free robust policy iteration algorithm for optimal control of nonlinear systems," in Proc. IEEE Conf. Decis. Control, 2010, pp. 3060-3065.
    • (2010) Proc. IEEE Conf. Decis. Control , pp. 3060-3065
    • Bhasin, S.1    Johnson, M.2    Dixon, W.E.3
  • 29
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
    • (1988) Mach. Learn. , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.1
  • 31
    • 0026883666
    • L2-gain analysis of nonlinear systems and nonlinear H∞ control
    • A. Van der Schaft, "L2-gain analysis of nonlinear systems and nonlinear H∞ control," IEEE Trans. Autom. Control, vol. 37, no. 6, pp. 770-784, 1992.
    • (1992) IEEE Trans. Autom. Control , vol.37 , Issue.6 , pp. 770-784
    • Van Der Schaft, A.1
  • 34
    • 0024861871
    • Approximation by superpositions of a sigmoidal function
    • G. Cybenko, "Approximation by superpositions of a sigmoidal function," Math. Control Signals Syst., vol. 2, pp. 303-314, 1989.
    • (1989) Math. Control Signals Syst. , vol.2 , pp. 303-314
    • Cybenko, G.1
  • 36
    • 0000466705
    • Nonlinear network structures for feedback control
    • F. L. Lewis, "Nonlinear network structures for feedback control," Asian J. Control, vol. 1, no. 4, pp. 205-228, 1999.
    • (1999) Asian J. Control , vol.1 , Issue.4 , pp. 205-228
    • Lewis, F.L.1
  • 37
    • 0004469897
    • Neurons with graded response have collective computational properties like those of two-state neurons
    • J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Acad. Sci. U.S.A., vol. 81, no. 10, p. 3088, 1984.
    • (1984) Proc. Nat. Acad. Sci. U.S.A. , vol.81 , Issue.10 , p. 3088
    • Hopfield, J.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.