Volume 25, Issue 10, 2014, Pages 1909-1920

Clipping in neurocontrol by adaptive dynamic programming

Author keywords

Backpropagation through time (BPTT); clipping; dual heuristic programming (DHP); neurocontrol; value gradient learning

Indexed keywords

BACKPROPAGATION THROUGH TIME; CLIPPING; DUAL HEURISTIC PROGRAMMING; NEURO CONTROL; VALUE-GRADIENT LEARNING;

EID: 84907821545     PISSN: 2162237X     EISSN: 21622388     Source Type: Journal    
DOI: 10.1109/TNNLS.2014.2297991     Document Type: Article
Times cited: 12

References (35)
  • 1
    • 66449130966
    • Adaptive dynamic programming: An introduction
    • May
    • F.-Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.
    • (2009) IEEE Comput. Intell. Mag , vol.4 , Issue.2 , pp. 39-47
    • Wang, F.-Y.1    Zhang, H.2    Liu, D.3
  • 2
    • 84907812505
    • Neurocontrol: Where it is going and why it is crucial
    • North Holland, Amsterdam, I. Aleksander, Ed. New York, NY, USA: Taylor
    • P. J. Werbos, "Neurocontrol: Where it is going and why it is crucial," in Artificial Neural Networks II (ICANN-2), North Holland, Amsterdam, I. Aleksander, Ed. New York, NY, USA: Taylor, 1992, pp. 61-68.
    • (1992) Artificial Neural Networks II (ICANN-2) , pp. 61-68
    • Werbos, P.J.1
  • 3
    • 0033284518
    • Neural networks for control
    • M. Hagan and H. Demuth, "Neural networks for control," in Proc. Amer. Control Conf., vol. 3. 1999, pp. 1642-1656.
    • (1999) Proc. Amer. Control Conf , vol.3 , pp. 1642-1656
    • Hagan, M.1    Demuth, H.2
  • 5
    • 84883537695
    • Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers
    • Dec
    • F. L. Lewis, D. Vrabie, and K. G. Vamvoudakis, "Reinforcement learning and feedback control: Using natural decision methods to design optimal adaptive controllers," IEEE Control Syst., vol. 32, no. 6, pp. 76-105, Dec. 2012.
    • (2012) IEEE Control Syst , vol.32 , Issue.6 , pp. 76-105
    • Lewis, F.L.1    Vrabie, D.2    Vamvoudakis, K.G.3
  • 6
    • 0002031779
    • Approximating dynamic programming for real-time control and neural modeling
    • D. A. White and D. A. Sofge, Eds. New York, NY, USA: Van Nostrand, ch. 13
    • P. J. Werbos, "Approximating dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, D. A. White and D. A. Sofge, Eds. New York, NY, USA: Van Nostrand, 1992, pp. 493-525, ch. 13.
    • (1992) Handbook of Intelligent Control , pp. 493-525
    • Werbos, P.J.1
  • 7
    • 0031236002
    • Adaptive critic designs
    • Sep
    • D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Sep. 1997.
    • (1997) IEEE Trans. Neural Netw , vol.8 , Issue.5 , pp. 997-1007
    • Prokhorov, D.1    Wunsch, D.2
  • 8
    • 0025503558
    • Backpropagation through time: What it does and how to do it
    • Oct
    • P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, no. 10, pp. 1550-1560, Oct. 1990.
    • (1990) Proc. IEEE , vol.78 , Issue.10 , pp. 1550-1560
    • Werbos, P.J.1
  • 9
    • 84865069763
    • Value-gradient learning
    • Jun
    • M. Fairbank and E. Alonso, "Value-gradient learning," in Proc. IEEE IJCNN, Jun. 2012, pp. 3062-3069.
    • (2012) Proc. IEEE IJCNN , pp. 3062-3069
    • Fairbank, M.1    Alonso, E.2
  • 12
    • 84887996993
    • An equivalence between adaptive dynamic programming with a critic and backpropagation through time
    • Dec
    • M. Fairbank, E. Alonso, and D. Prokhorov, "An equivalence between adaptive dynamic programming with a critic and backpropagation through time," IEEE Trans. Neural Netw. Learn. Syst., vol. 24, no. 12, pp. 2088-2100, Dec. 2013.
    • (2013) IEEE Trans. Neural Netw. Learn. Syst , vol.24 , Issue.12 , pp. 2088-2100
    • Fairbank, M.1    Alonso, E.2    Prokhorov, D.3
  • 13
    • 34548772284
    • Backpropagation through time and derivative adaptive critics: A common framework for comparison
    • J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley
    • D. Prokhorov, "Backpropagation through time and derivative adaptive critics: A common framework for comparison," in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley, 2004.
    • (2004) Handbook of Learning and Approximate Dynamic Programming
    • Prokhorov, D.1
  • 15
    • 33646384929
    • Policy gradient in continuous time
    • Jan
    • R. Munos, "Policy gradient in continuous time," J. Mach. Learn. Res., vol. 7, pp. 413-427, Jan. 2006.
    • (2006) J. Mach. Learn. Res , vol.7 , pp. 413-427
    • Munos, R.1
  • 16
    • 84876158475
    • Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks
    • Oct
    • M. Fairbank, E. Alonso, and D. Prokhorov, "Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 10, pp. 1671-1678, Oct. 2012.
    • (2012) IEEE Trans. Neural Netw. Learn. Syst , vol.23 , Issue.10 , pp. 1671-1678
    • Fairbank, M.1    Alonso, E.2    Prokhorov, D.3
  • 17
    • 14844340822
    • Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
    • M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
    • (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
    • Abu-Khalaf, M.1    Lewis, F.L.2
  • 18
    • 33847648898
    • Adaptive critic designs for discrete-time zero-sum games with application to H∞ control
    • Feb
    • A. Al-Tamimi, M. Abu-Khalaf, and F. L. Lewis, "Adaptive critic designs for discrete-time zero-sum games with application to H∞ control," IEEE Trans. Syst., Man, Cybern., B, Cybern., vol. 37, no. 1, pp. 240-247, Feb. 2007.
    • (2007) IEEE Trans. Syst., Man, Cybern., B, Cybern , vol.37 , Issue.1 , pp. 240-247
    • Al-Tamimi, A.1    Abu-Khalaf, M.2    Lewis, F.L.3
  • 19
    • 49049089962
    • Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof
    • Aug
    • A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern., B, Cybern., vol. 38, no. 4, pp. 943-949, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern., B, Cybern , vol.38 , Issue.4 , pp. 943-949
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3
  • 20
    • 70349253929
    • Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints
    • Sep
    • H. Zhang, Y. Luo, and D. Liu, "Neural-network-based near-optimal control for a class of discrete-time affine nonlinear systems with control constraints," IEEE Trans. Neural Netw., vol. 20, no. 9, pp. 1490-1503, Sep. 2009.
    • (2009) IEEE Trans. Neural Netw , vol.20 , Issue.9 , pp. 1490-1503
    • Zhang, H.1    Luo, Y.2    Liu, D.3
  • 21
    • 49049119493
    • A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm
    • Aug
    • H. Zhang, Q. Wei, and Y. Luo, "A novel infinite-time optimal tracking control scheme for a class of discrete-time nonlinear systems via the greedy HDP iteration algorithm," IEEE Trans. Syst., Man, Cybern., B, Cybern., vol. 38, no. 4, pp. 937-942, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern., B, Cybern , vol.38 , Issue.4 , pp. 937-942
    • Zhang, H.1    Wei, Q.2    Luo, Y.3
  • 22
    • 0000337576
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Mach. Learn., vol. 8, nos. 3-4, pp. 229-256, 1992.
    • (1992) Mach. Learn , vol.8 , Issue.3-4 , pp. 229-256
    • Williams, R.J.1
  • 23
  • 24
    • 0004049893
    • Ph.D. dissertation, Dept. Comput. Sci., Cambridge Univ., Cambridge, U.K.
    • C. J. C. H. Watkins, "Learning from delayed rewards," Ph.D. dissertation, Dept. Comput. Sci., Cambridge Univ., Cambridge, U.K., 1989.
    • (1989) Learning from Delayed Rewards
    • Watkins, C.J.C.H.1
  • 25
    • 85032189594
    • Model-based adaptive critic designs
    • J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley
    • S. Ferrari and R. F. Stengel, "Model-based adaptive critic designs," in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley, 2004, pp. 65-96.
    • (2004) Handbook of Learning and Approximate Dynamic Programming , pp. 65-96
    • Ferrari, S.1    Stengel, R.F.2
  • 26
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
    • (1988) Mach. Learn , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.S.1
  • 27
    • 0033629916
    • Reinforcement learning in continuous time and space
    • K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.
    • (2000) Neural Comput , vol.12 , Issue.1 , pp. 219-245
    • Doya, K.1
  • 29
    • 0022471098
    • Learning representations by back-propagating errors
    • Oct
    • D. Rumelhart, G. Hinton, and R. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, Oct. 1986.
    • (1986) Nature , vol.323 , Issue.6088 , pp. 533-536
    • Rumelhart, D.1    Hinton, G.2    Williams, R.3
  • 30
    • 0027554566
    • Temporal-difference methods and Markov models
    • Mar./Apr
    • E. Barnard, "Temporal-difference methods and Markov models," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 2, pp. 357-365, Mar./Apr. 1993.
    • (1993) IEEE Trans. Syst., Man, Cybern , vol.23 , Issue.2 , pp. 357-365
    • Barnard, E.1
  • 32
    • 84865077338
    • A comparison of learning speed and ability to cope without exploration between DHP and TD(0)
    • Jun
    • M. Fairbank and E. Alonso, "A comparison of learning speed and ability to cope without exploration between DHP and TD(0)," in Proc. IEEE Int. Joint Conf. Neural Netw., Jun. 2012, pp. 1478-1485.
    • (2012) Proc. IEEE Int. Joint Conf. Neural Netw , pp. 1478-1485
    • Fairbank, M.1    Alonso, E.2
  • 33
    • 0020970738
    • Neuronlike adaptive elements that can solve difficult learning control problems
    • Sep./Oct
    • A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. 13, no. 5, pp. 834-846, Sep./Oct. 1983.
    • (1983) IEEE Trans. Syst., Man, Cybern , vol.13 , Issue.5 , pp. 834-846
    • Barto, A.G.1    Sutton, R.S.2    Anderson, C.W.3
  • 34
    • 0030702730
    • Training strategies for critic and action neural networks in dual heuristic programming method
    • Jun
    • G. G. Lendaris and C. Paintz, "Training strategies for critic and action neural networks in dual heuristic programming method," in Proc. Int. Conf. Neural Netw., Jun. 1997, pp. 712-717.
    • (1997) Proc. Int. Conf. Neural Netw , pp. 712-717
    • Lendaris, G.G.1    Paintz, C.2
  • 35
    • 76649091717
    • Correct equations for the dynamics of the cart-pole system
    • Univ. Bucharest, Cluj-Napoca, Romania, Tech. Rep
    • R. V. Florian, "Correct equations for the dynamics of the cart-pole system," Center for Cognitive and Neural Studies (Coneural), Univ. Bucharest, Cluj-Napoca, Romania, Tech. Rep., 2007.
    • (2007) Center for Cognitive and Neural Studies (Coneural)
    • Florian, R.V.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.