Volume 24, Issue 12, 2013, Pages 2088-2100

An equivalence between adaptive dynamic programming with a critic and backpropagation through time

Author keywords

Adaptive dynamic programming (ADP); backpropagation through time; dual heuristic programming (DHP); neural networks; value gradient learning

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; BACK-PROPAGATION THROUGH TIME; CONTINUOUS STATE SPACE; DUAL HEURISTIC PROGRAMMING; GUARANTEED CONVERGENCE; NONLINEAR FUNCTIONS; SMOOTHNESS CONDITIONS; VALUE-GRADIENT LEARNING;

EID: 84887996993     PISSN: 2162237X     EISSN: 21622388     Source Type: Journal    
DOI: 10.1109/TNNLS.2013.2271778     Document Type: Article
Times cited: 38

References (29)
  • 1
    • 66449130966
    • Adaptive dynamic programming: An introduction
    • May
    • F.-Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.
    • (2009) IEEE Comput. Intell. Mag. , vol.4 , Issue.2 , pp. 39-47
    • Wang, F.-Y.1    Zhang, H.2    Liu, D.3
  • 2
    • 85012688561
    • Princeton NJ USA: Princeton Univ. Press
    • R. E. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.
    • (1957) Dynamic Programming
    • Bellman, R.E.1
  • 3
    • 0002031779
    • Approximate dynamic programming for real-time control and neural modeling
    • D. White and D. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold ch. 13
    • P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, D. White and D. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold, 1992, ch. 13, pp. 493-525.
    • (1992) Handbook of Intelligent Control , pp. 493-525
    • Werbos, P.J.1
  • 5
    • 85032189594
    • Model-based adaptive critic designs
    • J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley
    • S. Ferrari and R. F. Stengel, "Model-based adaptive critic designs," in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley, 2004, pp. 65-96.
    • (2004) Handbook of Learning and Approximate Dynamic Programming , pp. 65-96
    • Ferrari, S.1    Stengel, R.F.2
  • 6
    • 84876158475
    • Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks
    • Oct 2012
    • M. Fairbank, E. Alonso, and D. Prokhorov, "Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 10, pp. 1671-1678, Oct. 2012.
    • (2012) IEEE Trans. Neural Netw. Learn. Syst. , vol.23 , Issue.10 , pp. 1671-1678
    • Fairbank, M.1    Alonso, E.2    Prokhorov, D.3
  • 8
    • 84865069763
    • Value-gradient learning
    • Jun.
    • M. Fairbank and E. Alonso, "Value-gradient learning," in Proc. IEEE IJCNN, Jun. 2012, pp. 3062-3069.
    • (2012) Proc. IEEE IJCNN , pp. 3062-3069
    • Fairbank, M.1    Alonso, E.2
  • 9
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.
    • (1988) Mach. Learn. , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.S.1
  • 10
    • 0037561866
    • Dual heuristic programming excitation neurocontrol for generators in a multimachine power system
    • Mar./Apr
    • G. K. Venayagamoorthy and D. C. Wunsch, "Dual heuristic programming excitation neurocontrol for generators in a multimachine power system," IEEE Trans. Ind. Appl., vol. 39, no. 2, pp. 382-394, Mar./Apr. 2003.
    • (2003) IEEE Trans. Ind. Appl. , vol.39 , Issue.2 , pp. 382-394
    • Venayagamoorthy, G.K.1    Wunsch, D.C.2
  • 11
    • 0030702730
    • Training strategies for critic and action neural networks in dual heuristic programming method
    • Jun.
    • G. G. Lendaris and C. Paintz, "Training strategies for critic and action neural networks in dual heuristic programming method," in Proc. Int. Conf. Neural Netw., Jun. 1997, pp. 712-717.
    • (1997) Proc. Int. Conf. Neural Netw. , pp. 712-717
    • Lendaris, G.G.1    Paintz, C.2
  • 14
    • 84865077338
    • A comparison of learning speed and ability to cope without exploration between DHP and TD(0)
    • Jun.
    • M. Fairbank and E. Alonso, "A comparison of learning speed and ability to cope without exploration between DHP and TD(0)," in Proc. IEEE IJCNN, Jun. 2012, pp. 1478-1485.
    • (2012) Proc. IEEE IJCNN , pp. 1478-1485
    • Fairbank, M.1    Alonso, E.2
  • 15
    • 0008011457
    • Neural networks, system identification, and control in the chemical process industries
    • D. White and D. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold ch. 10
    • P. J. Werbos, "Neural networks, system identification, and control in the chemical process industries," in Handbook of Intelligent Control, D. White and D. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold, 1992, ch. 10, pp. 283-356.
    • (1992) Handbook of Intelligent Control , pp. 283-356
    • Werbos, P.J.1
  • 17
    • 49049089962
    • Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof
    • Aug
    • A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern., B, Cybern., vol. 38, no. 4, pp. 943-949, Aug. 2008.
    • (2008) IEEE Trans. Syst., Man, Cybern., B, Cybern. , vol.38 , Issue.4 , pp. 943-949
    • Al-Tamimi, A.1    Lewis, F.L.2    Abu-Khalaf, M.3
  • 18
    • 80053166137
    • Finite-horizon input-constrained nonlinear optimal control using single network adaptive critics
    • Jun./Jul.
    • A. Heydari and S. N. Balakrishnan, "Finite-horizon input-constrained nonlinear optimal control using single network adaptive critics," in Proc. ACC, Jun./Jul. 2011, pp. 3047-3052.
    • (2011) Proc. ACC , pp. 3047-3052
    • Heydari, A.1    Balakrishnan, S.N.2
  • 20
    • 0032687566
    • Stable adaptive control using new critic designs
    • Mar, ArXiv:adap-org/9810001
    • P. J. Werbos, "Stable adaptive control using new critic designs," Proc. SPIE, vol. 3728, p. 510, Mar. 1999, arXiv:adap-org/9810001.
    • (1999) Proc. SPIE , vol.3728 , pp. 510
    • Werbos, P.J.1
  • 21
    • 85151728371
    • Residual algorithms: Reinforcement learning with function approximation
    • L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. Int. Conf. Mach. Learn., 1995, pp. 30-37.
    • (1995) Proc. Int. Conf. Mach. Learn. , pp. 30-37
    • Baird, L.C.1
  • 22
    • 0025229247
    • Consistency of HDP applied to a simple reinforcement learning problem
    • Jan
    • P. J. Werbos, "Consistency of HDP applied to a simple reinforcement learning problem," Neural Netw., vol. 3, pp. 179-189, Jan. 1990.
    • (1990) Neural Netw. , vol.3 , pp. 179-189
    • Werbos, P.J.1
  • 23
    • 0025503558
    • Backpropagation through time: What it does and how to do it
    • Oct
    • P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, no. 10, pp. 1550-1560, Oct. 1990.
    • (1990) Proc. IEEE , vol.78 , Issue.10 , pp. 1550-1560
    • Werbos, P.J.1
  • 24
    • 0033629916
    • Reinforcement learning in continuous time and space
    • K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.
    • (2000) Neural Comput. , vol.12 , Issue.1 , pp. 219-245
    • Doya, K.1
  • 25
    • 0027554566
    • Temporal-difference methods and Markov models
    • Mar./Apr
    • E. Barnard, "Temporal-difference methods and Markov models," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 2, pp. 357-365, Mar./Apr. 1993.
    • (1993) IEEE Trans. Syst., Man, Cybern. , vol.23 , Issue.2 , pp. 357-365
    • Barnard, E.1
  • 27
    • 84943274699
    • A direct adaptive method for faster backpropagation learning: The RPROP algorithm
    • San Francisco, CA, USA, Apr.
    • M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Proc. IEEE Int. Conf. Neural Netw., San Francisco, CA, USA, Apr. 1993, pp. 586-591.
    • (1993) Proc. IEEE Int. Conf. Neural Netw. , pp. 586-591
    • Riedmiller, M.1    Braun, H.2
  • 28
    • 0020970738
    • Neuronlike adaptive elements that can solve difficult learning control problems
    • A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. 13, no. 5, pp. 834-846, Sep./Oct. 1983. (Pubitemid 14138646)
    • (1983) IEEE Transactions on Systems, Man and Cybernetics , vol.13 , Issue.5 , pp. 834-846
    • Barto, A.G.1    Sutton, R.S.2    Anderson, C.W.3
  • 29
    • 76649091717
    • Correct equations for the dynamics of the cart-pole system
    • Cluj-Napoca, Romania, Tech. Rep.
    • R. V. Florian, "Correct equations for the dynamics of the cart-pole system," Center for Cognit., Neural Studies (Coneural), Cluj-Napoca, Romania, Tech. Rep., 2007.
    • (2007) Center for Cognit., Neural Studies (Coneural)
    • Florian, R.V.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.