SCOPUS Information Search Platform

IEEE Transactions on Neural Networks and Learning Systems

Volume 24, Issue 12, 2013, Pages 2088-2100

An equivalence between adaptive dynamic programming with a critic and backpropagation through time

Authors (3): Fairbank, Michael (a); Alonso, Eduardo (a); Prokhorov, Danil (b)

b Toyota Motor North America Inc (United States)

Author keywords

Adaptive dynamic programming (ADP); backpropagation through time; dual heuristic programming (DHP); neural networks; value gradient learning

Indexed keywords

ADAPTIVE DYNAMIC PROGRAMMING; BACK-PROPAGATION THROUGH TIME; CONTINUOUS STATE SPACE; DUAL HEURISTIC PROGRAMMING; GUARANTEED CONVERGENCE; NONLINEAR FUNCTIONS; SMOOTHNESS CONDITIONS; VALUE-GRADIENT LEARNING;

DYNAMIC PROGRAMMING; HEURISTIC PROGRAMMING; NEURAL NETWORKS;

ALGORITHMS;

EID: 84887996993; PISSN: 2162-237X; EISSN: 2162-2388; Source Type: Journal
DOI: 10.1109/TNNLS.2013.2271778 Document Type: Article

Times cited: 38

References (29)

1
- 66449130966
- F.-Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: An introduction," IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009.

2
- 85012688561
- R. E. Bellman, Dynamic Programming. Princeton, NJ, USA: Princeton Univ. Press, 1957.

3
- 0002031779
- P. J. Werbos, "Approximate dynamic programming for real-time control and neural modeling," in Handbook of Intelligent Control, D. White and D. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold, 1992, ch. 13, pp. 493-525.

4
- 0031236002
- D. Prokhorov and D. Wunsch, "Adaptive critic designs," IEEE Trans. Neural Netw., vol. 8, no. 5, pp. 997-1007, Sep. 1997.

5
- 85032189594
- S. Ferrari and R. F. Stengel, "Model-based adaptive critic designs," in Handbook of Learning and Approximate Dynamic Programming, J. Si, A. Barto, W. Powell, and D. Wunsch, Eds. New York, NY, USA: Wiley, 2004, pp. 65-96.

6
- 84876158475
- M. Fairbank, E. Alonso, and D. Prokhorov, "Simple and fast calculation of the second-order gradients for globalized dual heuristic dynamic programming in neural networks," IEEE Trans. Neural Netw. Learn. Syst., vol. 23, no. 10, pp. 1671-1678, Oct. 2012.

7
- 84865080650
- M. Fairbank, "Reinforcement learning by value gradients," 2008. [Online]. Available: http://arxiv.org/abs/0803.3539

8
- 84865069763
- M. Fairbank and E. Alonso, "Value-gradient learning," in Proc. IEEE IJCNN, Jun. 2012, pp. 3062-3069.

9
- 33847202724
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, 1988.

10
- 0037561866
- G. K. Venayagamoorthy and D. C. Wunsch, "Dual heuristic programming excitation neurocontrol for generators in a multimachine power system," IEEE Trans. Ind. Appl., vol. 39, no. 2, pp. 382-394, Mar./Apr. 2003.

11
- 0030702730
- G. G. Lendaris and C. Paintz, "Training strategies for critic and action neural networks in dual heuristic programming method," in Proc. Int. Conf. Neural Netw., Jun. 1997, pp. 712-717.

12
- 0003716450
- L. S. Pontryagin, V. G. Boltayanskii, R. V. Gamkrelidze, and E. F. Mishchenko, The Mathematical Theory of Optimal Processes, vol. 4. New York, NY, USA: Wiley, 1962.

13
- 84865070696
- M. Fairbank and E. Alonso, "The local optimality of reinforcement learning by value gradients, and its relationship to policy gradient learning," 2011. [Online]. Available: http://arxiv.org/abs/1101.0428

14
- 84865077338
- M. Fairbank and E. Alonso, "A comparison of learning speed and ability to cope without exploration between DHP and TD(0)," in Proc. IEEE IJCNN, Jun. 2012, pp. 1478-1485.

15
- 0008011457
- P. J. Werbos, "Neural networks, system identification, and control in the chemical process industries," in Handbook of Intelligent Control, D. White and D. Sofge, Eds. New York, NY, USA: Van Nostrand Reinhold, 1992, ch. 10, pp. 283-356.

16
- 84888016114
- R. A. Howard, Dynamic Programming and Markov Processes. Cambridge, MA, USA: MIT Press, 1960, ch. 4, pp. 42-43.

17
- 49049089962
- A. Al-Tamimi, F. L. Lewis, and M. Abu-Khalaf, "Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof," IEEE Trans. Syst., Man, Cybern., B, Cybern., vol. 38, no. 4, pp. 943-949, Aug. 2008.

18
- 80053166137
- A. Heydari and S. N. Balakrishnan, "Finite-horizon input-constrained nonlinear optimal control using single network adaptive critics," in Proc. ACC, Jun./Jul. 2011, pp. 3047-3052.

19
- 0031372596
- D. V. Prokhorov and D. C. Wunsch, "Convergence of critic-based training," in Proc. IEEE Int. Conf. Syst. Man, Cybern. Comput. Cybern. Simul., Oct. 1997, pp. 3057-3060.

20
- 0032687566
- P. J. Werbos, "Stable adaptive control using new critic designs," Proc. SPIE, vol. 3728, p. 510, Mar. 1999, arXiv:adap-org/9810001.

21
- 85151728371
- L. C. Baird, "Residual algorithms: Reinforcement learning with function approximation," in Proc. Int. Conf. Mach. Learn., 1995, pp. 30-37.

22
- 0025229247
- P. J. Werbos, "Consistency of HDP applied to a simple reinforcement learning problem," Neural Netw., vol. 3, pp. 179-189, Jan. 1990.

23
- 0025503558
- P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, no. 10, pp. 1550-1560, Oct. 1990.

24
- 0033629916
- K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.

25
- 0027554566
- E. Barnard, "Temporal-difference methods and Markov models," IEEE Trans. Syst., Man, Cybern., vol. 23, no. 2, pp. 357-365, Mar./Apr. 1993.

26
- 84886350301
- M. Fairbank, D. Prokhorov, and E. Alonso, "Approximating optimal control with value gradient learning," in Reinforcement Learning and Approximate Dynamic Programming for Feedback Control, F. Lewis and D. Liu, Eds. New York, NY, USA: Wiley, 2012, Sections 7.3.4 and 7.4.3.

27
- 84943274699
- M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Proc. IEEE Int. Conf. Neural Netw., San Francisco, CA, USA, Apr. 1993, pp. 586-591.

28
- 0020970738
- A. G. Barto, R. S. Sutton, and C. W. Anderson, "Neuronlike adaptive elements that can solve difficult learning control problems," IEEE Trans. Syst., Man, Cybern., vol. 13, no. 5, pp. 834-846, Sep./Oct. 1983.

29
- 76649091717
- R. V. Florian, "Correct equations for the dynamics of the cart-pole system," Center for Cognit., Neural Studies (Coneural), Cluj-Napoca, Romania, Tech. Rep., 2007.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.