SCOPUS Information Search Platform

Volume 18, Issue 4, 2007, Pages 1031-1041

Least squares solutions of the HJB equation with neural network value-function approximators

a HEBREW UNIVERSITY OF JERUSALEM (Israel)

b WASHINGTON UNIVERSITY IN ST LOUIS (United States)

Author keywords

Differential neural networks (NNs); Dynamic programming; Feedforward neural networks; Hamilton-Jacobi-Bellman (HJB) equation; Optimal control; Viscosity solution

Indexed keywords

FIRST- AND SECOND-ORDER DIFFERENTIAL BACKPROPAGATION; HAMILTON-JACOBI-BELLMAN (HJB) RESIDUAL; INVERTED-PENDULUM SYSTEM; VALUE FUNCTION COMPLEXITY;

BACKPROPAGATION; CONTROL NONLINEARITIES; CONVERGENCE OF NUMERICAL METHODS; DYNAMIC PROGRAMMING; LEAST SQUARES APPROXIMATIONS; OPTIMAL CONTROL SYSTEMS; RANDOM PROCESSES;

FEEDFORWARD NEURAL NETWORKS;

ALGORITHM; ARTICLE; ARTIFICIAL NEURAL NETWORK; COMPUTER SIMULATION; COMPUTER SYSTEM; DECISION SUPPORT SYSTEM; FEEDBACK SYSTEM; REGRESSION ANALYSIS; THEORETICAL MODEL;

ALGORITHMS; COMPUTER SIMULATION; COMPUTER SYSTEMS; DECISION SUPPORT TECHNIQUES; FEEDBACK; LEAST-SQUARES ANALYSIS; MODELS, THEORETICAL; NEURAL NETWORKS (COMPUTER);

EID: 34547095501; PISSN: 10459227; EISSN: None; Source Type: Journal
DOI: 10.1109/TNN.2007.899249; Document Type: Article

Times cited: 75

References (35)

1
- 84967758647
- Viscosity solutions of Hamilton-Jacobi equations
- M. Crandall and P. Lions, "Viscosity solutions of Hamilton-Jacobi equations," Trans. Amer. Math. Soc., vol. 277, 1983.
- (1983) Trans. Amer. Math. Soc , vol.277
- Crandall, M.¹ Lions, P.²

2
- 85153940465
- Generalization in reinforcement learning: Safely approximating the value function
- G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press
- J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 369-376.
- (1995) Advances in Neural Information Processing Systems 7 , pp. 369-376
- Boyan, J.A.¹ Moore, A.W.²

3
- 0024866495
- On the approximate realization of continuous mappings by neural networks
- K.-I. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Netw., vol. 2, pp. 183-192, 1989.
- (1989) Neural Netw , vol.2 , pp. 183-192
- Funahashi, K.-I.¹

4
- 14844340822
- Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
- M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
- (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
- Abu-Khalaf, M.¹ Lewis, F.L.²

5
- 10944228202
- Reinforcement learning using neural networks, with applications to motor control
- Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France
- R. Coulom, "Reinforcement learning using neural networks, with applications to motor control," Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France, 2002.
- (2002)
- Coulom, R.¹

6
- 0003565783
- Belmont, MA: Athena Scientific
- D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995.
- (1995) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

7
- 0004291983
- New York: Elsevier
- D. H. Jacobson and D. Q. Mayne, Differential Dynamic Programming. New York: Elsevier, 1970.
- (1970) Differential Dynamic Programming
- Jacobson, D.H.¹ Mayne, D.Q.²

8
- 33847202724
- Learning to predict by the methods of temporal differences
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
- (1988) Mach. Learn , vol.3 , pp. 9-44
- Sutton, R.S.¹

9
- 0030896968
- A neural substrate of prediction and reward
- W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593-1599, 1997.
- (1997) Science , vol.275 , pp. 1593-1599
- Schultz, W.¹ Dayan, P.² Montague, P.R.³

10
- 35048819671
- Least-squares methods in reinforcement learning for control
- M. G. Lagoudakis, R. E. Parr, and M. L. Littman, "Least-squares methods in reinforcement learning for control," in Proc. 2nd Hellenic Conf. Artif. Intell., 2002, vol. 2308, pp. 249-260.
- (2002) Proc. 2nd Hellenic Conf. Artif. Intell , vol.2308 , pp. 249-260
- Lagoudakis, M.G.¹ Parr, R.E.² Littman, M.L.³

11
- 84880680664
- Variable resolution discretization for high-accuracy solutions of optimal control problems
- R. Munos and A. W. Moore, "Variable resolution discretization for high-accuracy solutions of optimal control problems," in Proc. Int. Joint Conf. Artif. Intell., 1999, pp. 1348-1355.
- (1999) Proc. Int. Joint Conf. Artif. Intell , pp. 1348-1355
- Munos, R.¹ Moore, A.W.²

12
- 0004671869
- Temporal difference learning in continuous time and space
- D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press
- K. Doya, "Temporal difference learning in continuous time and space," in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, vol. 8.
- (1996) Advances in Neural Information Processing Systems , vol.8
- Doya, K.¹

13
- 2542485629
- Practical issues in temporal difference learning
- J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann
- G. Tesauro, "Practical issues in temporal difference learning," in Advances in Neural Information Processing Systems, J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann, 1992, vol. 4, pp. 259-266.
- (1992) Advances in Neural Information Processing Systems , vol.4 , pp. 259-266
- Tesauro, G.¹

14
- 0033308517
- Gradient descent approaches to neural net-based solutions of the Hamilton-Jacobi-Bellman equation
- R. Munos, L. Baird, and A. Moore, "Gradient descent approaches to neural net-based solutions of the Hamilton-Jacobi-Bellman equation," in Proc. Int. Joint Conf. Neural Netw., 1999, pp. 1316-1323.
- (1999) Proc. Int. Joint Conf. Neural Netw , pp. 1316-1323
- Munos, R.¹ Baird, L.² Moore, A.³

15
- 0003270924
- Issues in using function approximation for reinforcement learning
- M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, Eds
- S. Thrun and A. Schwartz, "Issues in using function approximation for reinforcement learning," in Proc. 1993 Connectionist Models Summer School, M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, Eds., 1993, pp. 255-263.
- (1993) Proc. 1993 Connectionist Models Summer School , pp. 255-263
- Thrun, S.¹ Schwartz, A.²

16
- 0032202335
- Successive Galerkin approximation algorithms for nonlinear optimal and robust control
- R. Beard and T. McLain, "Successive Galerkin approximation algorithms for nonlinear optimal and robust control," Proc. Int. J. Control: Special Issue Breakthroughs Control Nonlinear Syst., vol. 71, no. 5, pp. 717-743, 1998.
- (1998) Proc. Int. J. Control: Special Issue Breakthroughs Control Nonlinear Syst , vol.71 , Issue.5 , pp. 717-743
- Beard, R.¹ McLain, T.²

17
- 0003661003
- Cambridge, U.K.: Cambridge Univ. Press
- J. A. Sethian, Level Set Methods and Fast Marching Methods. Cambridge, U.K.: Cambridge Univ. Press, 1999.
- (1999) Level Set Methods and Fast Marching Methods
- Sethian, J.A.¹

18
- 0025399567
- Identification and control of dynamical systems using neural networks
- Mar
- K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4-27, Mar. 1990.
- (1990) IEEE Trans. Neural Netw , vol.1 , Issue.1 , pp. 4-27
- Narendra, K.S.¹ Parthasarathy, K.²

19
- 0027594098
- On the nonlinear optimal regulator problem
- C. J. Goh, "On the nonlinear optimal regulator problem," Automatica, vol. 29, no. 3, pp. 751-756, 1993.
- (1993) Automatica , vol.29 , Issue.3 , pp. 751-756
- Goh, C.J.¹

20
- 0001440803
- Tangent prop - A formalism for specifying selected invariances in an adaptive network
- J. M. R. Lippman and S. J. Hanson, Eds. San Mateo, CA: Morgan Kaufmann
- P. Simard, B. Victorri, Y. LeCun, and J. Denker, "Tangent prop - A formalism for specifying selected invariances in an adaptive network," in Neural Information Processing Systems, J. M. R. Lippman and S. J. Hanson, Eds. San Mateo, CA: Morgan Kaufmann, 1992, vol. 4.
- (1992) Neural Information Processing Systems , vol.4
- Simard, P.¹ Victorri, B.² LeCun, Y.³ Denker, J.⁴

21
- 0039224634
- Hybrid learning of mapping and its Jacobian in multilayer neural networks
- J. W. Lee and J. H. Oh, "Hybrid learning of mapping and its Jacobian in multilayer neural networks," Neural Comput., vol. 9, pp. 937-958, 1997.
- (1997) Neural Comput , vol.9 , pp. 937-958
- Lee, J.W.¹ Oh, J.H.²

22
- 0033699871
- Neural networks learning differential data
- R. Masuoka, "Neural networks learning differential data," IEICE Trans. Inf. Syst., vol. E83-D, no. 6, pp. 1291-1300, 2000.
- (2000) IEICE Trans. Inf. Syst , vol.E83-D , Issue.6 , pp. 1291-1300
- Masuoka, R.¹

23
- 0003423896
- New York: Springer-Verlag
- W. Fleming and H. Soner, Controlled Markov Processes and Viscosity Solutions. New York: Springer-Verlag, 1993.
- (1993) Controlled Markov Processes and Viscosity Solutions
- Fleming, W.¹ Soner, H.²

24
- 0018441647
- An approximation theory of optimal control for trainable manipulators
- Mar
- G. Saridis and C. S. Lee, "An approximation theory of optimal control for trainable manipulators," IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 3, pp. 152-159, Mar. 1979.
- (1979) IEEE Trans. Syst., Man, Cybern , vol.SMC-9 , Issue.3 , pp. 152-159
- Saridis, G.¹ Lee, C.S.²

25
- 0000442791
- Generalization of back-propagation to recurrent neural networks
- F. Pineda, "Generalization of back-propagation to recurrent neural networks," Phys. Rev. Lett., vol. 59, no. 19, pp. 2229-2232, 1987.
- (1987) Phys. Rev. Lett , vol.59 , Issue.19 , pp. 2229-2232
- Pineda, F.¹

26
- 84890245567
- New York: Wiley
- M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms. New York: Wiley, 1993.
- (1993) Nonlinear Programming: Theory and Algorithms
- Bazaraa, M.S.¹ Sherali, H.D.² Shetty, C.M.³

27
- 0025536870
- Improving the learning speed of 2-layer neural network by choosing initial values of the adaptive weights
- D. H. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural network by choosing initial values of the adaptive weights," in Proc. 1st IEEE Int. Joint Conf. Neural Netw., 1990, vol. 3, pp. 21-26.
- (1990) Proc. 1st IEEE Int. Joint Conf. Neural Netw , vol.3 , pp. 21-26
- Nguyen, D.H.¹ Widrow, B.²

28
- 0002020770
- On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals
- J. H. Halton, "On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals," Numerische Mathematik, vol. 2, pp. 84-90, 1960.
- (1960) Numerische Mathematik , vol.2 , pp. 84-90
- Halton, J.H.¹

29
- 0033629916
- Reinforcement learning in continuous time and space
- K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.
- (2000) Neural Comput , vol.12 , Issue.1 , pp. 219-245
- Doya, K.¹

30
- 0029200844
- Control system analysis and design upon the Lyapunov method
- Jun
- S. E. Lyshevski and A. U. Meyer, "Control system analysis and design upon the Lyapunov method," in Proc. Amer. Control Conf., Jun. 1995, pp. 3219-3223.
- (1995) Proc. Amer. Control Conf , pp. 3219-3223
- Lyshevski, S.E.¹ Meyer, A.U.²

31
- 84914965022
- On an iterative technique for Riccati equation computations
- Feb
- D. Kleinman, "On an iterative technique for Riccati equation computations," IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114-115, Feb. 1968.
- (1968) IEEE Trans. Autom. Control , vol.13 , Issue.1 , pp. 114-115
- Kleinman, D.¹

32
- 0029514510
- The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
- A. Moore and C. Atkeson, "The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces," Mach. Learn., vol. 21, pp. 1-36, 1995.
- (1995) Mach. Learn , vol.21 , pp. 1-36
- Moore, A.¹ Atkeson, C.²

33
- 0011766779
- Local gain adaptation in stochastic gradient descent
- IDSIA, Lugano, Switzerland, Tech. Rep. IDSIA-09-99
- N. N. Schraudolph, "Local gain adaptation in stochastic gradient descent," IDSIA, Lugano, Switzerland, Tech. Rep. IDSIA-09-99, 1999, p. 8.
- (1999) , pp. 8
- Schraudolph, N.N.¹

34
- 27844606351
- Support vector regression for the simultaneous learning of a multivariate function and its derivatives
- M. Lazaro, I. Santamaria, F. Perez-Cruz, and A. Artes-Rodriguez, "Support vector regression for the simultaneous learning of a multivariate function and its derivatives," Neurocomput., vol. 69, pp. 42-61, 2005.
- (2005) Neurocomput , vol.69 , pp. 42-61
- Lazaro, M.¹ Santamaria, I.² Perez-Cruz, F.³ Artes-Rodriguez, A.⁴

35
- 0000255539
- Fast exact multiplication by the Hessian
- B. A. Pearlmutter, "Fast exact multiplication by the Hessian," Neural Comput., vol. 6, no. 1, pp. 147-160, 1994.
- (1994) Neural Comput , vol.6 , Issue.1 , pp. 147-160
- Pearlmutter, B.A.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.