Volume 18, Issue 4, 2007, Pages 1031-1041

Least squares solutions of the HJB equation with neural network value-function approximators

Author keywords

Differential neural networks (NNs); Dynamic programming; Feedforward neural networks; Hamilton-Jacobi-Bellman (HJB) equation; Optimal control; Viscosity solution

Indexed keywords

FIRST- AND SECOND-ORDER DIFFERENTIAL BACKPROPAGATION; HAMILTON-JACOBI-BELLMAN (HJB) RESIDUAL; INVERTED-PENDULUM SYSTEM; VALUE FUNCTION COMPLEXITY

EID: 34547095501     PISSN: 10459227     EISSN: None     Source Type: Journal    
DOI: 10.1109/TNN.2007.899249     Document Type: Article
Times cited: 74

References (35)
  • 1
    • 84967758647
    • Viscosity solutions of Hamilton-Jacobi equations
    • M. Crandall and P. Lions, "Viscosity solutions of Hamilton-Jacobi equations," Trans. Amer. Math. Soc., vol. 277, 1983.
    • (1983) Trans. Amer. Math. Soc , vol.277
    • Crandall, M.1    Lions, P.2
  • 2
    • 85153940465
    • Generalization in reinforcement learning: Safely approximating the value function
    • G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press
    • J. A. Boyan and A. W. Moore, "Generalization in reinforcement learning: Safely approximating the value function," in Advances in Neural Information Processing Systems 7, G. Tesauro, D. S. Touretzky, and T. K. Leen, Eds. Cambridge, MA: MIT Press, 1995, pp. 369-376.
    • (1995) Advances in Neural Information Processing Systems 7 , pp. 369-376
    • Boyan, J.A.1    Moore, A.W.2
  • 3
    • 0024866495
    • On the approximate realization of continuous mappings by neural networks
    • K.-I. Funahashi, "On the approximate realization of continuous mappings by neural networks," Neural Netw., vol. 2, pp. 183-192, 1989.
    • (1989) Neural Netw , vol.2 , pp. 183-192
    • Funahashi, K.-I.1
  • 4
    • 14844340822
    • Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
    • M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
    • (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
    • Abu-Khalaf, M.1    Lewis, F.L.2
  • 5
    • 10944228202
    • Reinforcement learning using neural networks, with applications to motor control
    • Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France
    • R. Coulom, "Reinforcement learning using neural networks, with applications to motor control," Ph.D. dissertation, Institut National Polytechnique de Grenoble, Grenoble, France, 2002.
    • (2002)
    • Coulom, R.1
  • 8
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, pp. 9-44, 1988.
    • (1988) Mach. Learn , vol.3 , pp. 9-44
    • Sutton, R.S.1
  • 9
    • 0030896968
    • A neural substrate of prediction and reward
    • W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, pp. 1593-1599, 1997.
    • (1997) Science , vol.275 , pp. 1593-1599
    • Schultz, W.1    Dayan, P.2    Montague, P.R.3
  • 11
    • 84880680664
    • Variable resolution discretization for high-accuracy solutions of optimal control problems
    • R. Munos and A. W. Moore, "Variable resolution discretization for high-accuracy solutions of optimal control problems," in Proc. Int. Joint Conf. Artif. Intell., 1999, pp. 1348-1355.
    • (1999) Proc. Int. Joint Conf. Artif. Intell , pp. 1348-1355
    • Munos, R.1    Moore, A.W.2
  • 12
    • 0004671869
    • Temporal difference learning in continuous time and space
    • D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press
    • K. Doya, "Temporal difference learning in continuous time and space," in Advances in Neural Information Processing Systems, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, vol. 8.
    • (1996) Advances in Neural Information Processing Systems , vol.8
    • Doya, K.1
  • 13
    • 2542485629
    • Practical issues in temporal difference learning
    • J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann
    • G. Tesauro, "Practical issues in temporal difference learning," in Advances in Neural Information Processing Systems, J. E. Moody, S. J. Hanson, and R. P. Lippmann, Eds. San Mateo, CA: Morgan Kaufmann, 1992, vol. 4, pp. 259-266.
    • (1992) Advances in Neural Information Processing Systems , vol.4 , pp. 259-266
    • Tesauro, G.1
  • 14
    • 0033308517
    • Gradient descent approaches to neural net-based solutions of the Hamilton-Jacobi-Bellman equation
    • R. Munos, L. Baird, and A. Moore, "Gradient descent approaches to neural net-based solutions of the Hamilton-Jacobi-Bellman equation," in Proc. Int. Joint Conf. Neural Netw., 1999, pp. 1316-1323.
    • (1999) Proc. Int. Joint Conf. Neural Netw , pp. 1316-1323
    • Munos, R.1    Baird, L.2    Moore, A.3
  • 15
    • 0003270924
    • Issues in using function approximation for reinforcement learning
    • M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, Eds
    • S. Thrun and A. Schwartz, "Issues in using function approximation for reinforcement learning," in Proc. 1993 Connectionist Models Summer School, M. Mozer, P. Smolensky, D. Touretzky, J. Elman, and A. Weigend, Eds., 1993, pp. 255-263.
    • (1993) Proc. 1993 Connectionist Models Summer School , pp. 255-263
    • Thrun, S.1    Schwartz, A.2
  • 18
    • 0025399567
    • Identification and control of dynamical systems using neural networks
    • Mar
    • K. S. Narendra and K. Parthasarathy, "Identification and control of dynamical systems using neural networks," IEEE Trans. Neural Netw., vol. 1, no. 1, pp. 4-27, Mar. 1990.
    • (1990) IEEE Trans. Neural Netw , vol.1 , Issue.1 , pp. 4-27
    • Narendra, K.S.1    Parthasarathy, K.2
  • 19
    • 0027594098
    • On the nonlinear optimal regulator problem
    • C. J. Goh, "On the nonlinear optimal regulator problem," Automatica, vol. 29, no. 3, pp. 751-756, 1993.
    • (1993) Automatica , vol.29 , Issue.3 , pp. 751-756
    • Goh, C.J.1
  • 20
    • 0001440803
    • Tangent prop - A formalism for specifying selected invariances in an adaptive network
    • J. M. R. Lippman and S. J. Hanson, Eds. San Mateo, CA: Morgan Kaufmann
    • P. Simard, B. Victorri, Y. LeCun, and J. Denker, "Tangent prop - A formalism for specifying selected invariances in an adaptive network," in Neural Information Processing Systems, J. M. R. Lippman and S. J. Hanson, Eds. San Mateo, CA: Morgan Kaufmann, 1992, vol. 4.
    • (1992) Neural Information Processing Systems , vol.4
    • Simard, P.1    Victorri, B.2    LeCun, Y.3    Denker, J.4
  • 21
    • 0039224634
    • Hybrid learning of mapping and its Jacobian in multilayer neural networks
    • J. W. Lee and J. H. Oh, "Hybrid learning of mapping and its Jacobian in multilayer neural networks," Neural Comput., vol. 9, pp. 937-958, 1997.
    • (1997) Neural Comput , vol.9 , pp. 937-958
    • Lee, J.W.1    Oh, J.H.2
  • 22
    • 0033699871
    • Neural networks learning differential data
    • R. Masuoka, "Neural networks learning differential data," IEICE Trans. Inf. Syst., vol. E83-D, no. 6, pp. 1291-1300, 2000.
    • (2000) IEICE Trans. Inf. Syst , vol.E83-D , Issue.6 , pp. 1291-1300
    • Masuoka, R.1
  • 24
    • 0018441647
    • An approximation theory of optimal control for trainable manipulators
    • Mar
    • G. Saridis and C. S. Lee, "An approximation theory of optimal control for trainable manipulators," IEEE Trans. Syst., Man, Cybern., vol. SMC-9, no. 3, pp. 152-159, Mar. 1979.
    • (1979) IEEE Trans. Syst., Man, Cybern , vol.SMC-9 , Issue.3 , pp. 152-159
    • Saridis, G.1    Lee, C.S.2
  • 25
    • 0000442791
    • Generalization of back-propagation to recurrent neural networks
    • F. Pineda, "Generalization of back-propagation to recurrent neural networks," Phys. Rev. Lett., vol. 59, no. 19, pp. 2229-2232, 1987.
    • (1987) Phys. Rev. Lett , vol.59 , Issue.19 , pp. 2229-2232
    • Pineda, F.1
  • 27
    • 0025536870
    • Improving the learning speed of 2-layer neural network by choosing initial values of the adaptive weights
    • D. H. Nguyen and B. Widrow, "Improving the learning speed of 2-layer neural network by choosing initial values of the adaptive weights," in Proc. 1st IEEE Int. Joint Conf. Neural Netw., 1990, vol. 3, pp. 21-26.
    • (1990) Proc. 1st IEEE Int. Joint Conf. Neural Netw , vol.3 , pp. 21-26
    • Nguyen, D.H.1    Widrow, B.2
  • 28
    • 0002020770
    • On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals
    • J. H. Halton, "On the efficiency of certain quasi-random sequences of points in evaluating multi-dimensional integrals," Numerische Mathematik, vol. 2, pp. 84-90, 1960.
    • (1960) Numerische Mathematik , vol.2 , pp. 84-90
    • Halton, J.H.1
  • 29
    • 0033629916
    • Reinforcement learning in continuous time and space
    • K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, 2000.
    • (2000) Neural Comput , vol.12 , Issue.1 , pp. 219-245
    • Doya, K.1
  • 30
    • 0029200844
    • Control system analysis and design upon the Lyapunov method
    • Jun
    • S. E. Lyshevski and A. U. Meyer, "Control system analysis and design upon the Lyapunov method," in Proc. Amer. Control Conf., Jun. 1995, pp. 3219-3223.
    • (1995) Proc. Amer. Control Conf , pp. 3219-3223
    • Lyshevski, S.E.1    Meyer, A.U.2
  • 31
    • 84914965022
    • On an iterative technique for Riccati equation computations
    • Feb
    • D. Kleinman, "On an iterative technique for Riccati equation computations," IEEE Trans. Autom. Control, vol. 13, no. 1, pp. 114-115, Feb. 1968.
    • (1968) IEEE Trans. Autom. Control , vol.13 , Issue.1 , pp. 114-115
    • Kleinman, D.1
  • 32
    • 0029514510
    • The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces
    • A. Moore and C. Atkeson, "The parti-game algorithm for variable resolution reinforcement learning in multidimensional state-spaces," Mach. Learn., vol. 21, pp. 1-36, 1995.
    • (1995) Mach. Learn , vol.21 , pp. 1-36
    • Moore, A.1    Atkeson, C.2
  • 33
    • 0011766779
    • Local gain adaptation in stochastic gradient descent
    • IDSIA, Lugano, Switzerland, Tech. Rep. IDSIA-09-99
    • N. N. Schraudolph, "Local gain adaptation in stochastic gradient descent," IDSIA, Lugano, Switzerland, Tech. Rep. IDSIA-09-99, 1999, p. 8.
    • (1999) , p. 8
    • Schraudolph, N.N.1
  • 34
    • 27844606351
    • Support vector regression for the simultaneous learning of a multivariate function and its derivatives
    • M. Lazaro, I. Santamaria, F. Perez-Cruz, and A. Artes-Rodriguez, "Support vector regression for the simultaneous learning of a multivariate function and its derivatives," Neurocomput., vol. 69, pp. 42-61, 2005.
    • (2005) Neurocomput , vol.69 , pp. 42-61
    • Lazaro, M.1    Santamaria, I.2    Perez-Cruz, F.3    Artes-Rodriguez, A.4
  • 35
    • 0000255539
    • Fast exact multiplication by the Hessian
    • B. A. Pearlmutter, "Fast exact multiplication by the Hessian," Neural Comput., vol. 6, no. 1, pp. 147-160, 1994.
    • (1994) Neural Comput , vol.6 , Issue.1 , pp. 147-160
    • Pearlmutter, B.A.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.