Volume 38, Issue 4, 2008, Pages 950-956

Reinforcement learning in continuous time and space: Interference and not Ill conditioning is the main problem when using distributed function approximators

Author keywords

Continuous time systems; Distributed memory systems; Feedforward neural networks; Ill conditioning; Interference

Indexed keywords

EDUCATION; GRADIENT METHODS; LEARNING SYSTEMS; LIGHTNING; MATHEMATICAL MODELS; MATHEMATICAL PROGRAMMING; NEURAL NETWORKS; POLYNOMIAL APPROXIMATION; REINFORCEMENT; REINFORCEMENT LEARNING; SYSTEMS ENGINEERING;

EID: 49049087720     PISSN: 10834419     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSMCB.2008.921000     Document Type: Article
Times cited : (25)

References (31)
  • 1
    • 14844340822
    • Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
    • M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
    • (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
    • Abu-Khalaf, M.1    Lewis, F.L.2
  • 2
    • 0000396062
    • Natural gradient works efficiently in learning
    • S. Amari, "Natural gradient works efficiently in learning," Neural Comput., vol. 10, no. 2, pp. 251-276, Feb. 1998.
    • (1998) Neural Comput , vol.10 , Issue.2 , pp. 251-276
    • Amari, S.1
  • 3
    • 0344154963
    • Strategy learning with multilayer connectionist representations
    • C. W. Anderson, "Strategy learning with multilayer connectionist representations," in Proc. 4th Int. Workshop Mach. Learn., 1987, pp. 103-114.
    • (1987) Proc. 4th Int. Workshop Mach. Learn , pp. 103-114
    • Anderson, C.W.1
  • 4
    • 0031457098
    • Avoiding catastrophic forgetting by coupling two reverberating neural networks
    • B. Ans and S. Rousset, "Avoiding catastrophic forgetting by coupling two reverberating neural networks," C. R. Acad. sci., Sér., Sci. vie, vol. 320, no. 12, pp. 989-997, 1997.
    • (1997) C. R. Acad. sci., Sér., Sci. vie , vol.320 , Issue.12 , pp. 989-997
    • Ans, B.1    Rousset, S.2
  • 5
    • 0004370245
    • Advantage updating
    • L. C. Baird, "Advantage updating," Wright Lab., Wright Patterson Air Force Base, Dayton, OH, Tech. Rep. WL-TR-93-1146, 1993.
    • (1993) Wright Lab., Wright Patterson Air Force Base, Dayton, OH, Tech. Rep. WL-TR-93-1146
    • Baird, L.C.1
  • 6
    • 0027599793
    • Universal approximation bounds for superpositions of a sigmoidal function
    • A. R. Barron, "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 930-945, May 1993.
    • (1993) IEEE Trans. Inf. Theory , vol.39 , Issue.3 , pp. 930-945
    • Barron, A.R.1
  • 7
    • 0001864005
    • On the bang-bang control problem
    • R. Bellman, I. Glicksberg, and O. Gross, "On the bang-bang control problem," Q. Appl. Math., vol. 14, no. 1, pp. 11-18, 1956.
    • (1956) Q. Appl. Math , vol.14 , Issue.1 , pp. 11-18
    • Bellman, R.1    Glicksberg, I.2    Gross, O.3
  • 11
    • 10944228202
    • Reinforcement learning using neural networks, with applications to motor control
    • R. Coulom, "Reinforcement learning using neural networks, with applications to motor control," M.S. thesis, Inst. Nat. Polytech. Grenoble, Grenoble, France, 2002.
    • (2002) M.S. thesis, Inst. Nat. Polytech. Grenoble, Grenoble, France
    • Coulom, R.1
  • 12
    • 34250718115
    • High-accuracy value-function approximation with neural networks applied to the acrobot
    • R. Coulom, "High-accuracy value-function approximation with neural networks applied to the acrobot," in Proc. Eur. Symp. Artif. Neural Netw., M. Verleysen, Ed., Bruges, Belgium, 2004.
    • (2004) Proc. Eur. Symp. Artif. Neural Netw
    • Coulom, R.1
  • 13
    • 85156231814
    • Temporal difference learning in continuous time and space
    • K. Doya, "Temporal difference learning in continuous time and space," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, pp. 1073-1079.
    • (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1073-1079
    • Doya, K.1
  • 14
    • 0033629916
    • Reinforcement learning in continuous time and space
    • K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, Jan. 2000.
    • (2000) Neural Comput , vol.12 , Issue.1 , pp. 219-245
    • Doya, K.1
  • 15
    • 0347683832
    • Pseudo-recurrent connectionist networks: An approach to the 'sensitivity-stability' dilemma
    • R. M. French, "Pseudo-recurrent connectionist networks: An approach to the 'sensitivity-stability' dilemma," Connect. Sci., vol. 9, no. 4, pp. 353-380, Dec. 1997.
    • (1997) Connect. Sci , vol.9 , Issue.4 , pp. 353-380
    • French, R.M.1
  • 16
    • 84880694195
    • Stable function approximation in dynamic programming
    • G. J. Gordon, "Stable function approximation in dynamic programming," in Proc. 12th Int. Conf. Mach. Learn., A. Prieditis and S. Russell, Eds., San Francisco, CA, 1995.
    • (1995) Proc. 12th Int. Conf. Mach. Learn
    • Gordon, G.J.1
  • 18
    • 0000873069
    • A method for the solution of certain non-linear problems in least squares
    • K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Q. J. Appl. Math., vol. II, no. 2, pp. 164-168, 1944.
    • (1944) Q. J. Appl. Math , vol.2 , Issue.2 , pp. 164-168
    • Levenberg, K.1
  • 19
    • 0000169232
    • An algorithm for least-squares estimation of nonlinear parameters
    • D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," J. Soc. Ind. Appl. Math., vol. 11, no. 2, pp. 431-441, Jun. 1963.
    • (1963) J. Soc. Ind. Appl. Math , vol.11 , Issue.2 , pp. 431-441
    • Marquardt, D.W.1
  • 20
    • 0027205884
    • A scaled conjugate gradient algorithm for fast supervised learning
    • M. F. Moller, "A scaled conjugate gradient algorithm for fast supervised learning," Neural Netw., vol. 6, no. 4, pp. 525-533, 1993.
    • (1993) Neural Netw , vol.6 , Issue.4 , pp. 525-533
    • Moller, M.F.1
  • 21
    • 0033308517
    • Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation
    • R. Munos, L. C. Baird, and A. W. Moore, "Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation," in Proc. Int. Joint Conf. Neural Netw., 1999, pp. 2152-2157.
    • (1999) Proc. Int. Joint Conf. Neural Netw , pp. 2152-2157
    • Munos, R.1    Baird, L.C.2    Moore, A.W.3
  • 22
    • 0009589301
    • How to train neural networks
    • R. Neuneier and H.-G. Zimmermann, "How to train neural networks," in Neural Networks: Tricks of the Trade, G. B. Orr and K.-R. Müller, Eds. New York: Springer-Verlag, 1998.
    • (1998) Neural Networks: Tricks of the Trade
    • Neuneier, R.1    Zimmermann, H.-G.2
  • 23
    • 0039225088
    • On-line estimation of the optimal value function: HJB-estimates
    • J. K. Peterson, "On-line estimation of the optimal value function: HJB-estimates," in Advances in Neural Information Processing Systems, vol. 5, C. L. Giles, S. J. Hanson, and J. D. Cowan, Eds. San Mateo, CA: Morgan Kaufmann, 1993, pp. 319-326.
    • (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 319-326
    • Peterson, J.K.1
  • 24
    • 84943274699
    • A direct adaptive method for faster backpropagation learning: The RPROP algorithm
    • M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Proc. IEEE Int. Conf. Neural Netw., 1993, pp. 586-591.
    • (1993) Proc. IEEE Int. Conf. Neural Netw , pp. 586-591
    • Riedmiller, M.1    Braun, H.2
  • 25
    • 38149038993
    • Catastrophic forgetting, rehearsal, and pseudorehearsal
    • A. Robins, "Catastrophic forgetting, rehearsal, and pseudorehearsal," Connect. Sci., vol. 7, no. 2, pp. 123-146, Jun. 1995.
    • (1995) Connect. Sci , vol.7 , Issue.2 , pp. 123-146
    • Robins, A.1
  • 26
    • 0030896968
    • A neural substrate of prediction and reward
    • W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, pp. 1593-1599, Mar. 1997.
    • (1997) Science , vol.275 , Issue.5306 , pp. 1593-1599
    • Schultz, W.1    Dayan, P.2    Montague, P.R.3
  • 27
    • 33847202724
    • Learning to predict by the methods of temporal differences
    • R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, Aug. 1988.
    • (1988) Mach. Learn , vol.3 , Issue.1 , pp. 9-44
    • Sutton, R.S.1
  • 30
    • 0029276036
    • Temporal difference learning and TD-Gammon
    • G. Tesauro, "Temporal difference learning and TD-Gammon," Commun. ACM, vol. 38, no. 3, pp. 58-68, Mar. 1995.
    • (1995) Commun. ACM , vol.38 , Issue.3 , pp. 58-68
    • Tesauro, G.1
  • 31
    • 0036829017
    • Optimal feedback control as a theory of motor coordination
    • E. Todorov and M. Jordan, "Optimal feedback control as a theory of motor coordination," Nat. Neurosci., vol. 5, no. 11, pp. 1226-1235, Nov. 2002.
    • (2002) Nat. Neurosci , vol.5 , Issue.11 , pp. 1226-1235
    • Todorov, E.1    Jordan, M.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.