SCOPUS Information Retrieval Platform

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics

Volume 38, Issue 4, 2008, Pages 950-956

Reinforcement learning in continuous time and space: Interference and not ill conditioning is the main problem when using distributed function approximators

(1) Baddeley, Bart a

a UNIVERSITY OF SUSSEX (United Kingdom)

Author keywords

Continuous time systems; Distributed memory systems; Feedforward neural networks; Ill conditioning; Interference

Indexed keywords

EDUCATION; GRADIENT METHODS; LEARNING SYSTEMS; LIGHTNING; MATHEMATICAL MODELS; MATHEMATICAL PROGRAMMING; NEURAL NETWORKS; POLYNOMIAL APPROXIMATION; REINFORCEMENT; REINFORCEMENT LEARNING; SYSTEMS ENGINEERING;

CONTINUOUS TIME SYSTEMS; CONTINUOUS-TIME; DISTRIBUTED ARCHITECTURES; DISTRIBUTED FUNCTIONS; DISTRIBUTED MEMORY SYSTEMS; FEEDFORWARD NEURAL NETWORKS; FUNCTION APPROXIMATIONS; FUNCTION APPROXIMATORS; GRADIENT DESCENT; HIGH-DIMENSIONAL; ILL-CONDITIONING; INTERFERENCE; LEARNING PROCESSES; LOCAL LINEAR MODELS; MLP NETWORKS; MULTI-LAYER PERCEPTRON; NON-LINEAR MODELING; PENDULUM SWING-UP; RL TECHNIQUES; TEMPORAL DIFFERENCE LEARNING; VALUE FUNCTIONS; VALUE-FUNCTION APPROXIMATION;

LEARNING ALGORITHMS;

ALGORITHM; ARTICLE; ARTIFICIAL NEURAL NETWORK; COMPUTER SIMULATION; FEEDBACK SYSTEM; REINFORCEMENT; SYSTEM ANALYSIS; SYSTEMS THEORY; THEORETICAL MODEL;

ALGORITHMS; COMPUTER SIMULATION; FEEDBACK; MODELS, THEORETICAL; NEURAL NETWORKS (COMPUTER); PROGRAMMING, LINEAR; REINFORCEMENT (PSYCHOLOGY); SYSTEMS THEORY;

EID: 49049087720 PISSN: 10834419 EISSN: None Source Type: Journal
DOI: 10.1109/TSMCB.2008.921000 Document Type: Article

Times cited : (25)

References (31)

1
- 14844340822
- Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach
- M. Abu-Khalaf and F. L. Lewis, "Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach," Automatica, vol. 41, no. 5, pp. 779-791, 2005.
- (2005) Automatica , vol.41 , Issue.5 , pp. 779-791
- Abu-Khalaf, M.¹ Lewis, F.L.²

2
- 0000396062
- Natural gradient works efficiently in learning
- Feb
- S. Amari, "Natural gradient works efficiently in learning," Neural Comput., vol. 10, no. 2, pp. 251-276, Feb. 1998.
- (1998) Neural Comput , vol.10 , Issue.2 , pp. 251-276
- Amari, S.¹

3
- 0344154963
- Strategy learning with multilayer connectionist representations
- C. W. Anderson, "Strategy learning with multilayer connectionist representations," in Proc. 4th Int. Workshop Mach. Learn., 1987, pp. 103-114.
- (1987) Proc. 4th Int. Workshop Mach. Learn , pp. 103-114
- Anderson, C.W.¹

4
- 0031457098
- Avoiding catastrophic forgetting by coupling two reverberating neural networks
- B. Ans and S. Rousset, "Avoiding catastrophic forgetting by coupling two reverberating neural networks," C. R. Acad. Sci., Sér., Sci. Vie, vol. 320, no. 12, pp. 989-997, 1997.
- (1997) C. R. Acad. Sci., Sér., Sci. Vie , vol.320 , Issue.12 , pp. 989-997
- Ans, B.¹ Rousset, S.²

5
- 0004370245
- Wright Lab, Wright Patterson Air Force Base, Dayton, OH, Tech. Rep. WL-TR-93-1146
- L. C. Baird, "Advantage updating," Wright Lab., Wright Patterson Air Force Base, Dayton, OH, Tech. Rep. WL-TR-93-1146, 1993.
- (1993) Advantage updating
- Baird, L.C.¹

6
- 0027599793
- Universal approximation bounds for superpositions of a sigmoidal function
- May
- A. R. Barron, "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 930-945, May 1993.
- (1993) IEEE Trans. Inf. Theory , vol.39 , Issue.3 , pp. 930-945
- Barron, A.R.¹

7
- 0001864005
- On the bang-bang control problem
- R. Bellman, I. Glicksberg, and O. Gross, "On the bang-bang control problem," Q. Appl. Math., vol. 14, no. 1, pp. 11-18, 1956.
- (1956) Q. Appl. Math , vol.14 , Issue.1 , pp. 11-18
- Bellman, R.¹ Glicksberg, I.² Gross, O.³

8
- 0003565783
- Belmont, MA: Athena Scientific
- D. P. Bertsekas, Dynamic Programming and Optimal Control. Belmont, MA: Athena Scientific, 1995.
- (1995) Dynamic Programming and Optimal Control
- Bertsekas, D.P.¹

9
- 0003487482
- Belmont, MA: Athena Scientific
- D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming. Belmont, MA: Athena Scientific, 1996.
- (1996) Neuro-Dynamic Programming
- Bertsekas, D.P.¹ Tsitsiklis, J.N.²

10
- 0003487601
- Oxford, U.K.: Oxford Univ. Press
- C. Bishop, Neural Networks for Pattern Recognition. Oxford, U.K.: Oxford Univ. Press, 1995.
- (1995) Neural Networks for Pattern Recognition
- Bishop, C.¹

11
- 10944228202
- Reinforcement learning using neural networks, with applications to motor control
- M.S. thesis, Inst. Nat. Polytech. Grenoble, Grenoble, France
- R. Coulom, "Reinforcement learning using neural networks, with applications to motor control," M.S. thesis, Inst. Nat. Polytech. Grenoble, Grenoble, France, 2002.
- (2002)
- Coulom, R.¹

12
- 34250718115
- High-accuracy value-function approximation with neural networks applied to the acrobot
- M. Verleysen, Ed., Bruges, Belgium
- R. Coulom, "High-accuracy value-function approximation with neural networks applied to the acrobot," in Proc. Eur. Symp. Artif. Neural Netw., M. Verleysen, Ed., Bruges, Belgium, 2004.
- (2004) Proc. Eur. Symp. Artif. Neural Netw
- Coulom, R.¹

13
- 85156231814
- Temporal difference learning in continuous time and space
- D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press
- K. Doya, "Temporal difference learning in continuous time and space," in Advances in Neural Information Processing Systems, vol. 8, D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, Eds. Cambridge, MA: MIT Press, 1996, pp. 1073-1079.
- (1996) Advances in Neural Information Processing Systems , vol.8 , pp. 1073-1079
- Doya, K.¹

14
- 0033629916
- Reinforcement learning in continuous time and space
- Jan
- K. Doya, "Reinforcement learning in continuous time and space," Neural Comput., vol. 12, no. 1, pp. 219-245, Jan. 2000.
- (2000) Neural Comput , vol.12 , Issue.1 , pp. 219-245
- Doya, K.¹

15
- 0347683832
- Pseudo-recurrent connectionist networks: An approach to the 'sensitivity-stability' dilemma
- Dec
- R. M. French, "Pseudo-recurrent connectionist networks: An approach to the 'sensitivity-stability' dilemma," Connect. Sci., vol. 9, no. 4, pp. 353-380, Dec. 1997.
- (1997) Connect. Sci , vol.9 , Issue.4 , pp. 353-380
- French, R.M.¹

16
- 84880694195
- Stable function approximation in dynamic programming
- A. Prieditis and S. Russell, Eds., San Francisco, CA
- G. J. Gordon, "Stable function approximation in dynamic programming," in Proc. 12th Int. Conf. Mach. Learn., A. Prieditis and S. Russell, Eds., San Francisco, CA, 1995.
- (1995) Proc. 12th Int. Conf. Mach. Learn
- Gordon, G.J.¹

17
- 0029679044
- Reinforcement learning: A survey
- L. P. Kaelbling, M. L. Littman, and A. W. Moore, "Reinforcement learning: A survey," J. Artif. Intell. Res., vol. 4, pp. 237-285, 1996.
- (1996) J. Artif. Intell. Res , vol.4 , pp. 237-285
- Kaelbling, L.P.¹ Littman, M.L.² Moore, A.W.³

18
- 0000873069
- A method for the solution of certain non-linear problems in least squares
- K. Levenberg, "A method for the solution of certain non-linear problems in least squares," Q. Appl. Math., vol. 2, no. 2, pp. 164-168, 1944.
- (1944) Q. Appl. Math , vol.2 , Issue.2 , pp. 164-168
- Levenberg, K.¹

19
- 0000169232
- An algorithm for least-squares estimation of nonlinear parameters
- Jun
- D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," J. Soc. Ind. Appl. Math., vol. 11, no. 2, pp. 431-441, Jun. 1963.
- (1963) J. Soc. Ind. Appl. Math , vol.11 , Issue.2 , pp. 431-441
- Marquardt, D.W.¹

20
- 0027205884
- A scaled conjugate gradient algorithm for fast supervised learning
- M. F. Moller, "A scaled conjugate gradient algorithm for fast supervised learning," Neural Netw., vol. 6, no. 4, pp. 525-533, 1993.
- (1993) Neural Netw , vol.6 , Issue.4 , pp. 525-533
- Moller, M.F.¹

21
- 0033308517
- Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation
- R. Munos, L. C. Baird, and A. W. Moore, "Gradient descent approaches to neural-net-based solutions of the Hamilton-Jacobi-Bellman equation," in Proc. Int. Joint Conf. Neural Netw., 1999, pp. 2152-2157.
- (1999) Proc. Int. Joint Conf. Neural Netw , pp. 2152-2157
- Munos, R.¹ Baird, L.C.² Moore, A.W.³

22
- 0009589301
- How to train neural networks
- G. B. Orr and K.-R. Müller, Eds. New York: Springer-Verlag
- R. Neuneier and H.-G. Zimmermann, "How to train neural networks," in Neural Networks: Tricks of the Trade, G. B. Orr and K.-R. Müller, Eds. New York: Springer-Verlag, 1998.
- (1998) Neural Networks: Tricks of the Trade
- Neuneier, R.¹ Zimmermann, H.-G.²

23
- 0039225088
- On-line estimation of the optimal value function: HJB-estimates
- C. L. Giles, S. J. Hanson, and J. D. Cowan, Eds. San Mateo, CA: Morgan Kaufmann
- J. K. Peterson, "On-line estimation of the optimal value function: HJB-estimates," in Advances in Neural Information Processing Systems, vol. 5, C. L. Giles, S. J. Hanson, and J. D. Cowan, Eds. San Mateo, CA: Morgan Kaufmann, 1993, pp. 319-326.
- (1993) Advances in Neural Information Processing Systems , vol.5 , pp. 319-326
- Peterson, J.K.¹

24
- 84943274699
- A direct adaptive method for faster backpropagation learning: The RPROP algorithm
- M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The RPROP algorithm," in Proc. IEEE Int. Conf. Neural Netw., 1993, pp. 586-591.
- (1993) Proc. IEEE Int. Conf. Neural Netw , pp. 586-591
- Riedmiller, M.¹ Braun, H.²

25
- 38149038993
- Catastrophic forgetting, rehearsal, and pseudorehearsal
- Jun
- A. Robins, "Catastrophic forgetting, rehearsal, and pseudorehearsal," Connect. Sci., vol. 7, no. 2, pp. 123-146, Jun. 1995.
- (1995) Connect. Sci , vol.7 , Issue.2 , pp. 123-146
- Robins, A.¹

26
- 0030896968
- A neural substrate of prediction and reward
- Mar
- W. Schultz, P. Dayan, and P. R. Montague, "A neural substrate of prediction and reward," Science, vol. 275, no. 5306, pp. 1593-1599, Mar. 1997.
- (1997) Science , vol.275 , Issue.5306 , pp. 1593-1599
- Schultz, W.¹ Dayan, P.² Montague, P.R.³

27
- 33847202724
- Learning to predict by the methods of temporal differences
- Aug
- R. S. Sutton, "Learning to predict by the methods of temporal differences," Mach. Learn., vol. 3, no. 1, pp. 9-44, Aug. 1988.
- (1988) Mach. Learn , vol.3 , Issue.1 , pp. 9-44
- Sutton, R.S.¹

28
- 0004102479
- Cambridge, MA: MIT Press
- R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction. Cambridge, MA: MIT Press, 1998.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

29
- 84898939480
- Policy gradient methods for reinforcement learning with function approximation
- R. S. Sutton, D. McAllester, S. Singh, and Y. Mansour, "Policy gradient methods for reinforcement learning with function approximation," in Proc. Advances Neural Inf. Process. Syst., 2000, vol. 12, pp. 1057-1063.
- (2000) Proc. Advances Neural Inf. Process. Syst , vol.12 , pp. 1057-1063
- Sutton, R.S.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

30
- 0029276036
- Temporal difference learning and TD-Gammon
- Mar
- G. Tesauro, "Temporal difference learning and TD-Gammon," Commun. ACM, vol. 38, no. 3, pp. 58-68, Mar. 1995.
- (1995) Commun. ACM , vol.38 , Issue.3 , pp. 58-68
- Tesauro, G.¹

31
- 0036829017
- Optimal feedback control as a theory of motor coordination
- Nov
- E. Todorov and M. Jordan, "Optimal feedback control as a theory of motor coordination," Nat. Neurosci., vol. 5, no. 11, pp. 1226-1235, Nov. 2002.
- (2002) Nat. Neurosci , vol.5 , Issue.11 , pp. 1226-1235
- Todorov, E.¹ Jordan, M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.