Volume 41, 2013, Pages 156-167

Autonomous reinforcement learning with experience replay

Author keywords

Actor critic; Autonomous learning; Reinforcement learning; Step size estimation

Indexed keywords

ACTOR CRITIC; AUTONOMOUS LEARNING; CONTROL POLICY; CONTROL TASK; FIXED-POINT ALGORITHMS; LEARNING CONTROL; ON-LINE NEURAL NETWORKS; STEP SIZE;

EID: 84875884428     PISSN: 0893-6080     EISSN: 1879-2782     Source Type: Journal
DOI: 10.1016/j.neunet.2012.11.007     Document Type: Article
Times cited: 58

References (39)
  • 1
    • Abbeel, P., & Ng, A.Y. (2005). Exploration and apprenticeship learning in reinforcement learning. In Proc. of the 22nd ICML (pp. 1-8). ACM.
  • 4
    • Behera, L., Kumar, S., & Patnaik, A. (2006). On adaptive learning rate that guarantees convergence in feedforward networks. IEEE Transactions on Neural Networks, 17(5), 1116-1125.
  • 7
    • Cichosz, P. (1999). An analysis of experience replay in temporal difference learning. Cybernetics and Systems, 30, 341-363.
  • 8
    • Engel, Y., Szabó, P., & Volkinshtein, D. (2005). Learning to control an octopus arm with Gaussian process temporal difference methods. In NIPS.
  • 9
    • George, A.P., & Powell, W.B. (2006). Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning, 65(1), 167-198.
  • 10
    • Hachiya, H., Peters, J., & Sugiyama, M. (2011). Reward-weighted regression with sample reuse for direct policy search in reinforcement learning. Neural Computation, 23(11), 2798-2832.
  • 11
    • Jacobs, R.A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4), 295-308.
  • 12
    • Kathirvalavakumar, T., & Subavathi, S.J. (2009). Neighborhood based modified backpropagation algorithm using adaptive learning parameters for training feedforward neural networks. Neurocomputing, 72, 3915-3921.
  • 13
    • Kimura, H., & Kobayashi, S. (1998). An analysis of actor/critic algorithm using eligibility traces: reinforcement learning with imperfect value functions. In Proc. of the 15th ICML (pp. 278-286).
  • 14
    • Kober, J., & Peters, J. (2011). Policy search for motor primitives in robotics. Machine Learning, 84(1-2), 171-203.
  • 17
    • Koutnik, J., Gomez, F., & Schmidhuber, J. (2010). Evolving neural networks in compressed weight space. In Proceedings of the conference on genetic and evolutionary computation, GECCO-10 (pp. 619-626).
  • 20
    • Noda, I. (2009). Recursive adaptation of stepsize parameter for non-stationary environments. In Principles of Practice in Multi-Agent Systems (pp. 525-533).
  • 21
    • Octopus-sources (2006). http://www.cs.mcgill.ca/~dprecup/workshops/ICML06/Octopus/octopus-code-distribution.zip
  • 22
    • Peshkin, L., & Mukherjee, S. (2001). Bounds on sample size for policy evaluation in Markov environments. In Proc. of the 14th annual conf. on computational learning theory, vol. 2111 (pp. 616-629). Berlin: Springer.
  • 23
    • Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural actor-critic. In Proc. of ECML (pp. 280-291). Berlin, Heidelberg: Springer-Verlag.
  • 26
    • Schraudolph, N.N., & Giannakopoulos, X. (2000). Online independent component analysis with local learning rate adaptation. In Advances in NIPS, vol. 12 (pp. 789-795).
  • 27
    • Schraudolph, N.N., Yu, J., & Aberdeen, D. (2006). Fast online policy gradient learning with SMD gain vector adaptation. In Advances in NIPS, vol. 18 (pp. 1185-1192).
  • 29
    • Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proc. of the 7th ICML (pp. 216-224). Morgan Kaufmann.
  • 30
    • Sutton, R.S. (1992a). Gain adaptation beats least squares? In YWALS (pp. 161-166).
  • 31
    • Sutton, R.S. (1992b). Adapting bias by gradient descent: an incremental version of delta-bar-delta. In Proc. of the 10th NCAI (pp. 171-176).
  • 33
    • Sutton, R.S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in NIPS, vol. 12 (pp. 1057-1063). MIT Press.
  • 34
    • Wawrzyński, P. (2009). Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks, 22, 1484-1497.
  • 35
    • Wawrzyński, P. (2010). Fixed point method of step-size estimation for on-line neural network training. In IJCNN (pp. 2012-2017).
  • 36
    • Wawrzyński, P., & Papis, B. (2011). Fixed point method for autonomous on-line neural network training. Neurocomputing, 74, 2893-2905.
  • 37
    • Williams, R.J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 229-256.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.