Volume 41, 2013, Pages 156-167

Autonomous reinforcement learning with experience replay

Author keywords

Actor critic; Autonomous learning; Reinforcement learning; Step size estimation

Indexed keywords

ACTOR CRITIC; AUTONOMOUS LEARNING; CONTROL POLICY; CONTROL TASK; FIXED-POINT ALGORITHMS; LEARNING CONTROL; ON-LINE NEURAL NETWORKS; STEP SIZE;

EID: 84875884428     PISSN: 0893-6080     EISSN: 1879-2782     Source Type: Journal
DOI: 10.1016/j.neunet.2012.11.007     Document Type: Article
Times cited: 58

References (39)
  • 1
    • Abbeel, P., & Ng, A.Y. (2005). Exploration and apprenticeship learning in reinforcement learning. In Proc. of the 22nd ICML (pp. 1-8). ACM.
  • 4
    • Behera, L., Kumar, S., & Patnaik, A. (2006). On adaptive learning rate that guarantees convergence in feedforward networks. IEEE Transactions on Neural Networks, 17(5), 1116-1125.
  • 7
    • Cichosz, P. (1999). An analysis of experience replay in temporal difference learning. Cybernetics and Systems, 30, 341-363.
  • 8
    • Engel, Y., Szabó, P., & Volkinshtein, D. (2005). Learning to control an octopus arm with Gaussian process temporal difference methods. In NIPS.
  • 9
    • George, A.P., & Powell, W.B. (2006). Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning, 65(1), 167-198.
  • 10
    • Hachiya, H., Peters, J., & Sugiyama, M. (2011). Reward-weighted regression with sample reuse for direct policy search in reinforcement learning. Neural Computation, 23(11), 2798-2832.
  • 11
    • Jacobs, R.A. (1988). Increased rates of convergence through learning rate adaptation. Neural Networks, 1(4), 295-308.
  • 12
    • Kathirvalavakumar, T., & Subavathi, S.J. (2009). Neighborhood based modified backpropagation algorithm using adaptive learning parameters for training feedforward neural networks. Neurocomputing, 72, 3915-3921.
  • 13
    • Kimura, H., & Kobayashi, S. (1998). An analysis of actor/critic algorithm using eligibility traces: reinforcement learning with imperfect value functions. In Proc. of the 15th ICML (pp. 278-286).
  • 14
    • Kober, J., & Peters, J. (2011). Policy search for motor primitives in robotics. Machine Learning, 84(1-2), 171-203.
  • 17
    • Koutnik, J., Gomez, F., & Schmidhuber, J. (2010). Evolving neural networks in compressed weight space. In Proceedings of the conference on genetic and evolutionary computation, GECCO-10 (pp. 619-626).
  • 20
    • Noda, I. (2009). Recursive adaptation of stepsize parameter for non-stationary environments. In Principles of Practice in Multi-Agent Systems (pp. 525-533).
  • 21
    • Octopus-sources (2006). http://www.cs.mcgill.ca/~dprecup/workshops/ICML06/Octopus/octopus-code-distribution.zip
  • 22
    • Peshkin, L., & Mukherjee, S. (2001). Bounds on sample size for policy evaluation in Markov environments. In Proc. of the 14th annual conf. on computational learning theory, vol. 2111 (pp. 616-629). Berlin: Springer.
  • 23
    • Peters, J., Vijayakumar, S., & Schaal, S. (2005). Natural actor-critic. In Proc. of ECML (pp. 280-291). Berlin, Heidelberg: Springer-Verlag.
  • 26
    • Schraudolph, N.N., & Giannakopoulos, X. (2000). Online independent component analysis with local learning rate adaptation. In Advances in NIPS, vol. 12 (pp. 789-795).
  • 27
    • Schraudolph, N.N., Yu, J., & Aberdeen, D. (2006). Fast online policy gradient learning with SMD gain vector adaptation. In Advances in NIPS, vol. 18 (pp. 1185-1192).
  • 29
    • Sutton, R.S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proc. of the 7th ICML (pp. 216-224). Morgan Kaufmann.
  • 30
    • Sutton, R.S. (1992a). Gain adaptation beats least squares? In YWALS (pp. 161-166).
  • 31
    • Sutton, R.S. (1992b). Adapting bias by gradient descent: an incremental version of delta-bar-delta. In Proc. of the 10th NCAI (pp. 171-176).
  • 33
    • Sutton, R.S., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in NIPS, vol. 12 (pp. 1057-1063). MIT Press.
  • 34
    • Wawrzyński, P. (2009). Real-time reinforcement learning by sequential actor-critics and experience replay. Neural Networks, 22, 1484-1497.
  • 35
    • Wawrzyński, P. (2010). Fixed point method of step-size estimation for on-line neural network training. In IJCNN (pp. 2012-2017).
  • 36
    • Wawrzyński, P., & Papis, B. (2011). Fixed point method for autonomous on-line neural network training. Neurocomputing, 74, 2893-2905.
  • 37
    • Williams, R.J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 229-256.


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.