Neural Networks, Volume 23, Issue 4, 2010, Pages 551-559

Parameter-exploring policy gradients

Author keywords

Control; Policy gradients; Reinforcement learning; Robotics; Stochastic optimisation

Indexed keywords

COMPLEX CONTROL; CONTROL POLICY; GRADIENT ESTIMATES; HUMANOID ROBOT; INDIVIDUAL COMPONENTS; MODEL FREE; OTHER ALGORITHMS; PARAMETER SPACES; PARTIALLY OBSERVABLE MARKOV DECISION PROBLEMS; POLICY GRADIENT; POLICY GRADIENT METHODS; POLICY GRADIENTS; REINFORCEMENT LEARNING METHOD; STOCHASTIC OPTIMISATION;

EID: 77950297907     PISSN: 08936080     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.neunet.2009.12.004     Document Type: Article
Times cited: 272

References (23)
  • 1. Aberdeen, D. (2003). Policy-gradient algorithms for partially observable Markov decision processes. Ph.D. thesis, Australian National University.
  • 2. Baxter, J., & Bartlett, P. L. (2000). Reinforcement learning in POMDPs via direct gradient ascent. In Proc. 17th international conf. on machine learning (pp. 41-48). San Francisco, CA: Morgan Kaufmann.
  • 4. Buss, M., & Hirche, S. (2008). Institute of Automatic Control Engineering, TU München, Germany. http://www.lsr.ei.tum.de/.
  • 5. Hansen, N., & Ostermeier, A. (2001). Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2), 159-195.
  • 6. Jordan, M. (1986). Attractor dynamics and parallelism in a connectionist sequential machine. In Proc. of the eighth annual conference of the cognitive science society, Vol. 8 (pp. 531-546).
  • 7. Müller, H., Lauer, M., Hafner, R., Lange, S., Merke, A., & Riedmiller, M. (2007). Making a robot learn to play soccer. In Proceedings of the 30th annual German conference on artificial intelligence (KI-2007).
  • 9. Peters, J., & Schaal, S. (2006). Policy gradient methods for robotics. In IROS-2006 (pp. 2219-2225).
  • 10. Peters, J., & Schaal, S. (2008). Natural actor-critic. Neurocomputing, 71, 1180-1190.
  • 11. Peters, J., & Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients. Neural Networks, 682-697.
  • 13. Riedmiller, M., Peters, J., & Schaal, S. (2007). Evaluation of policy gradient methods and variants on the cart-pole benchmark. In ADPRL-2007.
  • 14. Rückstieß, T., Felder, M., & Schmidhuber, J. (2008). State-dependent exploration for policy gradient methods. In W.D. (Ed.), European conference on machine learning and principles and practice of knowledge discovery in databases, Part II. LNAI, Vol. 5212 (pp. 234-249).
  • 15. Schraudolph, N., Yu, J., & Aberdeen, D. (2006). Fast online policy gradient learning with SMD gain vector adaptation. In Y. Weiss, B. Schölkopf, & J. Platt (Eds.), Advances in neural information processing systems, Vol. 18. Cambridge, MA: MIT Press.
  • 17. Sehnke, F. (2009). PGPE - Policy Gradients with Parameter-based Exploration - demonstration video: Learning in robot simulations. http://www.pybrain.org/videos/jnn10/.
  • 18. Spall, J. (1998). An overview of the simultaneous perturbation method for efficient optimization. Johns Hopkins APL Technical Digest, 19(4), 482-492.
  • 19. Spall, J. (1998). Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems, 34(3), 817-823.
  • 20. Sutton, R., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In NIPS-1999 (pp. 1057-1063).
  • 21. Ulbrich, H. (2008). Institute of Applied Mechanics, TU München, Germany. http://www.amm.mw.tum.de/.
  • 23. Williams, R. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8, 229-256.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.