Volume 18, Issue 5, 2009, Pages 620-634

Recurrent policy gradients

Author keywords

Partially Observable Markov Decision Problems (POMDPs); Policy gradient methods; Recurrent neural networks; Reinforcement learning

Indexed keywords

GRADIENT METHODS; RECURRENT NEURAL NETWORKS; STOCHASTIC SYSTEMS;

EID: 77957283019     PISSN: 1367-0751     EISSN: 1368-9894     Source Type: Journal
DOI: 10.1093/jigpal/jzp049     Document Type: Article
Times cited: 113

References (31)
  • 2
    • S. P. Singh, T. Jaakkola, M. I. Jordan, Learning without state-estimation in partially observable Markovian decision processes. In: International Conference on Machine Learning (ICML 1994), Morgan Kaufmann Publishers, San Francisco, CA, 1994, pp. 284-292.
  • 9
    • J. Peters, S. Schaal, Reinforcement learning of motor skills with policy gradients, Neural Networks 21 (4) (2008) 682-697.
  • 10
    • H. Benbrahim, J. Franklin, Biped dynamic walking using reinforcement learning, Robotics and Autonomous Systems 22 (3-4) (1997) 283-302.
  • 14
    • J. Peters, S. Schaal, Natural Actor Critic, Neurocomputing 71 (7-9) (2008) 1180-1190.
  • 15
    • V. Gullapalli, A stochastic reinforcement learning algorithm for learning real-valued functions, Neural Networks 3 (6) (1990) 671-692.
  • 17
    • P. Werbos, Backpropagation through time: What it does and how to do it. In: Proceedings of the IEEE, Vol. 78, 1990, pp. 1550-1560.
  • 20
    • B. Bakker, Reinforcement learning with Long Short-Term Memory. In: Advances in Neural Information Processing Systems 14, MIT Press, 2002, pp. 1475-1482.
  • 22
    • R. J. Williams, D. Zipser, A learning algorithm for continually running fully recurrent networks, Neural Computation 1 (2) (1989) 270-280.
  • 23
    • Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks 5 (2) (1994) 157-166.
  • 24
    • S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: S. C. Kremer, J. F. Kolen (Eds.), A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, 2001, pp. 237-244.
  • 26
    • J. Schmidhuber, RNN overview, http://www.idsia.ch/~juergen/rnn.html (2004).
  • 27
    • R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning 8 (1992) 229-256.
  • 29
    • M. Littman, A. Cassandra, L. Kaelbling, Learning policies for partially observable environments: Scaling up. In: A. Prieditis, S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference, Morgan Kaufmann Publishers, San Francisco, CA, 1995, pp. 362-370.
  • 31
    • TORCS, The Open Racing Car Simulator, http://torcs.sourceforge.net/ (2007).


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.