



Volume 4668 LNCS, Issue PART 1, 2007, Pages 697-706

Solving Deep Memory POMDPs with Recurrent Policy Gradients

Author keywords

[No Author keywords available]

Indexed keywords

BACKPROPAGATION; DECISION THEORY; REINFORCEMENT LEARNING; STOCHASTIC MODELS; STOCHASTIC SYSTEMS;

EID: 38149018611     PISSN: 03029743     EISSN: 16113349     Source Type: Book Series    
DOI: 10.1007/978-3-540-74690-4_71     Document Type: Conference Paper
Times cited: 139

References (22)
  • 6
    • 0000337576
    • Simple statistical gradient-following algorithms for connectionist reinforcement learning
    • Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229-256 (1992)
    • (1992) Machine Learning , vol.8 , pp. 229-256
    • Williams, R.J.1
  • 7
    • 0025600638
    • A stochastic reinforcement learning algorithm for learning real-valued functions
    • Gullapalli, V.: A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks 3(6), 671-692 (1990)
    • (1990) Neural Networks , vol.3 , Issue.6 , pp. 671-692
    • Gullapalli, V.1
  • 8
    • 33745327217
    • Fast online policy gradient learning with smd gain vector adaptation
    • Weiss, Y., Schölkopf, B., Platt, J. (eds.), MIT Press, Cambridge, MA
    • Schraudolph, N., Yu, J., Aberdeen, D.: Fast online policy gradient learning with smd gain vector adaptation. In: Weiss, Y., Schölkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems, vol. 18, MIT Press, Cambridge, MA (2006)
    • (2006) Advances in Neural Information Processing Systems , vol.18
    • Schraudolph, N.1    Yu, J.2    Aberdeen, D.3
  • 9
    • 33646413135
    • Natural actor-critic
    • Peters, J., Vijayakumar, S., Schaal, S.: Natural actor-critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280-291. Springer, Heidelberg (2005)
  • 12
    • 0025503558
    • Back propagation through time: What it does and how to do it
    • Werbos, P.: Back propagation through time: What it does and how to do it. Proceedings of the IEEE 78, 1550-1560 (1990)
    • (1990) Proceedings of the IEEE , vol.78 , pp. 1550-1560
    • Werbos, P.1
  • 13
  • 19
    • 0041914606
    • Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
    • Kremer, S.C., Kolen, J.F. (eds.), IEEE Press, NJ, New York
    • Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kremer, S.C., Kolen, J.F. (eds.) A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, NJ, New York (2001)
    • (2001) A Field Guide to Dynamical Recurrent Neural Networks
    • Hochreiter, S.1    Bengio, Y.2    Frasconi, P.3    Schmidhuber, J.4
  • 21
    • 0026626840
    • Evolving neural network controllers for unstable systems
    • Seattle, WA, pp. 667-673, IEEE Service Center, Piscataway, NJ
    • Wieland, A.: Evolving neural network controllers for unstable systems. In: Proceedings of the International Joint Conference on Neural Networks, Seattle, WA, pp. 667-673. IEEE Service Center, Piscataway, NJ (1991)
    • (1991) Proceedings of the International Joint Conference on Neural Networks , pp. 667-673
    • Wieland, A.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.