SCOPUS Information Search Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 4668 LNCS, Issue PART 1, 2007, Pages 697-706

Solving deep memory POMDPs with Recurrent Policy gradients

Authors (4): Wierstra, Daan (a); Foerster, Alexander (a); Peters, Jan (b); Schmidhuber, Jürgen (a)

a DALLE MOLLE INSTITUTE FOR ARTIFICIAL INTELLIGENCE IDSIA (Switzerland)

b UNIVERSITY OF SOUTHERN CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

BACKPROPAGATION; DECISION THEORY; REINFORCEMENT LEARNING; STOCHASTIC MODELS; STOCHASTIC SYSTEMS;

DEEP MEMORY; LIMITED MEMORY; LONG TERM MEMORY; PARTIALLY OBSERVABLE MARKOV DECISION PROBLEMS; POLICY GRADIENT; RECURRENT NEURAL NETWORK (RNN); REINFORCEMENT LEARNING METHOD; STOCHASTIC POLICY;

RECURRENT NEURAL NETWORKS;

EID: 38149018611 PISSN: 03029743 EISSN: 16113349 Source Type: Book Series
DOI: 10.1007/978-3-540-74690-4_71 Document Type: Conference Paper

Times cited : (139)

References (22)

1
- 0031343491
- Robotics and Autonomous Systems Journal
- Benbrahim, H., Franklin, J.: Biped dynamic walking using reinforcement learning. Robotics and Autonomous Systems Journal (1997)
- (1997) Biped dynamic walking using reinforcement learning
- Benbrahim, H.¹ Franklin, J.²

2
- 0035391755
- Learning to Trade via Direct Reinforcement
- Moody, J., Saffell, M.: Learning to Trade via Direct Reinforcement. IEEE Transactions on Neural Networks 12(4), 875-889 (2001)
- (2001) IEEE Transactions on Neural Networks , vol.12 , Issue.4 , pp. 875-889
- Moody, J.¹ Saffell, M.²

3
- 34548777734
- Toward effective combination of off-line and on-line training in ADP framework
- IEEE Computer Society Press, Los Alamitos
- Prokhorov, D.: Toward effective combination of off-line and on-line training in ADP framework. In: ADPRL. Proceedings of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning, IEEE Computer Society Press, Los Alamitos (2007)
- (2007) ADPRL. Proceedings of the IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning
- Prokhorov, D.¹

4
- 0013495368
- Experiments with infinite-horizon, policy-gradient estimation
- Baxter, J., Bartlett, P., Weaver, L.: Experiments with infinite-horizon, policy-gradient estimation. Journal of Artificial Intelligence Research 15, 351-381 (2001)
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 351-381
- Baxter, J.¹ Bartlett, P.² Weaver, L.³

5
- 34250635407
- Policy gradient methods for robotics
- Beijing, China
- Peters, J., Schaal, S.: Policy gradient methods for robotics. In: IROS. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Beijing, China, pp. 2219-2225 (2006)
- (2006) IROS. Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems , pp. 2219-2225
- Peters, J.¹ Schaal, S.²

6
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229-256 (1992)
- (1992) Machine Learning , vol.8 , pp. 229-256
- Williams, R.J.¹

7
- 0025600638
- A stochastic reinforcement learning algorithm for learning real-valued functions
- Gullapalli, V.: A stochastic reinforcement learning algorithm for learning real-valued functions. Neural Networks 3(6), 671-692 (1990)
- (1990) Neural Networks , vol.3 , Issue.6 , pp. 671-692
- Gullapalli, V.¹

8
- 33745327217
- Fast online policy gradient learning with SMD gain vector adaptation
- Weiss, Y., Schölkopf, B., Platt, J. (eds.), MIT Press, Cambridge, MA
- Schraudolph, N., Yu, J., Aberdeen, D.: Fast online policy gradient learning with SMD gain vector adaptation. In: Weiss, Y., Schölkopf, B., Platt, J. (eds.) Advances in Neural Information Processing Systems, vol. 18, MIT Press, Cambridge, MA (2006)
- (2006) Advances in Neural Information Processing Systems , vol.18
- Schraudolph, N.¹ Yu, J.² Aberdeen, D.³

9
- 33646413135
- Peters, J., Vijayakumar, S., Schaal, S.: Natural actor-critic. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds.) ECML 2005. LNCS (LNAI), vol. 3720, pp. 280-291. Springer, Heidelberg (2005)

10
- 33750244274
- Sutton, R., McAllester, D., Singh, S., Mansour, Y.: Policy gradient methods for reinforcement learning with function approximation (2001)
- (2001) Policy gradient methods for reinforcement learning with function approximation
- Sutton, R.¹ McAllester, D.² Singh, S.³ Mansour, Y.⁴

11
- 0003572965
- Gullapalli, V.: Reinforcement learning and its application to control (1992)
- (1992) Reinforcement learning and its application to control
- Gullapalli, V.¹

12
- 0025503558
- Back propagation through time: What it does and how to do it
- Werbos, P.: Back propagation through time: What it does and how to do it. Proceedings of the IEEE 78, 1550-1560 (1990)
- (1990) Proceedings of the IEEE , vol.78 , pp. 1550-1560
- Werbos, P.¹

13
- 2142812536
- Learning without state-estimation in partially observable Markovian decision processes
- Singh, S.P., Jaakkola, T., Jordan, M.I.: Learning without state-estimation in partially observable Markovian decision processes. In: International Conference on Machine Learning, pp. 284-292 (1994)
- (1994) International Conference on Machine Learning , pp. 284-292
- Singh, S.P.¹ Jaakkola, T.² Jordan, M.I.³

14
- 14344253499
- PhD thesis, Australian National University
- Aberdeen, D.: Policy-Gradient Algorithms for Partially Observable Markov Decision Processes. PhD thesis, Australian National University (2003)
- (2003) Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
- Aberdeen, D.¹

15
- 0002103968
- Learning finite-state controllers for partially observable environments
- Morgan Kaufmann, San Francisco
- Meuleau, N., Peshkin, L., Kim, K.-E., Kaelbling, L.P.: Learning finite-state controllers for partially observable environments. In: UAI '99. Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence, pp. 427-436. Morgan Kaufmann, San Francisco (1999)
- (1999) UAI '99. Proc. Fifteenth Conference on Uncertainty in Artificial Intelligence , pp. 427-436
- Meuleau, N.¹ Peshkin, L.² Kim, K.-E.³ Kaelbling, L.P.⁴

16
- 0031573117
- Long short-term memory
- Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735-1780 (1997)
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

17
- 84899015857
- Reinforcement learning with long short-term memory
- Bakker, B.: Reinforcement learning with long short-term memory. In: Advances in Neural Information Processing Syst., vol. 14 (2002)
- (2002) Advances in Neural Information Processing Syst , vol.14
- Bakker, B.¹

18
- 0004142943
- Baxter, J., Bartlett, P.: Direct gradient-based reinforcement learning (1999)
- (1999) Direct gradient-based reinforcement learning
- Baxter, J.¹ Bartlett, P.²

19
- 0041914606
- Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
- Kremer, S.C., Kolen, J.F. (eds.), IEEE Press, NJ, New York
- Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kremer, S.C., Kolen, J.F. (eds.) A Field Guide to Dynamical Recurrent Neural Networks, IEEE Press, NJ, New York (2001)
- (2001) A Field Guide to Dynamical Recurrent Neural Networks
- Hochreiter, S.¹ Bengio, Y.² Frasconi, P.³ Schmidhuber, J.⁴

20
- 58849155146
- Schmidhuber, J.: RNN overview (2004), http://www.idsia.ch/~juergen/rnn.html
- (2004) RNN overview
- Schmidhuber, J.¹

21
- 0026626840
- Evolving neural network controllers for unstable systems
- Seattle, WA, IEEE Service Center, Piscataway, NJ
- Wieland, A.: Evolving neural network controllers for unstable systems. In: Proceedings of the International Joint Conference on Neural Networks, Seattle, WA, pp. 667-673. IEEE Service Center, Piscataway, NJ (1991)
- (1991) Proceedings of the International Joint Conference on Neural Networks , pp. 667-673
- Wieland, A.¹

22
- 38149069001
- Torcs: Torcs, the open racing car simulator (2007), http://torcs.sourceforge.net/
- (2007) Torcs, the open racing car simulator
- Torcs¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.