SCOPUS Information Search Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 5163 LNCS, Issue PART 1, 2008, Pages 407-416

Episodic reinforcement learning by logistic reward-weighted regression

(4) Wierstra, Daan (a); Schaul, Tom (a); Peters, Jan (b); Schmidhuber, Juergen (a, c)

a DALLE MOLLE INSTITUTE FOR ARTIFICIAL INTELLIGENCE IDSIA (Switzerland)

b MAX PLANCK INSTITUTE FOR BIOLOGICAL CYBERNETICS (Germany)

c TECHNICAL UNIVERSITY OF MUNICH (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

ADAPTIVE CONTROL; EXPECTATION-MAXIMIZATION ALGORITHMS; LOGISTIC REGRESSIONS; NONSTATIONARY; PARTIALLY OBSERVABLE MARKOV DECISION PROBLEMS; POLICY SEARCH; TRAINING ALGORITHMS; VALUE FUNCTIONS; WEIGHTED REGRESSION;

BACKPROPAGATION; EDUCATION; LEARNING ALGORITHMS; REGRESSION ANALYSIS; REINFORCEMENT; REINFORCEMENT LEARNING;

RECURRENT NEURAL NETWORKS;

EID: 58849088597; PISSN: 03029743; EISSN: 16113349; Source Type: Book Series
DOI: 10.1007/978-3-540-87536-9_42; Document Type: Conference Paper

Times cited : (10)

References (15)

1
- 0032073263
- Planning and acting in partially observable stochastic domains
- Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence 101 (1998)
- (1998) Artificial Intelligence , vol.101
- Kaelbling, L.P.¹ Littman, M.L.² Cassandra, A.R.³

2
- 0004256310
- Academic Press, New York
- Aoki, M.: Optimization of Stochastic Systems. Academic Press, New York (1967)
- (1967) Optimization of Stochastic Systems
- Aoki, M.¹

3
- 0013535965
- Infinite-horizon policy-gradient estimation
- Baxter, J., Bartlett, P.: Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15, 319-350 (2001)
- (2001) Journal of Artificial Intelligence Research , vol.15 , pp. 319-350
- Baxter, J.¹ Bartlett, P.²

4
- 38149018611
- Solving deep memory POMDPs with recurrent policy gradients
- de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D.P. (eds.) ICANN 2007. Springer, Heidelberg
- Wierstra, D., Foerster, A., Peters, J., Schmidhuber, J.: Solving deep memory POMDPs with recurrent policy gradients. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D.P. (eds.) ICANN 2007. LNCS, vol. 4668, pp. 697-706. Springer, Heidelberg (2007)
- (2007) LNCS , vol.4668 , pp. 697-706
- Wierstra, D.¹ Foerster, A.² Peters, J.³ Schmidhuber, J.⁴

5
- 34547964788
- Reinforcement learning by reward-weighted regression for operational space control
- Peters, J., Schaal, S.: Reinforcement learning by reward-weighted regression for operational space control. In: Proceedings of the International Conference on Machine Learning (ICML) (2007)
- (2007) Proceedings of the International Conference on Machine Learning (ICML)
- Peters, J.¹ Schaal, S.²

6
- 0346982426
- Using expectation-maximization for reinforcement learning
- Dayan, P., Hinton, G.E.: Using expectation-maximization for reinforcement learning. Neural Computation 9(2), 271-278 (1997)
- (1997) Neural Computation , vol.9 , Issue.2 , pp. 271-278
- Dayan, P.¹ Hinton, G.E.²

7
- 0031573117
- Long short-term memory
- Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Computation 9(8), 1735-1780 (1997)
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

8
- 0041914606
- Gradient flow in recurrent nets: The difficulty of learning long-term dependencies
- Kremer, S.C., Kolen, J.F. (eds.) IEEE Press, Los Alamitos
- Hochreiter, S., Bengio, Y., Frasconi, P., Schmidhuber, J.: Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kremer, S.C., Kolen, J.F. (eds.) A Field Guide to Dynamical Recurrent Neural Networks. IEEE Press, Los Alamitos (2001)
- (2001) A Field Guide to Dynamical Recurrent Neural Networks
- Hochreiter, S.¹ Bengio, Y.² Frasconi, P.³ Schmidhuber, J.⁴

9
- 58849155146
- Schmidhuber, J.: RNN overview (2004), http://www.idsia.ch/~juergen/rnn.html
- (2004) RNN overview
- Schmidhuber, J.¹

10
- 0025503558
- Back propagation through time: What it does and how to do it
- Werbos, P.: Back propagation through time: What it does and how to do it. Proceedings of the IEEE 78, 1550-1560 (1990)
- (1990) Proceedings of the IEEE , vol.78 , pp. 1550-1560
- Werbos, P.¹

11
- 0004198169
- Dover Publications
- Chernoff, H., Moses, L.E.: Elementary Decision Theory. Dover Publications (1987)
- (1987) Elementary Decision Theory
- Chernoff, H.¹ Moses, L.E.²

12
- 0004279188
- 2nd edn. Springer, Heidelberg
- Kleinbaum, D.G., Klein, M., Pryor, E.R.: Logistic Regression, 2nd edn. Springer, Heidelberg (2002)
- (2002) Logistic Regression
- Kleinbaum, D.G.¹ Klein, M.² Pryor, E.R.³

13
- 21244494809
- Planning with predictive state representations
- James, M.R., Singh, S., Littman, M.L.: Planning with predictive state representations. In: Proceedings 2004 International Conference on Machine Learning and Applications, pp. 304-311 (2004)
- (2004) Proceedings 2004 International Conference on Machine Learning and Applications , pp. 304-311
- James, M.R.¹ Singh, S.² Littman, M.L.³

14
- 33749258231
- Learning predictive state representations using non-blind policies
- ACM, New York
- Bowling, M., McCracken, P., James, M., Neufeld, J., Wilkinson, D.: Learning predictive state representations using non-blind policies. In: ICML 2006: Proceedings of the 23rd international conference on Machine learning, pp. 129-136. ACM, New York (2006)
- (2006) ICML 2006: Proceedings of the 23rd international conference on Machine learning , pp. 129-136
- Bowling, M.¹ McCracken, P.² James, M.³ Neufeld, J.⁴ Wilkinson, D.⁵

15
- 84899015857
- Reinforcement learning with long short-term memory
- Bakker, B.: Reinforcement learning with long short-term memory. In: Advances in Neural Information Processing Syst., vol. 14 (2002)
- (2002) Advances in Neural Information Processing Syst , vol.14
- Bakker, B.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.