Volume 5212 LNAI, Issue PART 2, 2008, Pages 234-249

State-dependent exploration for policy gradient methods

Author keywords

[No Author keywords available]

Indexed keywords

DATABASE SYSTEMS; GRADIENT METHODS; LEARNING SYSTEMS; MAXIMUM LIKELIHOOD ESTIMATION; PROBLEM SOLVING; REINFORCEMENT; REINFORCEMENT LEARNING; ROBOT LEARNING; SOLUTIONS;

EID: 56049089041     PISSN: 0302-9743     EISSN: 1611-3349     Source Type: Book Series
DOI: 10.1007/978-3-540-87481-2_16     Document Type: Conference Paper
Times cited: 61

References (15)
  • 5. Williams, R.J.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229-256 (1992)
  • 12. Spall, J.C.: Implementation of the simultaneous perturbation algorithm for stochastic optimization. IEEE Transactions on Aerospace and Electronic Systems 34(3), 817-823 (1998)
  • 13. Wierstra, D., Foerster, A., Peters, J., Schmidhuber, J.: Solving deep memory POMDPs with recurrent policy gradients. In: de Sá, J.M., Alexandre, L.A., Duch, W., Mandic, D.P. (eds.) ICANN 2007. LNCS, vol. 4668, pp. 697-706. Springer, Heidelberg (2007)


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.