SCOPUS Information Search Platform

Volume FS-15-06, 2015, Pages 29-37

Deep recurrent Q-learning for partially observable MDPs

Author keywords

[No Author keywords available]

Indexed keywords

COMPLEX NETWORKS; DECISION MAKING; INTELLIGENT AGENTS; REINFORCEMENT LEARNING;

COMPLEX TASK; DECISION POINTS; INPUT LAYERS; LEARNING TO PLAY; LIMITED MEMORY; PARTIAL OBSERVATION; Q-LEARNING; SINGLE FRAMES;

QUALITY CONTROL;

EID: 84964687570 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: (1150)

References (14)

2
- 84879976780
- The arcade learning environment: An evaluation platform for general agents
- Bellemare, M. G.; Naddaf, Y.; Veness, J. and Bowling, M. 2013. The arcade learning environment: An evaluation platform for general agents. Journal of Artificial Intelligence Research 47:253-279.
- (2013) Journal of Artificial Intelligence Research , vol.47 , pp. 253-279
- Bellemare, M.G.¹ Naddaf, Y.² Veness, J.³ Bowling, M.⁴

4
- 0031573117
- Long short-term memory
- Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural Comput. 9(8): 1735-1780.
- (1997) Neural Comput , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

5
- 84913555165
- arXiv preprint arXiv:1408.5093
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; and Darrell, T. 2014. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
- (2014) Caffe: Convolutional Architecture for Fast Feature Embedding
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

6
- 84959876313
- arXiv preprint
- Karpathy, A.; Johnson, J.; and Li, F.-F. 2015. Visualizing and understanding recurrent networks. arXiv preprint.
- (2015) Visualizing and Understanding Recurrent Networks
- Karpathy, A.¹ Johnson, J.² Li, F.-F.³

7
- 84924051598
- Human-level control through deep reinforcement learning
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A. A.; Veness, J.; Bellemare, M. G.; Graves, A.; Riedmiller, M.; Fidjeland, A. K.; Ostrovski, G.; Petersen, S.; Beattie, C.; Sadik, A.; Antonoglou, I.; King, H.; Kumaran, D.; Wierstra, D.; Legg, S.; and Hassabis, D. 2015. Human-level control through deep reinforcement learning. Nature 518(7540):529-533.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰ Petersen, S.¹¹ Beattie, C.¹² Sadik, A.¹³ Antonoglou, I.¹⁴ King, H.¹⁵ Kumaran, D.¹⁶ Wierstra, D.¹⁷ Legg, S.¹⁸ Hassabis, D.¹⁹

8
- 84959861546
- CoRR abs/1506.08941
- Narasimhan, K.; Kulkarni, T.; and Barzilay, R. 2015. Language understanding for text-based games using deep reinforcement learning. CoRR abs/1506.08941.
- (2015) Language Understanding for Text-based Games Using Deep Reinforcement Learning
- Narasimhan, K.¹ Kulkarni, T.² Barzilay, R.³

9
- 0004102479
- MIT Press
- Sutton, R. S., and Barto, A. G. 1998. Reinforcement Learning: An Introduction. MIT Press.
- (1998) Reinforcement Learning: An Introduction
- Sutton, R.S.¹ Barto, A.G.²

10
- 84893343292
- Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude
- Tieleman, T., and Hinton, G. 2012. Lecture 6.5-RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning.
- (2012) COURSERA: Neural Networks for Machine Learning
- Tieleman, T.¹ Hinton, G.²

11
- 0031143730
- An analysis of temporal-difference learning with function approximation
- Tsitsiklis, J. N. and Van Roy, B. 1997. An analysis of temporal-difference learning with function approximation. IEEE Transactions on Automatic Control 42(5):674-690.
- (1997) IEEE Transactions on Automatic Control , vol.42 , Issue.5 , pp. 674-690
- Tsitsiklis, J.N.¹ Van Roy, B.²

12
- 34249833101
- Q-learning
- Watkins, C. J. C. H. and Dayan, P. 1992. Q-learning. Machine Learning 8(3-4):279-292.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 279-292
- Watkins, C.J.C.H.¹ Dayan, P.²

13
- 84964597919
- Wierstra, D.; Foerster, A.; Peters, J.; and Schmidhuber, J. 2007. Solving deep memory POMDPs with recurrent policy gradients.
- (2007) Solving Deep Memory POMDPs with Recurrent Policy Gradients
- Wierstra, D.¹ Foerster, A.² Peters, J.³ Schmidhuber, J.⁴

14
- 84893382981
- CoRR abs/1212.5701
- Zeiler, M. D. 2012. ADADELTA: An adaptive learning rate method. CoRR abs/1212.5701.
- (2012) ADADELTA: An Adaptive Learning Rate Method
- Zeiler, M.D.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.