SCOPUS Information Retrieval Platform

4th International Conference on Learning Representations, ICLR 2016 - Conference Track Proceedings

Volume , Issue , 2016, Pages

Policy distillation

(9) Rusu, Andrei A. (a); Colmenarejo, Sergio Gómez (a); Gülçehre, Çaglar (a, b); Desjardins, Guillaume (a); Kirkpatrick, James (a); Pascanu, Razvan (a); Mnih, Volodymyr (a); Kavukcuoglu, Koray (a); Hadsell, Raia (a)

(a) DEEPMIND (United Kingdom)

(b) UNIVERSITÉ DE MONTRÉAL (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

DEEP LEARNING; DISTILLATION; MACHINE LEARNING; VISION;

MULTIPLE TASKS; REINFORCEMENT LEARNING AGENT; VISUAL TASKS;

REINFORCEMENT LEARNING;

EID: 85083952240 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (262)

References (27)

1
- 84937961091
- Do deep nets really need to be deep?
- Curran Associates, Inc
- Jimmy Ba and Rich Caruana. Do deep nets really need to be deep? In Advances in Neural Information Processing Systems (NIPS), pages 2654–2662. Curran Associates, Inc., 2014.
- (2014) Advances in Neural Information Processing Systems (NIPS) , pp. 2654-2662
- Ba, J.¹ Caruana, R.²

2
- 84986214645
- Wiley-IEEE Press
- A. G. Barto and T. G. Dietterich. Handbook of learning and approximate dynamic programming. Wiley-IEEE Press, 2004.
- (2004) Handbook of Learning and Approximate Dynamic Programming
- Barto, A.G.¹ Dietterich, T.G.²

3
- 33749545215
- Model compression
- Cristian Bucila, Rich Caruana, and Alexandru Niculescu-Mizil. Model compression. In KDD, pages 535–541. ACM, 2006.
- (2006) KDD , pp. 535-541
- Bucila, C.¹ Caruana, R.² Niculescu-Mizil, A.³

4
- 0031189914
- Multitask learning
- July
- Rich Caruana. Multitask learning. Mach. Learn., 28(1):41–75, July 1997.
- (1997) Mach. Learn. , vol.28 , Issue.1 , pp. 41-75
- Caruana, R.¹

5
- 84973315473
- arXiv preprint
- William Chan, Nan Rosemary Ke, and Ian Lane. Transferring knowledge from a rnn to a dnn. arXiv preprint arXiv:1504.01483, 2015.
- (2015) Transferring Knowledge from A Rnn to A Dnn
- Chan, W.¹ Ke, N.R.² Lane, I.³

6
- 84906829249
- Generalized classification-based approximate policy iteration
- Amir-massoud Farahmand, Doina Precup, and Mohammad Ghavamzadeh. Generalized classification-based approximate policy iteration. In Tenth European Workshop on Reinforcement Learning (EWRL), volume 2, 2012.
- (2012) Tenth European Workshop on Reinforcement Learning (EWRL) , vol.2
- Farahmand, A.-M.¹ Precup, D.² Ghavamzadeh, M.³

7
- 0022681148
- How not to lie with statistics: The correct way to summarize benchmark results
- March
- Philip J. Fleming and John J. Wallace. How not to lie with statistics: The correct way to summarize benchmark results. Commun. ACM, 29(3):218–221, March 1986.
- (1986) Commun. ACM , vol.29 , Issue.3 , pp. 218-221
- Fleming, P.J.¹ Wallace, J.J.²

8
- 84897694817
- Variance reduction techniques for gradient estimates in reinforcement learning
- Evan Greensmith, Peter L. Bartlett, and Jonathan Baxter. Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research (JMLR), pages 1471–1530, 2004.
- (2004) Journal of Machine Learning Research (JMLR) , pp. 1471-1530
- Greensmith, E.¹ Bartlett, P.L.² Baxter, J.³

9
- 84937779024
- Deep learning for real-time atari game play using offline monte-carlo tree search planning
- Xiaoxiao Guo, Satinder P. Singh, Honglak Lee, Richard L. Lewis, and Xiaoshi Wang. Deep learning for real-time atari game play using offline monte-carlo tree search planning. In Advances in Neural Information Processing Systems (NIPS), pages 3338–3346, 2014.
- (2014) Advances in Neural Information Processing Systems (NIPS) , pp. 3338-3346
- Guo, X.¹ Singh, S.P.² Lee, H.³ Lewis, R.L.⁴ Wang, X.⁵

10
- 84959176782
- Distilling the knowledge in a neural network
- G. Hinton, O. Vinyals, and J. Dean. Distilling the Knowledge in a Neural Network. Deep Learning and Representation Learning Workshop, NIPS, 2014.
- (2014) Deep Learning and Representation Learning Workshop, NIPS
- Hinton, G.¹ Vinyals, O.² Dean, J.³

11
- 33750293964
- Bandit based monte-carlo planning
- Springer
- Levente Kocsis and Csaba Szepesvári. Bandit based monte-carlo planning. In Machine Learning: ECML 2006, pages 282–293. Springer, 2006.
- (2006) Machine Learning: ECML 2006 , pp. 282-293
- Kocsis, L.¹ Szepesvári, C.²

12
- 77956523230
- Analysis of a classification-based policy iteration algorithm
- Omnipress
- Alessandro Lazaric, Mohammad Ghavamzadeh, and Rémi Munos. Analysis of a classification-based policy iteration algorithm. In ICML-27th International Conference on Machine Learning, pages 607–614. Omnipress, 2010.
- (2010) ICML-27th International Conference on Machine Learning , pp. 607-614
- Lazaric, A.¹ Ghavamzadeh, M.² Munos, R.³

13
- 84910035297
- Learning small-size dnn with output-distribution-based criteria
- Jinyu Li, Rui Zhao, Jui-Ting Huang, and Yifan Gong. Learning small-size dnn with output-distribution-based criteria. In Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Li, J.¹ Zhao, R.² Huang, J.-T.³ Gong, Y.⁴

14
- 56449106838
- Structure compilation: Trading structure for features
- Percy Liang, Hal Daumé III, and Dan Klein. Structure compilation: trading structure for features. In Proceedings of International Conference on Machine Learning (ICML), 2008.
- (2008) Proceedings of International Conference on Machine Learning (ICML)
- Liang, P.¹ Daumé, H.² Klein, D.³

15
- 65249159583
- Improving supervised learning by adapting the problem to the learner
- Joshua Menke and Tony Martinez. Improving supervised learning by adapting the problem to the learner. International Journal of Neural Systems, 19(01):1–9, 2009.
- (2009) International Journal of Neural Systems , vol.19 , Issue.1 , pp. 1-9
- Menke, J.¹ Martinez, T.²

16
- 84904867557
- Playing atari with deep reinforcement learning
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin A. Riedmiller. Playing atari with deep reinforcement learning. Deep Learning Workshop, NIPS, 2013.
- (2013) Deep Learning Workshop, NIPS
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Graves, A.⁴ Antonoglou, I.⁵ Wierstra, D.⁶ Riedmiller, M.A.⁷

17
- 84924051598
- Human-level control through deep reinforcement learning
- 02
- Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, 02 2015.
- (2015) Nature , vol.518 , Issue.7540 , pp. 529-533
- Mnih, V.¹ Kavukcuoglu, K.² Silver, D.³ Rusu, A.A.⁴ Veness, J.⁵ Bellemare, M.G.⁶ Graves, A.⁷ Riedmiller, M.⁸ Fidjeland, A.K.⁹ Ostrovski, G.¹⁰ Petersen, S.¹¹ Beattie, C.¹² Sadik, A.¹³ Antonoglou, I.¹⁴ King, H.¹⁵ Kumaran, D.¹⁶ Wierstra, D.¹⁷ Legg, S.¹⁸ Hassabis, D.¹⁹

18
- 85007207440
- Massively parallel methods for deep reinforcement learning
- Arun Nair, Praveen Srinivasan, Sam Blackwell, Cagdas Alcicek, Rory Fearon, Alessandro De Maria, Vedavyas Panneershelvam, Mustafa Suleyman, Charles Beattie, Stig Petersen, Shane Legg, Volodymyr Mnih, Koray Kavukcuoglu, and David Silver. Massively parallel methods for deep reinforcement learning. CoRR, abs/1507.04296, 2015.
- (2015) CoRR
- Nair, A.¹ Srinivasan, P.² Blackwell, S.³ Alcicek, C.⁴ Fearon, R.⁵ De Maria, A.⁶ Panneershelvam, V.⁷ Suleyman, M.⁸ Beattie, C.⁹ Petersen, S.¹⁰ Legg, S.¹¹ Mnih, V.¹² Kavukcuoglu, K.¹³ Silver, D.¹⁴

19
- 84964544562
- arXiv preprint
- Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, Carlo Gatta, and Yoshua Bengio. Fitnets: Hints for thin deep nets. arXiv preprint arXiv:1412.6550, 2014.
- (2014) Fitnets: Hints for Thin Deep Nets
- Romero, A.¹ Ballas, N.² Kahou, S.E.³ Chassang, A.⁴ Gatta, C.⁵ Bengio, Y.⁶

20
- 84899437369
- arXiv preprint
- Stéphane Ross, Geoffrey J Gordon, and J Andrew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. arXiv preprint arXiv:1011.0686, 2010.
- (2010) A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
- Ross, S.¹ Gordon, G.J.² Bagnell, J.A.³

21
- 84986255124
- arXiv e-prints, November
- S. Shalev-Shwartz. SelfieBoost: A Boosting Algorithm for Deep Learning. ArXiv e-prints, November 2014.
- (2014) SelfieBoost: A Boosting Algorithm for Deep Learning
- Shalev-Shwartz, S.¹

22
- 0004102479
- MIT Press, Cambridge, MA, USA, 1st edition
- Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. MIT Press, Cambridge, MA, USA, 1st edition, 1998.
- (1998) Introduction to Reinforcement Learning
- Sutton, R.S.¹ Barto, A.G.²

23
- 84986192610
- arXiv preprint
- Zhiyuan Tang, Dong Wang, Yiqiao Pan, and Zhiyong Zhang. Knowledge transfer pre-training. arXiv preprint arXiv:1506.02256, 2015.
- (2015) Knowledge Transfer Pre-Training
- Tang, Z.¹ Wang, D.² Pan, Y.³ Zhang, Z.⁴

24
- 84893343292
- Lecture 6.5—rMSprop: Divide the gradient by a running average of its recent magnitude
- T. Tieleman and G. Hinton. Lecture 6.5—RmsProp: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 2012.
- (2012) COURSERA: Neural Networks for Machine Learning
- Tieleman, T.¹ Hinton, G.²

25
- 57249084011
- Visualizing high-dimensional data using t-sne
- L.J.P. van der Maaten and G.E. Hinton. Visualizing high-dimensional data using t-sne. Journal of Machine Learning Research (JMLR), 2008.
- (2008) Journal of Machine Learning Research (JMLR)
- Van Der Maaten, L.J.P.¹ Hinton, G.E.²

26
- 85053776595
- Deep reinforcement learning with double q-learning
- Hado van Hasselt, Arthur Guez, and David Silver. Deep reinforcement learning with double q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence.
- Proceedings of the AAAI Conference on Artificial Intelligence
- Van Hasselt, H.¹ Guez, A.² Silver, D.³

27
- 84986224754
- arXiv preprint
- Dong Wang, Chao Liu, Zhiyuan Tang, Zhiyong Zhang, and Mengyuan Zhao. Recurrent neural network training with dark knowledge transfer. arXiv preprint arXiv:1505.04630, 2015.
- (2015) Recurrent Neural Network Training with Dark Knowledge Transfer
- Wang, D.¹ Liu, C.² Tang, Z.³ Zhang, Z.⁴ Zhao, M.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.