SCOPUS Information Retrieval Platform

5th International Conference on Learning Representations, ICLR 2017 - Conference Track Proceedings

Volume , Issue , 2017, Pages

Outrageously large neural networks: The sparsely-gated mixture-of-experts layer

(7) Shazeer, Noam a Mirhoseini, Azalia a Maziarz, Krzysztof b Davis, Andy a Le, Quoc a Hinton, Geoffrey a Dean, Jeff a

a GOOGLE INC (United States)

b JAGIELLONIAN UNIVERSITY (Poland)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL EFFICIENCY; COMPUTATIONAL LINGUISTICS; COMPUTER AIDED LANGUAGE TRANSLATION; MIXTURES; MODELING LANGUAGES; MULTILAYER NEURAL NETWORKS;

COMPUTATIONAL COSTS; LANGUAGE MODEL; MACHINE TRANSLATIONS; MIXTURE OF EXPERTS; MODEL ARCHITECTURE; PERFORMANCE CHALLENGES; STATE OF THE ART; TRAINING CORPUS;

LONG SHORT-TERM MEMORY;

EID: 85088226307 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (1103)

References (44)

1
- 84958264664
- CoRR, abs/1603.04467
- Martín Abadi, Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Gregory S. Corrado, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Ian J. Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Józefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dan Mané, Rajat Monga, Sherry Moore, Derek Gordon Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul A. Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda B. Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke, Yuan Yu, and Xiaoqiang Zheng. TensorFlow: Large-scale machine learning on heterogeneous distributed systems. CoRR, abs/1603.04467, 2016. URL http://arxiv.org/abs/1603.04467.
- (2016) TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- Abadi, M.¹ Agarwal, A.² Barham, P.³ Brevdo, E.⁴ Chen, Z.⁵ Citro, C.⁶ Corrado, G.S.⁷ Davis, A.⁸ Dean, J.⁹ Devin, M.¹⁰ Ghemawat, S.¹¹ Goodfellow, I.J.¹² Harp, A.¹³ Irving, G.¹⁴ Isard, M.¹⁵ Jia, Y.¹⁶ Józefowicz, R.¹⁷ Kaiser, L.¹⁸ Kudlur, M.¹⁹ Levenberg, J.²⁰ more..

2
- 85050983635
- CoRR, abs/1611.06194
- Rahaf Aljundi, Punarjay Chakravarty, and Tinne Tuytelaars. Expert gate: Lifelong learning with a network of experts. CoRR, abs/1611.06194, 2016. URL http://arxiv.org/abs/1611.06194.
- (2016) Expert Gate: Lifelong Learning with a Network of Experts
- Aljundi, R.¹ Chakravarty, P.² Tuytelaars, T.³

3
- 85020209763
- ArXiv e-prints, November
- A. Almahairi, N. Ballas, T. Cooijmans, Y. Zheng, H. Larochelle, and A. Courville. Dynamic Capacity Networks. ArXiv e-prints, November 2015.
- (2015) Dynamic Capacity Networks
- Almahairi, A.¹ Ballas, N.² Cooijmans, T.³ Zheng, Y.⁴ Larochelle, H.⁵ Courville, A.⁶

4
- 84971463350
- arXiv preprint
- Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, Erich Elsen, Jesse Engel, Linxi Fan, Christopher Fougner, Tony Han, Awni Y. Hannun, Billy Jun, Patrick LeGresley, Libby Lin, Sharan Narang, Andrew Y. Ng, Sherjil Ozair, Ryan Prenger, Jonathan Raiman, Sanjeev Satheesh, David Seetapun, Shubho Sengupta, Yi Wang, Zhiqian Wang, Chong Wang, Bo Xiao, Dani Yogatama, Jun Zhan, and Zhenyao Zhu. Deep speech 2: End-to-end speech recognition in English and Mandarin. arXiv preprint arXiv:1512.02595, 2015.
- (2015) Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
- Amodei, D.¹ Anubhai, R.² Battenberg, E.³ Case, C.⁴ Casper, J.⁵ Catanzaro, B.⁶ Chen, J.⁷ Chrzanowski, M.⁸ Coates, A.⁹ Diamos, G.¹⁰ Elsen, E.¹¹ Engel, J.¹² Fan, L.¹³ Fougner, C.¹⁴ Han, T.¹⁵ Hannun, A.Y.¹⁶ Jun, B.¹⁷ LeGresley, P.¹⁸ Lin, L.¹⁹ Narang, S.²⁰ more..

5
- 84922389693
- arXiv preprint
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014.
- (2014) Neural Machine Translation by Jointly Learning to Align and Translate
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

6
- 85015392848
- arXiv preprint
- Emmanuel Bengio, Pierre-Luc Bacon, Joelle Pineau, and Doina Precup. Conditional computation in neural networks for faster models. arXiv preprint arXiv:1511.06297, 2015.
- (2015) Conditional Computation in Neural Networks for Faster Models
- Bengio, E.¹ Bacon, P.-L.² Pineau, J.³ Precup, D.⁴

7
- 84919825195
- arXiv preprint
- Yoshua Bengio, Nicholas Léonard, and Aaron Courville. Estimating or propagating gradients through stochastic neurons for conditional computation. arXiv preprint arXiv:1308.3432, 2013.
- (2013) Estimating or Propagating Gradients through Stochastic Neurons for Conditional Computation
- Bengio, Y.¹ Léonard, N.² Courville, A.³

8
- 84943795466
- arXiv preprint
- Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Philipp Koehn, and Tony Robinson. One billion word benchmark for measuring progress in statistical language modeling. arXiv preprint arXiv:1312.3005, 2013.
- (2013) One Billion Word Benchmark for Measuring Progress in Statistical Language Modeling
- Chelba, C.¹ Mikolov, T.² Schuster, M.³ Ge, Q.⁴ Brants, T.⁵ Koehn, P.⁶ Robinson, T.⁷

9
- 85055128375
- ArXiv e-prints, June
- K. Cho and Y. Bengio. Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning. ArXiv e-prints, June 2014.
- (2014) Exponentially Increasing the Capacity-to-Computation Ratio for Conditional Computation in Deep Learning
- Cho, K.¹ Bengio, Y.²

10
- 0036583160
- A parallel mixture of SVMs for very large scale problems
- Ronan Collobert, Samy Bengio, and Yoshua Bengio. A parallel mixture of SVMs for very large scale problems. Neural Computation, 2002.
- (2002) Neural Computation
- Collobert, R.¹ Bengio, S.² Bengio, Y.³

11
- 84996708851
- arXiv preprint
- Andrew Davis and Itamar Arel. Low-rank approximations for conditional feedforward computation in deep neural networks. arXiv preprint arXiv:1312.4461, 2013.
- (2013) Low-Rank Approximations for Conditional Feedforward Computation in Deep Neural Networks
- Davis, A.¹ Arel, I.²

12
- 84969832855
- Distributed Gaussian processes
- Marc Peter Deisenroth and Jun Wei Ng. Distributed Gaussian processes. In ICML, 2015.
- (2015) ICML
- Deisenroth, M.P.¹ Ng, J.W.²

13
- 80052393597
- John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization, 2010.
- (2010) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Duchi, J.¹ Hazan, E.² Singer, Y.³

14
- 85122622861
- Edinburgh's phrase-based machine translation systems for WMT-14
- Nadir Durrani, Barry Haddow, Philipp Koehn, and Kenneth Heafield. Edinburgh's phrase-based machine translation systems for WMT-14. In Proceedings of the Ninth Workshop on Statistical Machine Translation, 2014.
- (2014) Proceedings of the Ninth Workshop on Statistical Machine Translation
- Durrani, N.¹ Haddow, B.² Koehn, P.³ Heafield, K.⁴

15
- 85031916030
- arXiv preprint
- David Eigen, Marc'Aurelio Ranzato, and Ilya Sutskever. Learning factored representations in a deep mixture of experts. arXiv preprint arXiv:1312.4314, 2013.
- (2013) Learning Factored Representations in a Deep Mixture of Experts
- Eigen, D.¹ Ranzato, M.² Sutskever, I.³

16
- 85070944488
- staff.science.uva.nl/c.monz
- Ekaterina Garmash and Christof Monz. Ensemble learning for multi-source neural machine translation. In staff.science.uva.nl/c.monz, 2016.
- (2016) Ensemble Learning for Multi-Source Neural Machine Translation
- Garmash, E.¹ Monz, C.²

17
- 0034293152
- Learning to forget: Continual prediction with LSTM
- Felix A. Gers, Jürgen A. Schmidhuber, and Fred A. Cummins. Learning to forget: Continual prediction with LSTM. Neural Computation, 2000.
- (2000) Neural Computation
- Gers, F.A.¹ Schmidhuber, J.A.² Cummins, F.A.³

18
- 85044255396
- CoRR, abs/1606.03401
- Audrunas Gruslys, Rémi Munos, Ivo Danihelka, Marc Lanctot, and Alex Graves. Memory-efficient backpropagation through time. CoRR, abs/1606.03401, 2016. URL http://arxiv.org/abs/1606.03401.
- (2016) Memory-Efficient Backpropagation through Time
- Gruslys, A.¹ Munos, R.² Danihelka, I.³ Lanctot, M.⁴ Graves, A.⁵

19
- 84958589374
- Deep residual learning for image recognition
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. IEEE Conference on Computer Vision and Pattern Recognition, 2015.
- (2015) IEEE Conference on Computer Vision and Pattern Recognition
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

20
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- Geoffrey Hinton, Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine, 2012.
- (2012) IEEE Signal Processing Magazine
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

21
- 0031573117
- Long short-term memory
- Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 1997.
- (1997) Neural Computation
- Hochreiter, S.¹ Schmidhuber, J.²

22
- 84964923476
- arXiv preprint
- Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
- (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Ioffe, S.¹ Szegedy, C.²

23
- 0001940458
- Adaptive mixtures of local experts
- Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive mixtures of local experts. Neural Computation, 1991.
- (1991) Neural Computation
- Jacobs, R.A.¹ Jordan, M.I.² Nowlan, S.J.³ Hinton, G.E.⁴

24
- 85014030168
- CoRR, abs/1611.04558
- Melvin Johnson, Mike Schuster, Quoc V. Le, Maxim Krikun, Yonghui Wu, Zhifeng Chen, Nikhil Thorat, Fernanda B. Viégas, Martin Wattenberg, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google's multilingual neural machine translation system: Enabling zero-shot translation. CoRR, abs/1611.04558, 2016. URL http://arxiv.org/abs/1611.04558.
- (2016) Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
- Johnson, M.¹ Schuster, M.² Le, Q.V.³ Krikun, M.⁴ Wu, Y.⁵ Chen, Z.⁶ Thorat, N.⁷ Viégas, F.B.⁸ Wattenberg, M.⁹ Corrado, G.¹⁰ Hughes, M.¹¹ Dean, J.¹²

25
- 0000262562
- Hierarchical mixtures of experts and the EM algorithm
- Michael I. Jordan and Robert A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 1994.
- (1994) Neural Computation
- Jordan, M.I.¹ Jacobs, R.A.²

26
- 84978840213
- arXiv preprint
- Rafal Jozefowicz, Oriol Vinyals, Mike Schuster, Noam Shazeer, and Yonghui Wu. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.
- (2016) Exploring the Limits of Language Modeling
- Jozefowicz, R.¹ Vinyals, O.² Schuster, M.³ Shazeer, N.⁴ Wu, Y.⁵

27
- 85083951076
- Adam: A method for stochastic optimization
- Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- (2015) ICLR
- Kingma, D.¹ Ba, J.²

28
- 84889316678
- Reinhard Kneser and Hermann Ney. Improved backing-off for m-gram language modeling, 1995.
- (1995) Improved Backing-Off for M-Gram Language Modeling
- Kneser, R.¹ Ney, H.²

29
- 84876231242
- ImageNet classification with deep convolutional neural networks
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

30
- 84867135575
- Building high-level features using large scale unsupervised learning
- Quoc V. Le, Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, Greg S. Corrado, Jeffrey Dean, and Andrew Y. Ng. Building high-level features using large scale unsupervised learning. In ICML, 2012.
- (2012) ICML
- Le, Q.V.¹ Ranzato, M.² Monga, R.³ Devin, M.⁴ Chen, K.⁵ Corrado, G.S.⁶ Dean, J.⁷ Ng, A.Y.⁸

31
- 84994129100
- arXiv preprint
- Ludovic Denoyer and Patrick Gallinari. Deep sequential neural network. arXiv preprint arXiv:1410.0510, 2014.
- (2014) Deep Sequential Neural Network
- Denoyer, L.¹ Gallinari, P.²

32
- 84959874994
- Effective approaches to attention-based neural machine translation
- Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. Effective approaches to attention-based neural machine translation. EMNLP, 2015a.
- (2015) EMNLP
- Luong, M.-T.¹ Pham, H.² Manning, C.D.³

33
- 84943804979
- Addressing the rare word problem in neural machine translation
- Minh-Thang Luong, Ilya Sutskever, Quoc V. Le, Oriol Vinyals, and Wojciech Zaremba. Addressing the rare word problem in neural machine translation. ACL, 2015b.
- (2015) ACL
- Luong, M.-T.¹ Sutskever, I.² Le, Q.V.³ Vinyals, O.⁴ Zaremba, W.⁵

34
- 84896062664
- Infinite mixtures of Gaussian process experts
- Carl Edward Rasmussen and Zoubin Ghahramani. Infinite mixtures of Gaussian process experts. NIPS, 2002.
- (2002) NIPS
- Rasmussen, C.E.¹ Ghahramani, Z.²

35
- 84910046405
- Long short-term memory recurrent neural network architectures for large scale acoustic modeling
- Hasim Sak, Andrew W Senior, and Françoise Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In INTERSPEECH, pp. 338-342, 2014.
- (2014) INTERSPEECH , pp. 338-342
- Sak, H.¹ Senior, A.W.² Beaufays, F.³

36
- 84867608922
- Japanese and Korean voice search
- Mike Schuster and Kaisuke Nakajima. Japanese and Korean voice search. ICASSP, 2012.
- (2012) ICASSP
- Schuster, M.¹ Nakajima, K.²

37
- 70349425847
- Nonlinear models using Dirichlet process mixtures
- Babak Shahbaba and Radford Neal. Nonlinear models using Dirichlet process mixtures. JMLR, 2009.
- (2009) JMLR
- Shahbaba, B.¹ Neal, R.²

38
- 84928547704
- Sequence to sequence learning with neural networks
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
- (2014) NIPS
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

39
- 84965111399
- Generative image modeling using spatial LSTMs
- Lucas Theis and Matthias Bethge. Generative image modeling using spatial LSTMs. In NIPS, 2015.
- (2015) NIPS
- Theis, L.¹ Bethge, M.²

40
- 84898983832
- Mixtures of Gaussian processes
- Volker Tresp. Mixtures of Gaussian Processes. In NIPS, 2001.
- (2001) NIPS
- Tresp, V.¹

41
- 85018271332
- arXiv preprint
- Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, and Jeffrey Dean. Google's neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
- (2016) Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
- Wu, Y.¹ Schuster, M.² Chen, Z.³ Le, Q.V.⁴ Norouzi, M.⁵ Macherey, W.⁶ Krikun, M.⁷ Cao, Y.⁸ Gao, Q.⁹ Macherey, K.¹⁰ Klingner, J.¹¹ Shah, A.¹² Johnson, M.¹³ Liu, X.¹⁴ Kaiser, Ł.¹⁵ Gouws, S.¹⁶ Kato, Y.¹⁷ Kudo, T.¹⁸ Kazawa, H.¹⁹ Stevens, K.²⁰ more..

42
- 84858727499
- Hierarchical mixture of classification experts uncovers interactions between brain regions
- Bangpeng Yao, Dirk Walther, Diane Beck, and Li Fei-Fei. Hierarchical mixture of classification experts uncovers interactions between brain regions. In NIPS, 2009.
- (2009) NIPS
- Yao, B.¹ Walther, D.² Beck, D.³ Fei-Fei, L.⁴

43
- 84944053926
- arXiv preprint
- Wojciech Zaremba, Ilya Sutskever, and Oriol Vinyals. Recurrent neural network regularization. arXiv preprint arXiv:1409.2329, 2014.
- (2014) Recurrent Neural Network Regularization
- Zaremba, W.¹ Sutskever, I.² Vinyals, O.³

44
- 85040594930
- arXiv preprint
- Jie Zhou, Ying Cao, Xuguang Wang, Peng Li, and Wei Xu. Deep recurrent models with fast-forward connections for neural machine translation. arXiv preprint arXiv:1606.04199, 2016.
- (2016) Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation
- Zhou, J.¹ Cao, Y.² Wang, X.³ Li, P.⁴ Xu, W.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.