SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume: n/a, Issue: n/a, 2014, Pages 1058-1062

1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs

(5) Seide, Frank (a); Fu, Hao (a, b); Droppo, Jasha (c); Li, Gang (a); Yu, Dong (c)

(a) Microsoft Research Asia (China)

(b) Tsinghua University (China)

(c) Microsoft Research (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ION BEAMS; PROGRAM PROCESSORS; STOCHASTIC SYSTEMS;

COMPUTATION SPEED; DEEP NEURAL NETWORKS; DOUBLE BUFFERING; FRAMES PER SECOND; ITS APPLICATIONS; LOSS OF ACCURACY; QUANTIZATION ERRORS; STOCHASTIC GRADIENT DESCENT;

SPEECH COMMUNICATION;

EID: 84910069984 PISSN: 2308-457X EISSN: 1990-9772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 1108

References (31)

1
- 84865713025
- Roles of pretraining and fine-tuning in context-dependent DNN-HMMs for real-world speech recognition
- D. Yu, L. Deng, and G. Dahl, "Roles of Pretraining and Fine-Tuning in Context-Dependent DNN-HMMs for Real-World Speech Recognition," NIPS Workshop on Deep Learning and Unsupervised Feature Learning, Dec. 2010.
- (2010) NIPS Workshop on Deep Learning and Unsupervised Feature Learning
- Yu, D.; Deng, L.; Dahl, G.

2
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li, and D. Yu, "Conversational Speech Transcription Using Context-Dependent Deep Neural Networks," Interspeech, 2011.
- (2011) Interspeech
- Seide, F.; Li, G.; Yu, D.

3
- 84867135575
- Building high-level features using large scale unsupervised learning
- Q.-V. Le, M.-A. Ranzato, R. Monga, M. Devin, K. Chen, G.-S. Corrado, J. Dean, and A.-Y. Ng, "Building High-Level Features Using Large Scale Unsupervised Learning," ICML, 2012.
- (2012) ICML
- Le, Q.-V.; Ranzato, M.-A.; Monga, R.; Devin, M.; Chen, K.; Corrado, G.-S.; Dean, J.; Ng, A.-Y.

4
- 85162467517
- Hogwild!: A lock-free approach to parallelizing stochastic gradient descent
- F. Niu, B. Recht, C. Re, and S. J. Wright, "Hogwild!: A lock-free approach to parallelizing stochastic gradient descent," arXiv preprint arXiv:1106.5730, 2011.
- (2011) arXiv preprint arXiv:1106.5730
- Niu, F.; Recht, B.; Re, C.; Wright, S.J.

5
- 84877760312
- Large scale distributed deep networks
- J. Dean, G. S. Corrado, R. Monga, K. Chen, M. Devin, Q. V. Le, M. Z. Mao, M.-A. Ranzato, A. Senior, P. Tucker, K. Yang, A. Y. Ng, "Large Scale Distributed Deep Networks," NIPS, 2012.
- (2012) NIPS
- Dean, J.; Corrado, G.S.; Monga, R.; Chen, K.; Devin, M.; Le, Q.V.; Mao, M.Z.; Ranzato, M.-A.; Senior, A.; Tucker, P.; Yang, K.; Ng, A.Y.

6
- 84897484337
- Deep learning with COTS HPC systems
- A. Coates, B. Huval, T. Wang, D.-J. Wu, and A.-Y. Ng, "Deep Learning with COTS HPC Systems," ICML, 2013.
- (2013) ICML
- Coates, A.; Huval, B.; Wang, T.; Wu, D.-J.; Ng, A.-Y.

7
- 84890512601
- Asynchronous stochastic gradient descent for DNN training
- S. Zhang, C. Zhang, Z. You, R. Zheng, and B. Xu, "Asynchronous Stochastic Gradient Descent for DNN Training," ICASSP, 2013.
- (2013) ICASSP
- Zhang, S.; Zhang, C.; You, Z.; Zheng, R.; Xu, B.

8
- 84905269646
- On parallelizability of stochastic gradient descent for speech DNNs
- F. Seide, H. Fu, J. Droppo, G. Li, D. Yu, "On Parallelizability of Stochastic Gradient Descent for Speech DNNs," ICASSP 2014.
- (2014) ICASSP
- Seide, F.; Fu, H.; Droppo, J.; Li, G.; Yu, D.

9
- 84910105867
- "Delta-Sigma Modulation," Wikipedia, http://en.wikipedia.org/wiki/Delta-sigmamodulation.
- Delta-Sigma Modulation

10
- 0028464214
- Context-dependent connectionist probability estimation in a hybrid hidden Markov model-neural net speech recognition system
- H. Franco et al., "Context-Dependent Connectionist Probability Estimation in a Hybrid Hidden Markov Model-Neural Net Speech Recognition System," Computer Speech and Language, vol. 8, pp. 211-222, 1994.
- (1994) Computer Speech and Language, vol. 8, pp. 211-222
- Franco, H.

11
- 84890496567
- A cluster-based multiple deep neural networks method for large vocabulary continuous speech recognition
- P. Zhou, C. Liu, Q. Liu, L. Dai, and H. Jiang, "A Cluster-Based Multiple Deep Neural Networks Method for Large Vocabulary Continuous Speech Recognition," ICASSP, 2013.
- (2013) ICASSP
- Zhou, P.; Liu, C.; Liu, Q.; Dai, L.; Jiang, H.

12
- 84886829539
- Optimization techniques to improve training speed of deep neural networks for large speech tasks
- T.-N. Sainath, B. Kingsbury, H. Soltau, and B. Ramabhadran, "Optimization Techniques to Improve Training Speed of Deep Neural Networks for Large Speech Tasks," IEEE Trans. on Audio, Speech, and Language Processing, Vol. 21, No. 11, Nov. 2013.
- (2013) IEEE Trans. on Audio, Speech, and Language Processing, vol. 21, Issue 11
- Sainath, T.-N.; Kingsbury, B.; Soltau, H.; Ramabhadran, B.

13
- 84906227589
- Restructuring of deep neural network acoustic models with singular value decomposition
- J. Xue, J. Li, and Y. Gong, "Restructuring of Deep Neural Network Acoustic Models with Singular Value Decomposition," Interspeech 2013.
- (2013) Interspeech
- Xue, J.; Li, J.; Gong, Y.

14
- 84906237512
- Investigations on Hessian-free optimization for cross-entropy training of deep neural networks
- S. Wiesler, J. Li, and J. Xue, "Investigations on Hessian-Free Optimization for Cross-Entropy Training of Deep Neural Networks," Interspeech, 2013.
- (2013) Interspeech
- Wiesler, S.; Li, J.; Xue, J.

15
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- B. Kingsbury, T. Sainath, and H. Soltau, "Scalable Minimum Bayes Risk Training of Deep Neural Network Acoustic Models Using Distributed Hessian-free Optimization," Interspeech, 2012.
- (2012) Interspeech
- Kingsbury, B.; Sainath, T.; Soltau, H.

16
- 84905239342
- Improving deep neural network acoustic models using generalized maxout networks
- X. Zhang, J. Trmal, D. Povey, S. Khudanpur, "Improving Deep Neural Network Acoustic Models Using Generalized Maxout Networks," ICASSP 2014.
- (2014) ICASSP
- Zhang, X.; Trmal, J.; Povey, D.; Khudanpur, S.

17
- 80051762104
- Distributed optimization and statistical learning via the alternating direction method of multipliers
- S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers," in Foundations and Trends in Machine Learning, Vol. 3, No. 1 (2010) 1-122.
- (2010) Foundations and Trends in Machine Learning, vol. 3, Issue 1, pp. 1-122
- Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J.

18
- 84910105013
- Deep Learning Using Alternating Direction Method of Multipliers
- Q. Huo, Z. Yan, K. Chen, "Deep Learning Using Alternating Direction Method of Multipliers," U.S. Patent Application, filed on 4/8/2014.
- (2014) U.S. Patent Application, filed on 4/8/2014
- Huo, Q.; Yan, Z.; Chen, K.

19
- 84878397276
- Pipelined back-propagation for context-dependent deep neural networks
- X. Chen, A. Eversole, G. Li, D. Yu, and F. Seide, "Pipelined Back-Propagation for Context-Dependent Deep Neural Networks," Interspeech, 2012.
- (2012) Interspeech
- Chen, X.; Eversole, A.; Li, G.; Yu, D.; Seide, F.

20
- 0003626284
- Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms
- F. Rosenblatt, "Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms," Spartan Books, Wash. DC, 1961.
- (1961) Spartan Books, Wash. DC
- Rosenblatt, F.

21
- 84865744276
- Context-dependent pre-trained deep neural networks for large vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-Dependent Pre-Trained Deep Neural Networks for Large Vocabulary Speech Recognition," IEEE Trans. Speech and Audio Proc., Special Issue on Deep Learning for Speech and Language Processing, 2011.
- (2011) IEEE Trans. Speech and Audio Proc., Special Issue on Deep Learning for Speech and Language Processing
- Dahl, G.; Yu, D.; Deng, L.; Acero, A.

22
- 33745805403
- A fast learning algorithm for deep belief nets
- G. Hinton, S. Osindero, and Y. Teh, "A Fast Learning Algorithm for Deep Belief Nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
- (2006) Neural Computation, vol. 18, pp. 1527-1554
- Hinton, G.; Osindero, S.; Teh, Y.

23
- 84858976070
- Feature engineering in context-dependent deep neural networks for conversational speech transcription
- F. Seide, G. Li, X. Chen, and D. Yu, "Feature Engineering in Context-Dependent Deep Neural Networks for Conversational Speech Transcription," Proc. ASRU, Waikoloa Village, 2011.
- (2011) Proc. ASRU, Waikoloa Village
- Seide, F.; Li, G.; Chen, X.; Yu, D.

24
- 70349213445
- Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
- B. Kingsbury, "Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling," ICASSP, 2009.
- (2009) ICASSP
- Kingsbury, B.

25
- 0022471098
- Learning representations by back-propagating errors
- D. Rumelhart, G. Hinton, and R. Williams, "Learning Representations By Back-Propagating Errors," Nature, vol. 323, Oct. 1986.
- (1986) Nature, vol. 323
- Rumelhart, D.; Hinton, G.; Williams, R.

26
- 77956541496
- Deep learning via Hessian-free optimization
- J. Martens, "Deep learning via Hessian-free optimization," ICML, 2010.
- (2010) ICML
- Martens, J.

27
- 84943274699
- A direct adaptive method for faster backpropagation learning: The Rprop algorithm
- M. Riedmiller and H. Braun, "A direct adaptive method for faster backpropagation learning: The Rprop algorithm," International Conference on Neural Networks, 1993.
- (1993) International Conference on Neural Networks
- Riedmiller, M.; Braun, H.

28
- 84859053384
- Switchboard-1 release 2
- J. Godfrey and E. Holliman, "Switchboard-1 Release 2," Linguistic Data Consortium, Philadelphia, 1997.
- (1997) Linguistic Data Consortium
- Godfrey, J.; Holliman, E.

29
- 84910105866
- "Evaluation Campaign," IWSLT 2013, http://www.iwslt2013.org/59.php.
- (2013) Evaluation Campaign

30
- 80052393597
- J. Duchi, E. Hazan, and Y. Singer, "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization," http://www.cs.berkeley.edu/-jduchi/projects/DuchiHaSi10.pdf, 2010.
- (2010) Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
- Duchi, J.; Hazan, E.; Singer, Y.

31
- 84887388950
- An empirical study of learning rates in deep neural networks for speech recognition
- A. Senior, G. Heigold, M.-A. Ranzato, K. Yang, "An Empirical Study of Learning Rates in Deep Neural Networks for Speech Recognition," ICASSP, 2013.
- (2013) ICASSP
- Senior, A.; Heigold, G.; Ranzato, M.-A.; Yang, K.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.