SCOPUS Information Retrieval Platform

Volume 64, 2015, Pages 39-48

Deep Convolutional Neural Networks for Large-scale Speech Tasks

(7) Sainath, Tara N. (a); Kingsbury, Brian (a); Saon, George (a); Soltau, Hagen (a); Mohamed, Abdel-rahman (b); Dahl, George (b); Ramabhadran, Bhuvana (a)

(a) IBM T. J. Watson Research Center (United States)

(b) University of Toronto (Canada)

Author keywords

Deep learning; Neural networks; Speech recognition

Indexed keywords

CONTINUOUS SPEECH RECOGNITION; CONVOLUTION; DEEP LEARNING; NEURAL NETWORKS; RHENIUM COMPOUNDS; SPEECH; SPEECH RECOGNITION;

CONVOLUTIONAL NEURAL NETWORK; DEEP CONVOLUTIONAL NEURAL NETWORKS; HIDDEN UNITS; LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION; SPECTRAL CORRELATION; SPECTRAL VARIATION; SPEECH SIGNALS; STATE OF THE ART;

DEEP NEURAL NETWORKS;

ACOUSTICS; ARTICLE; ARTIFICIAL NEURAL NETWORK; CONTROLLED STUDY; CONVOLUTIONAL NEURAL NETWORK; DEEP NEURAL NETWORK; FREQUENCY DISCRIMINATION; LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION TASK; PSYCHOMOTOR PERFORMANCE; SPEECH DISCRIMINATION; TASK PERFORMANCE; AUTOMATIC SPEECH RECOGNITION; SPEECH;

NEURAL NETWORKS (COMPUTER); SPEECH; SPEECH RECOGNITION SOFTWARE;

EID: 84922343800
PISSN: 08936080
EISSN: 18792782
Source Type: Journal
DOI: 10.1016/j.neunet.2014.08.005
Document Type: Article

Times cited : (461)

References (34)

1
- 84867605836
- Applying convolutional neural network concepts to hybrid NN-HMM model for speech recognition
- Abdel-Hamid, O., Mohamed, A., Jiang, H., Penn, G. (2012). Applying convolutional neural network concepts to hybrid NN-HMM model for speech recognition, In Proc. ICASSP.
- (2012) Proc. ICASSP.
- Abdel-Hamid, O.¹ Mohamed, A.² Jiang, H.³ Penn, G.⁴

2
- 0003573244
- Kluwer Academic Publishers
- Bourlard H., Morgan N. Connectionist speech recognition: a hybrid approach 1993, Kluwer Academic Publishers.
- (1993) Connectionist speech recognition: a hybrid approach
- Bourlard, H.¹ Morgan, N.²

3
- 84890527827
- Improving deep neural networks for LVCSR using rectified linear units and dropout
- Dahl, G., Sainath, T., Hinton, G. (2013), Improving deep neural networks for LVCSR using rectified linear units and dropout. In Proc. ICASSP.
- (2013) Proc. ICASSP.
- Dahl, G.¹ Sainath, T.² Hinton, G.³

4
- 84055222005
- Context-dependent pre-trained deep neural networks for large vocabulary speech recognition
- Dahl G., Yu D., Deng L., Acero A. Context-dependent pre-trained deep neural networks for large vocabulary speech recognition. IEEE Transactions on Audio, Speech, and Language Processing 2012, 20(1):30-42.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

5
- 84890545163
- A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion
- Deng, L., Abdel-Hamid, O., Yu, D. (2013). A deep convolutional neural network using heterogeneous pooling for trading acoustic invariance with phonetic confusion. In Proc. ICASSP.
- (2013) Proc. ICASSP.
- Deng, L.¹ Abdel-Hamid, O.² Yu, D.³

6
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- Gales M. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech and Language 1998, 12(2):75-98.
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.¹

7
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- Gales M. Semi-tied covariance matrices for hidden Markov models. IEEE Transactions on Speech and Audio Processing 1999, 7:272-281.
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , pp. 272-281
- Gales, M.¹

8
- 79951563340
- Understanding the difficulty of training deep feedforward neural networks
- Glorot, X., Bengio, Y. Understanding the difficulty of training deep feedforward neural networks. In Proc. AI Stats.
- Proc. AI Stats.
- Glorot, X.¹ Bengio, Y.²

9
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- Hinton G., Deng L., Yu D., Dahl G., Mohamed A., Jaitly N., Senior A., Vanhoucke V., Nguyen P., Sainath T.N., Kingsbury B. Deep neural networks for acoustic modeling in speech recognition. IEEE Signal Processing Magazine 2012, 29(6):82-97.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

10
- 84890466217
- Improving neural networks by preventing co-adaptation of feature detectors
- Hinton G., Srivastava N., Krizhevsky A., Sutskever I., Salakhutdinov R. Improving neural networks by preventing co-adaptation of feature detectors. The Computing Research Repository (CoRR) 2012, abs/1207.0580.
- (2012) The Computing Research Repository (CoRR) , abs/1207.0580
- Hinton, G.¹ Srivastava, N.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

11
- 84878539964
- Application of pretrained deep neural networks to large vocabulary speech recognition
- Jaitly, N., Nguyen, P., Senior, A. W., Vanhoucke, V. (2012). Application of pretrained deep neural networks to large vocabulary speech recognition, In Proc. Interspeech.
- (2012) Proc. Interspeech
- Jaitly, N.¹ Nguyen, P.² Senior, A.W.³ Vanhoucke, V.⁴

12
- 70349213445
- Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
- Kingsbury, B. (2009). Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling. In Proc. ICASSP.
- (2009) Proc. ICASSP.
- Kingsbury, B.¹

13
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- Kingsbury, B., Sainath, T. N., Soltau, H. (2012). Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization. In Proc. Interspeech.
- (2012) Proc. Interspeech.
- Kingsbury, B.¹ Sainath, T.N.² Soltau, H.³

14
- 84876231242
- ImageNet classification with deep convolutional neural networks
- Krizhevsky A., Sutskever I., Hinton G. ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems 2012.
- (2012) Advances in Neural Information Processing Systems
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.³

15
- 0030737097
- Face recognition: A convolutional neural-network approach
- Lawrence S. Face recognition: A convolutional neural-network approach. IEEE Transactions on Neural Networks 1997, 8(1):98-113.
- (1997) IEEE Transactions on Neural Networks , vol.8 , Issue.1 , pp. 98-113
- Lawrence, S.¹

16
- 0002263996
- Convolutional networks for images, speech, and time-series
- MIT Press
- LeCun Y., Bengio Y. Convolutional networks for images, speech, and time-series. The handbook of brain theory and neural networks 1995, MIT Press.
- (1995) The handbook of brain theory and neural networks
- LeCun, Y.¹ Bengio, Y.²

17
- 0032203257
- Gradient-based learning applied to document recognition
- Lecun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition. Proceedings of the IEEE 1998.
- (1998) Proceedings of the IEEE
- Lecun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

18
- 5044231640
- Learning methods for generic object recognition with invariance to pose and lighting
- LeCun, Y., Huang, F., Bottou, L. (2004). Learning methods for generic object recognition with invariance to pose and lighting, Proc. CVPR.
- (2004) Proc. CVPR.
- LeCun, Y.¹ Huang, F.² Bottou, L.³

19
- 0029747183
- Speaker normalization using efficient frequency warping procedures
- Lee, L., Rose, R. C. (1996). Speaker normalization using efficient frequency warping procedures, In Proc. ICASSP.
- (1996) Proc. ICASSP.
- Lee, L.¹ Rose, R.C.²

20
- 77956541496
- Deep learning via Hessian-free optimization
- Martens, J. (2010). Deep learning via Hessian-free optimization, In Proc. intl. conf. on machine learning (ICML).
- (2010) Proc. intl. conf. on machine learning (ICML)
- Martens, J.¹

21
- 84867585919
- Understanding how deep belief networks perform acoustic modelling
- Mohamed, A., Hinton, G., Penn, G. (2012). Understanding how deep belief networks perform acoustic modelling, In ICASSP.
- (2012) ICASSP.
- Mohamed, A.¹ Hinton, G.² Penn, G.³

22
- 51449120120
- Boosted MMI for model and feature-space discriminative training
- Povey, D., Kanevsky, D., Kingsbury, B., Ramabhadran, B., Saon, G., Visweswariah, K. (2008). Boosted MMI for model and feature-space discriminative training, In Proc. ICASSP (pp. 4057-4060).
- (2008) Proc. ICASSP , pp. 4057-4060
- Povey, D.¹ Kanevsky, D.² Kingsbury, B.³ Ramabhadran, B.⁴ Saon, G.⁵ Visweswariah, K.⁶

23
- 84890525984
- Deep Convolutional Neural Networks for LVCSR
- Sainath, T., Mohamed, A., Kingsbury, B., Ramabhadran, B. (2013). Deep Convolutional Neural Networks for LVCSR. In Proc. ICASSP.
- (2013) Proc. ICASSP
- Sainath, T.¹ Mohamed, A.² Kingsbury, B.³ Ramabhadran, B.⁴

24
- 84893654379
- Improvements to Deep Convolutional Neural Networks for LVCSR
- Sainath, T. N., Kingsbury, B., Mohamed, A., Dahl, G., Saon, G., Soltau, H., Beran, T., Aravkin, A. Y., Ramabhadran, B. (2013). Improvements to Deep Convolutional Neural Networks for LVCSR. In Proc. ASRU.
- (2013) Proc. ASRU.
- Sainath, T.N.¹ Kingsbury, B.² Mohamed, A.³ Dahl, G.⁴ Saon, G.⁵ Soltau, H.⁶ Beran, T.⁷ Aravkin, A.Y.⁸ Ramabhadran, B.⁹

25
- 84858972572
- Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition
- Sainath, T. N., Kingsbury, B., Ramabhadran, B., Fousek, P., Novak, P., Mohamed, A. (2011). Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition. In Proc. ASRU.
- (2011) Proc. ASRU.
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³ Fousek, P.⁴ Novak, P.⁵ Mohamed, A.⁶

26
- 84893659646
- Sainath, T. N., Ramabhadran, B., Picheny, M., Nahamoo, D., Kanevsky, D. (2011). Exemplar-based sparse representation features: from TIMIT to LVCSR.
- (2011) Exemplar-based sparse representation features: from TIMIT to LVCSR
- Sainath, T.N.¹ Ramabhadran, B.² Picheny, M.³ Nahamoo, D.⁴ Kanevsky, D.⁵

27
- 84893691530
- Speaker adaptation of neural network acoustic models using I-vectors
- (in preparation)
- Saon, G., Soltau, H., Picheny, M., Nahamoo, D. (2013). Speaker adaptation of neural network acoustic models using I-vectors. In Proc. ASRU (in preparation).
- (2013) Proc. ASRU
- Saon, G.¹ Soltau, H.² Picheny, M.³ Nahamoo, D.⁴

28
- 84865801985
- Conversational Speech Transcription Using Context-Dependent Deep Neural Networks
- Seide, F., Li, G., Yu, D. (2011). Conversational Speech Transcription Using Context-Dependent Deep Neural Networks. In Proc. Interspeech.
- (2011) Proc. Interspeech.
- Seide, F.¹ Li, G.² Yu, D.³

29
- 84874575248
- Convolutional neural networks applied to house numbers digit classification
- Sermanet, P., Chintala, S., LeCun, Y. (2012). Convolutional neural networks applied to house numbers digit classification. In Proc. 21st International Conference on Pattern Recognition (ICPR).
- (2012) Proc. 21st International Conference on Pattern Recognition (ICPR)
- Sermanet, P.¹ Chintala, S.² LeCun, Y.³

30
- 79951796005
- The IBM Attila Speech Recognition Toolkit
- Soltau, H., Saon, G., Kingsbury, B. (2010). The IBM Attila Speech Recognition Toolkit. In Proc. SLT.
- (2010) Proc. SLT
- Soltau, H.¹ Saon, G.² Kingsbury, B.³

31
- 0024634603
- Phoneme recognition using time-delay neural networks
- Waibel A., Hanazawa T., Hinton G., Shikano K., Lang K. Phoneme recognition using time-delay neural networks. IEEE Transactions on Acoustics, Speech and Signal Processing 1989, 37(3):328-339.
- (1989) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.⁵

32
- 0003571976
- University of Cambridge
- Young S.J., Evermann G., Gales M.J.F., Hain T., Kershaw D., Liu X., Moore G., Odell J., Ollason D., Povey D., Valtchev V., Woodland P.C. The HTK Book (for HTK Version 3.4) 2006, University of Cambridge.
- (2006) The HTK Book (for HTK Version 3.4)
- Young, S.J.¹ Evermann, G.² Gales, M.J.F.³ Hain, T.⁴ Kershaw, D.⁵ Liu, X.⁶ Moore, G.⁷ Odell, J.⁸ Ollason, D.⁹ Povey, D.¹⁰ Valtchev, V.¹¹ Woodland, P.C.¹²

33
- 0002144369
- Tree-based State Tying for High Accuracy Acoustic Modelling
- Young, S. J., Odell, J., Woodland, P. (1994). Tree-based State Tying for High Accuracy Acoustic Modelling. In Proc. HLT (pp. 307-312).
- (1994) Proc. HLT , pp. 307-312
- Young, S.J.¹ Odell, J.² Woodland, P.³

34
- 85083954484
- Stochastic Pooling for Regularization of Deep Convolutional Neural Networks
- Zeiler, M., Fergus, R. (2013). Stochastic Pooling for Regularization of Deep Convolutional Neural Networks. In Proc. of the International Conference on Representation Learning (ICLR).
- (2013) Proc. of the International Conference on Representation Learning (ICLR)
- Zeiler, M.¹ Fergus, R.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.