SCOPUS Information Search Platform

IEEE Signal Processing Letters

Volume 21, Issue 9, 2014, Pages 1120-1124

Convolutional neural networks for distant speech recognition

(3) Swietojanski, Pawel (a); Ghoshal, Arnab (a, b); Renals, Steve (a)

a UNIVERSITY OF EDINBURGH (United Kingdom)

b APPLE INC (United States)

Author keywords

AMI corpus; convolutional neural networks; deep neural networks; distant speech recognition; meetings

Indexed keywords

CONVOLUTION; MICROPHONES; NEURAL NETWORKS; SPEECH RECOGNITION;

AMI CORPUS; CONVOLUTIONAL NEURAL NETWORK; DEEP NEURAL NETWORKS; DISTANT SPEECH RECOGNITION; MEETINGS;

SPACE DIVISION MULTIPLE ACCESS;

EID: 84901999583 PISSN: 10709908 EISSN: None Source Type: Journal
DOI: 10.1109/LSP.2014.2325781 Document Type: Article

Times cited : (235)

References (43)

1
- 50449083999
- Hoboken, NJ, USA: Wiley
- M. Wölfel and J. McDonough, Distant Speech Recognition. Hoboken, NJ, USA: Wiley, 2009
- (2009) Distant Speech Recognition
- Wölfel, M.¹ McDonough, J.²

2
- 80051654520
- Making the most from multiple microphones in meeting recognition
- A. Stolcke, "Making the most from multiple microphones in meeting recognition," in Proc. IEEE ICASSP, 2011
- (2011) Proc. IEEE ICASSP
- Stolcke, A.¹

3
- 85032750883
- Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors
- K. Kumatani, J. McDonough, and B. Raj, "Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 127-140, 2012
- (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 127-140
- Kumatani, K.¹ McDonough, J.² Raj, B.³

4
- 85008520364
- Transcribing meetings with the AMIDA systems
- T. Hain, L. Burget, J. Dines, P. N. Garner, F. Grezl, A. E. Hannani, M. Huijbregts, M. Karafiat, M. Lincoln, and V. Wan, "Transcribing meetings with the AMIDA systems," in IEEE Trans. Audio, Speech, Lang. Process., 2012, vol. 20, no. 2, pp. 486-498
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.2 , pp. 486-498
- Hain, T.¹ Burget, L.² Dines, J.³ Garner, P.N.⁴ Grezl, F.⁵ Hannani, A.E.⁶ Huijbregts, M.⁷ Karafiat, M.⁸ Lincoln, M.⁹ Wan, V.¹⁰

5
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, 2012
- (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

6
- 0003573244
- Norwell, MA, USA: Kluwer
- H. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. Norwell, MA, USA: Kluwer, 1994
- (1994) Connectionist Speech Recognition: A Hybrid Approach
- Bourlard, H.¹ Morgan, N.²

7
- 0028194709
- Connectionist probability estimators in HMM speech recognition
- S. Renals, N. Morgan, H. Bourlard, M. Cohen, and H. Franco, "Connectionist probability estimators in HMM speech recognition," IEEE Trans. Speech Audio Process., vol. 2, no. 1, pp. 161-174, 1994
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.1 , pp. 161-174
- Renals, S.¹ Morgan, N.² Bourlard, H.³ Cohen, M.⁴ Franco, H.⁵

8
- 0029308753
- Neural networks for statistical recognition of continuous speech
- N. Morgan and H. Bourlard, "Neural networks for statistical recognition of continuous speech," in Proc. IEEE, 1995, vol. 83, no. 5, pp. 742-772
- (1995) Proc. IEEE , vol.83 , Issue.5 , pp. 742-772
- Morgan, N.¹ Bourlard, H.²

9
- 0036567797
- Connectionist speech recognition of Broadcast News
- DOI 10.1016/S0167-6393(01)00058-9, PII S0167639301000589
- A. J. Robinson, G. D. Cook, D. P. W. Ellis, E. Fosler-Lussier, S. J. Renals, and D. A. G. Williams, "Connectionist speech recognition of broadcast news," Speech Commun., vol. 37, no. 1-2, pp. 27-45, 2002 (Pubitemid 34222536)
- (2002) Speech Communication , vol.37 , Issue.1-2 , pp. 27-45
- Robinson, A.J.¹ Cook, G.D.² Ellis, D.P.W.³ Fosler-Lussier, E.⁴ Renals, S.J.⁵ Williams, D.A.G.⁶

10
- 84858972572
- Making deep belief networks effective for large vocabulary continuous speech recognition
- T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak, and A. Mohamed, "Making deep belief networks effective for large vocabulary continuous speech recognition," in Proc. IEEE ASRU, 2011
- (2011) Proc. IEEE ASRU
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³ Fousek, P.⁴ Novak, P.⁵ Mohamed, A.⁶

11
- 84055222005
- Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech Lang. Process., vol. 20, no. 1, pp. 30-42, 2012
- (2012) IEEE Trans. Audio, Speech Lang. Process. , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

12
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D. P. W. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. IEEE ICASSP, 2000, pp. 1635-1638
- (2000) Proc. IEEE ICASSP , pp. 1635-1638
- Hermansky, H.¹ Ellis, D.P.W.² Sharma, S.³

13
- 33745528628
- Using MLP features in SRI's conversational speech recognition system
- Q. Zhu, A. Stolcke, B. Y. Chen, and N. Morgan, "Using MLP features in SRI's conversational speech recognition system," in Proc. Eurospeech, 2005
- (2005) Proc. Eurospeech
- Zhu, Q.¹ Stolcke, A.² Chen, B.Y.³ Morgan, N.⁴

14
- 34547548235
- Probabilistic and bottle-neck features for LVCSR of meetings
- DOI 10.1109/ICASSP.2007.367023, 4218211, 2007 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '07
- F. Grézl, M. Karafiát, S. Kontár, and J. Černocký, "Probabilistic and bottle-neck features for LVCSR of meetings," Proc. IEEE ICASSP, vol. 4, pp. IV-757-IV-760, 2007 (Pubitemid 47178482)
- (2007) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.4
- Grezl, F.¹ Karafiat, M.² Kontar, S.³ Cernocky, J.⁴

15
- 84867593213
- Auto-encoder bottleneck features using deep belief networks
- T. N. Sainath, B. Kingsbury, and B. Ramabhadran, "Auto-encoder bottleneck features using deep belief networks," in Proc. IEEE ICASSP, 2012
- (2012) Proc. IEEE ICASSP
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³

16
- 84893704659
- Hybrid acoustic models for distant and multichannel large vocabulary speech recognition
- Dec.
- P. Swietojanski, A. Ghoshal, and S. Renals, "Hybrid acoustic models for distant and multichannel large vocabulary speech recognition," in Proc. IEEE ASRU, Dec. 2013
- (2013) Proc. IEEE ASRU
- Swietojanski, P.¹ Ghoshal, A.² Renals, S.³

17
- 84874282188
- Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
- J. Li, D. Yu, J.-T. Huang, and Y. Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc. IEEE SLT, 2012, pp. 131-136
- (2012) Proc. IEEE SLT , pp. 131-136
- Li, J.¹ Yu, D.² Huang, J.-T.³ Gong, Y.⁴

18
- 0002263996
- Convolutional networks for images, speech and time series
- Cambridge, MA, USA: MIT Press
- Y. LeCun and Y. Bengio, "Convolutional networks for images, speech and time series," in The Handbook of Brain Theory and Neural Networks. Cambridge, MA, USA: MIT Press, 1995, pp. 255-258
- (1995) The Handbook of Brain Theory and Neural Networks , pp. 255-258
- Lecun, Y.¹ Bengio, Y.²

19
- 0032203257
- Gradient-based learning applied to document recognition
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, 1998
- (1998) Proc. IEEE , vol.86 , Issue.11 , pp. 2278-2324
- Lecun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

20
- 0024634603
- Phoneme recognition using time-delay neural networks
- DOI 10.1109/29.21701
- A. Waibel, T. Hanazawa, G. Hinton, K. Shikano, and K. J. Lang, "Phoneme recognition using time-delay neural networks," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 3, pp. 328-339, 1989 (Pubitemid 19065785)
- (1989) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.37 , Issue.3 , pp. 328-339
- Waibel, A.¹ Hanazawa, T.² Hinton, G.³ Shikano, K.⁴ Lang, K.J.⁵

21
- 0025254722
- A time-delay neural network architecture for isolated word recognition
- K. J. Lang, A. H. Waibel, and G. E. Hinton, "A time-delay neural network architecture for isolated word recognition," Neural Netw., vol. 3, no. 1, pp. 23-43, 1990
- (1990) Neural Netw. , vol.3 , Issue.1 , pp. 23-43
- Lang, K.J.¹ Waibel, A.H.² Hinton, G.E.³

22
- 0027151530
- Improving the MS-TDNN for word spotting
- T. Zeppenfeld, R. Houghton, and A. Waibel, "Improving the MS-TDNN for word spotting," Proc. IEEE ICASSP, vol. 2, pp. 475-478, 1993
- (1993) Proc. IEEE ICASSP , vol.2 , pp. 475-478
- Zeppenfeld, T.¹ Houghton, R.² Waibel, A.³

23
- 79551521906
- Convolutional networks for speech detection
- S. Sukittanon, A. C. Surendran, J. C. Platt, and C. J. C. Burges, "Convolutional networks for speech detection," in Proc. ICSLP, 2004
- (2004) Proc. ICSLP
- Sukittanon, S.¹ Surendran, A.C.² Platt, J.C.³ Burges, C.J.C.⁴

24
- 84906273908
- Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
- D. Palaz, R. Collobert, and M. Magimai-Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks," in Proc. Interspeech, 2013
- (2013) Proc. Interspeech
- Palaz, D.¹ Collobert, R.² Magimai-Doss, M.³

25
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O. Abdel-Hamid, A.-R. Mohamed, J. Hui, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc. IEEE ICASSP, 2012, pp. 4277-4280
- (2012) Proc. IEEE ICASSP , pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.-R.² Hui, J.³ Penn, G.⁴

26
- 84890525984
- Deep convolutional neural networks for LVCSR
- T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, "Deep convolutional neural networks for LVCSR," in Proc. IEEE ICASSP, 2013
- (2013) Proc. IEEE ICASSP
- Sainath, T.N.¹ Mohamed, A.² Kingsbury, B.³ Ramabhadran, B.⁴

27
- 84893654379
- Improvements to deep convolutional neural networks for LVCSR
- T. N. Sainath, B. Kingsbury, A. Mohamed, G. E. Dahl, G. Saon, H. Soltau, T. Beran, A. Y. Aravkin, and B. Ramabhadran, "Improvements to deep convolutional neural networks for LVCSR," in Proc. IEEE ASRU, 2013
- (2013) Proc. IEEE ASRU
- Sainath, T.N.¹ Kingsbury, B.² Mohamed, A.³ Dahl, G.E.⁴ Saon, G.⁵ Soltau, H.⁶ Beran, T.⁷ Aravkin, A.Y.⁸ Ramabhadran, B.⁹

28
- 35948981862
- Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus
- J. Carletta, "Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus," Lang. Res. Eval. J., vol. 41, no. 2, pp. 181-190, 2007
- (2007) Lang. Res. Eval. J. , vol.41 , Issue.2 , pp. 181-190
- Carletta, J.¹

29
- 0001595997
- Neural network classifiers estimate Bayesian a posteriori probabilities
- M. D. Richard and R. P. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Comput., vol. 3, no. 4, pp. 461-483, 1991
- (1991) Neural Comput. , vol.3 , Issue.4 , pp. 461-483
- Richard, M.D.¹ Lippmann, R.P.²

30
- 51249118803
- Unsupervised learning of invariant feature hierarchies with applications to object recognition
- M. A. Ranzato, F. J. Huang, Y.-L. Boureau, and Y. LeCun, "Unsupervised learning of invariant feature hierarchies with applications to object recognition," in IEEE CVPR, 2007
- (2007) IEEE CVPR
- Ranzato, M.A.¹ Huang, F.J.² Boureau, Y.-L.³ Lecun, Y.⁴

31
- 84902000293
- Mar., [Online; accessed 27-March-2014]
- "NumPy Reference," Mar. 2014 [Online]. Available: http://docs.scipy.org/doc/numpy/numpy-ref-1.8.1.pdf, [Online; accessed 27-March-2014]
- (2014) NumPy Reference

32
- 84906214784
- Exploring convolutional neural network structures and optimisation techniques for speech recognition
- ISCA
- O. Abdel-Hamid, L. Deng, and D. Yu, "Exploring convolutional neural network structures and optimisation techniques for speech recognition," in Proc. Interspeech, 2013, ISCA
- (2013) Proc. Interspeech
- Abdel-Hamid, O.¹ Deng, L.² Yu, D.³

33
- 0033316361
- Hierarchical models of object recognition in cortex
- DOI 10.1038/14819
- M. Riesenhuber and T. Poggio, "Hierarchical models of object recognition in cortex," Nature Neurosci., vol. 2, pp. 1019-1025, 1999 (Pubitemid 30599567)
- (1999) Nature Neuroscience , vol.2 , Issue.11 , pp. 1019-1025
- Riesenhuber, M.¹ Poggio, T.²

34
- 84903707061
- Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech
- J. G. Fiscus, J. Ajot, N. Radde, and C. Laprun, "Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech," in Proc. LREC, 2006
- (2006) Proc. LREC
- Fiscus, J.G.¹ Ajot, J.² Radde, N.³ Laprun, C.⁴

35
- 84874276847
- The Kaldi speech recognition toolkit
- Dec.
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlíček, Y. Qian, P. Schwarz, J. Silovský, G. Stemmer, and K. Veselý, "The Kaldi speech recognition toolkit," in Proc. IEEE ASRU, Dec. 2011
- (2011) Proc. IEEE ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlíček, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovský, J.¹¹ Stemmer, G.¹² Veselý, K.¹³

36
- 84893401626
- arXiv preprint arXiv:1308.4214
- I. J. Goodfellow, D. Warde-Farley, P. Lamblin, V. Dumoulin, M. Mirza, R. Pascanu, J. Bergstra, F. Bastien, and Y. Bengio, "Pylearn2: A machine learning research library," arXiv preprint arXiv:1308.4214, 2013
- (2013) Pylearn2: A Machine Learning Research Library
- Goodfellow, I.J.¹ Warde-Farley, D.² Lamblin, P.³ Dumoulin, V.⁴ Mirza, M.⁵ Pascanu, R.⁶ Bergstra, J.⁷ Bastien, F.⁸ Bengio, Y.⁹

37
- 85009224911
- From switchboard to fisher: Telephone collection protocols, their uses and yields
- C. Cieri, D. Miller, and K. Walker, "From switchboard to fisher: Telephone collection protocols, their uses and yields," in Proc. Eurospeech, 2003
- (2003) Proc. Eurospeech
- Cieri, C.¹ Miller, D.² Walker, K.³

38
- 0033329799
- An empirical study of smoothing techniques for language modeling
- S. F. Chen and J. Goodman, "An empirical study of smoothing techniques for language modeling," Comput. Speech Lang., vol. 13, no. 4, pp. 359-393, 1999
- (1999) Comput. Speech Lang. , vol.13 , Issue.4 , pp. 359-393
- Chen, S.F.¹ Goodman, J.²

39
- 51449120120
- BoostedMMI for model and feature-space discriminative training
- D. Povey, D. Kanevsky, B. Kingsbury, B. Ramabhadran, G. Saon, and K. Visweswariah, "BoostedMMI for model and feature-space discriminative training," in Proc. IEEE ICASSP, 2008, pp. 4057-4060
- (2008) Proc. IEEE ICASSP , pp. 4057-4060
- Povey, D.¹ Kanevsky, D.² Kingsbury, B.³ Ramabhadran, B.⁴ Saon, G.⁵ Visweswariah, K.⁶

40
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- M. J. F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281, 1999
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.3 , pp. 272-281
- Gales, M.¹

41
- 33745805403
- A fast learning algorithm for deep belief nets
- DOI 10.1162/neco.2006.18.7.1527
- G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, pp. 1527-1554, 2006 (Pubitemid 44024729)
- (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.-W.³

42
- 50449086237
- Acoustic beamforming for speaker diarization of meetings
- X. Anguera, C. Wooters, and J. Hernando, "Acoustic beamforming for speaker diarization of meetings," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 2011-2021, 2007
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.7 , pp. 2011-2021
- Anguera, X.¹ Wooters, C.² Hernando, J.³

43
- 84863380535
- Unsupervised feature learning for audio classification using convolutional deep belief networks
- H. Lee, P. Pham, Y. Largman, and A. Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," Adv. Neural Inf. Process. Syst. 22, pp. 1096-1104, 2009.
- (2009) Adv. Neural Inf. Process. Syst. , vol.22 , pp. 1096-1104
- Lee, H.¹ Pham, P.² Largman, Y.³ Ng, A.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.