SCOPUS Information Search Platform

2014 4th Joint Workshop on Hands-Free Speech Communication and Microphone Arrays, HSCMA 2014

Volume , Issue , 2014, Pages 172-176

Neural networks for distant speech recognition

(2) Renals, Steve a Swietojanski, Pawel a

a UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

AMI corpus; beam forming; convolutional neural networks; distant speech recognition; ICSI corpus; maxout networks; meetings; rectifier unit

Indexed keywords

MARKOV PROCESSES; MICROPHONES; NEURAL NETWORKS; REVERBERATION; SIGNAL PROCESSING;

AMI CORPUS; CONVOLUTIONAL NEURAL NETWORK; DISTANT SPEECH RECOGNITION; ICSI CORPUS; MEETINGS; RECTIFIER UNIT;

SPEECH RECOGNITION;

EID: 84904512262 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/HSCMA.2014.6843274 Document Type: Conference Paper

Times cited : (43)

References (50)

1
- 50449083999
- Wiley
- M Wölfel and J McDonough, Distant Speech Recognition, Wiley, 2009.
- (2009) Distant Speech Recognition
- Wolfel, M.¹ McDonough, J.²

2
- 0025543907
- Speech recognition in noisy environments with the aid of microphone arrays
- D Van Compernolle, W Ma, F Xie, and M Van Diest, "Speech recognition in noisy environments with the aid of microphone arrays," Speech Commun., vol. 9, pp. 433-442, 1990.
- (1990) Speech Commun. , vol.9 , pp. 433-442
- Van Compernolle, D.¹ Ma, W.² Xie, F.³ Van Diest, M.⁴

3
- 0029725933
- Microphone-array speech recognition via incremental MAP training
- JE Adcock, Y Gotoh, DJ Mashao, and HF Silverman, "Microphone-array speech recognition via incremental MAP training," in Proc IEEE ICASSP, 1996, pp. 897-900.
- (1996) Proc IEEE ICASSP , pp. 897-900
- Adcock, J.E.¹ Gotoh, Y.² Mashao, D.J.³ Silverman, H.F.⁴

4
- 0030676367
- Microphone array based speech recognition with different talker-array positions
- M Omologo, M Matassoni, P Svaizer, and D Giuliani, "Microphone array based speech recognition with different talker-array positions," in Proc IEEE ICASSP, 1997, pp. 227-230.
- (1997) Proc IEEE ICASSP , pp. 227-230
- Omologo, M.¹ Matassoni, M.² Svaizer, P.³ Giuliani, D.⁴

5
- 33846217002
- The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments
- M Lincoln, I McCowan, J Vepa, and HK Maganti, "The multi-channel Wall Street Journal audio visual corpus (MC-WSJ-AV): Specification and initial experiments," in Proc IEEE ASRU, 2005.
- (2005) Proc IEEE ASRU
- Lincoln, M.¹ McCowan, I.² Vepa, J.³ Maganti, H.K.⁴

6
- 84890443591
- Recognition of overlapping speech using digital MEMS microphone arrays
- E Zwyssig, F Faubel, S Renals, and M Lincoln, "Recognition of overlapping speech using digital MEMS microphone arrays," in Proc IEEE ICASSP, 2013.
- (2013) Proc IEEE ICASSP
- Zwyssig, E.¹ Faubel, F.² Renals, S.³ Lincoln, M.⁴

7
- 85032751613
- Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition
- T Yoshioka, A Sehr, M Delcroix, K Kinoshita, R Maas, T Nakatani, and W Kellermann, "Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 114-126, 2012.
- (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 114-126
- Yoshioka, T.¹ Sehr, A.² Delcroix, M.³ Kinoshita, K.⁴ Maas, R.⁵ Nakatani, T.⁶ Kellermann, W.⁷

8
- 85032750883
- Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors
- K Kumatani, J McDonough, and B Raj, "Microphone array processing for distant speech recognition: From close-talking microphones to far-field sensors," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 127-140, 2012.
- (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 127-140
- Kumatani, K.¹ McDonough, J.² Raj, B.³

9
- 0141814662
- The ICSI meeting corpus
- A Janin, D Baron, J Edwards, D Ellis, D Gelbart, N Morgan, B Peskin, T Pfau, E Shriberg, A Stolcke, and C Wooters, "The ICSI meeting corpus," in Proc IEEE ICASSP, 2003, pp. I364-I367.
- (2003) Proc IEEE ICASSP
- Janin, A.¹ Baron, D.² Edwards, J.³ Ellis, D.⁴ Gelbart, D.⁵ Morgan, N.⁶ Peskin, B.⁷ Pfau, T.⁸ Shriberg, E.⁹ Stolcke, A.¹⁰ Wooters, C.¹¹

10
- 35948981862
- Unleashing the killer corpus: Experiences in creating the multi-everything AMI Meeting Corpus
- J Carletta, "Unleashing the killer corpus: experiences in creating the multi-everything AMI Meeting Corpus," Language Resources & Evaluation, vol. 41, pp. 181-190, 2007.
- (2007) Language Resources & Evaluation , vol.41 , pp. 181-190
- Carletta, J.¹

11
- 84893665400
- The SRI-ICSI Spring 2007 meeting and lecture recognition system
- Springer
- A Stolcke, X Anguera, K Boakye, O Cetin, A Janin, M Magimai-Doss, C Wooters, and J Zheng, "The SRI-ICSI Spring 2007 meeting and lecture recognition system," in Multimodal Technologies for Perception of Humans, R Stiefelhagen, R Bowers, and J Fiscus, Eds., number 4625 in LNCS, pp. 373-389. Springer, 2008.
- (2008) Multimodal Technologies for Perception of Humans , Issue.4625 , pp. 373-389
- Stolcke, A.¹ Anguera, X.² Boakye, K.³ Cetin, O.⁴ Janin, A.⁵ Magimai-Doss, M.⁶ Wooters, C.⁷ Zheng, J.⁸

12
- 85008520364
- Transcribing meetings with the AMIDA systems
- T Hain, L Burget, J Dines, PN Garner, F Grezl, AE Hannani, M Huijbregts, M Karafiat, M Lincoln, and V Wan, "Transcribing meetings with the AMIDA systems," IEEE Trans. Audio, Speech, & Language Process., vol. 20, pp. 486-498, 2012.
- (2012) IEEE Trans. Audio, Speech, & Language Process. , vol.20 , pp. 486-498
- Hain, T.¹ Burget, L.² Dines, J.³ Garner, P.N.⁴ Grezl, F.⁵ Hannani, A.E.⁶ Huijbregts, M.⁷ Karafiat, M.⁸ Lincoln, M.⁹ Wan, V.¹⁰

13
- 0036296863
- Minimum phone error and I-smoothing for improved discriminative training
- D Povey and PC Woodland, "Minimum phone error and I-smoothing for improved discriminative training," in Proc IEEE ICASSP, 2002, pp. 105-108.
- (2002) Proc IEEE ICASSP , pp. 105-108
- Povey, D.¹ Woodland, P.C.²

14
- 0030362995
- A compact model for speaker-adaptive training
- T Anastasakos, J McDonough, R Schwartz, and J Makhoul, "A compact model for speaker-adaptive training," in Proc ICSLP, 1996, pp. 1137-1140.
- (1996) Proc ICSLP , pp. 1137-1140
- Anastasakos, T.¹ McDonough, J.² Schwartz, R.³ Makhoul, J.⁴

15
- 34547548235
- Probabilistic and bottle-neck features for LVCSR of meetings
- F Grézl, M Karafiát, S Kontár, and J Černocký, "Probabilistic and bottle-neck features for LVCSR of meetings," in Proc IEEE ICASSP, 2007, vol. 4, pp. IV-757-IV-760.
- (2007) Proc IEEE ICASSP , vol.4
- Grezl, F.¹ Karafiat, M.² Kontar, S.³ Cernocky, J.⁴

16
- 50449092852
- Bridging the gap: Towards a unified framework for hands-free speech recognition using microphone arrays
- ML Seltzer, "Bridging the gap: Towards a unified framework for hands-free speech recognition using microphone arrays," in Proc HSCMA, 2008.
- (2008) Proc HSCMA
- Seltzer, M.L.¹

17
- 4344607755
- Likelihood-maximizing beamforming for robust hands-free speech recognition
- M Seltzer, B Raj, and R Stern, "Likelihood-maximizing beamforming for robust hands-free speech recognition," IEEE Trans. Speech, & Audio Process., vol. 12, pp. 489-498, 2004.
- (2004) IEEE Trans. Speech, & Audio Process. , vol.12 , pp. 489-498
- Seltzer, M.¹ Raj, B.² Stern, R.³

18
- 50449096811
- Subband likelihood-maximizing beamforming for speech recognition in reverberant environments
- M Seltzer and R Stern, "Subband likelihood-maximizing beamforming for speech recognition in reverberant environments," IEEE Trans. Audio, Speech, & Lang. Process., vol. 14, pp. 2109-2121, 2006.
- (2006) IEEE Trans. Audio, Speech, & Lang. Process. , vol.14 , pp. 2109-2121
- Seltzer, M.¹ Stern, R.²

19
- 84865729496
- An analysis of automatic speech recognition with multiple microphones
- D Marino and T Hain, "An analysis of automatic speech recognition with multiple microphones," in Proc Interspeech, 2011, pp. 1281-1284.
- (2011) Proc Interspeech , pp. 1281-1284
- Marino, D.¹ Hain, T.²

20
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G Hinton, L Deng, D Yu, GE Dahl, A-R Mohamed, N Jaitly, A Senior, V Vanhoucke, P Nguyen, TN Sainath, and B Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

21
- 0003573244
- Kluwer
- H Bourlard and N Morgan, Connectionist Speech Recognition: A Hybrid Approach, Kluwer, 1994.
- (1994) Connectionist Speech Recognition: A Hybrid Approach
- Bourlard, H.¹ Morgan, N.²

22
- 0028194709
- Connectionist probability estimators in HMM speech recognition
- S Renals, N Morgan, H Bourlard, M Cohen, and H Franco, "Connectionist probability estimators in HMM speech recognition," IEEE Trans Speech & Audio Process., vol. 2, pp. 161-174, 1994.
- (1994) IEEE Trans Speech & Audio Process , vol.2 , pp. 161-174
- Renals, S.¹ Morgan, N.² Bourlard, H.³ Cohen, M.⁴ Franco, H.⁵

23
- 0029308753
- Neural networks for statistical recognition of continuous speech
- N Morgan and H Bourlard, "Neural networks for statistical recognition of continuous speech," Proc IEEE, vol. 83, pp. 742-772, 1995.
- (1995) Proc IEEE , vol.83 , pp. 742-772
- Morgan, N.¹ Bourlard, H.²

24
- 0036567797
- Connectionist speech recognition of broadcast news
- AJ Robinson, GD Cook, DPW Ellis, E Fosler-Lussier, SJ Renals, and DAG Williams, "Connectionist speech recognition of broadcast news," Speech Commun., vol. 37, pp. 27-45, 2002.
- (2002) Speech Commun. , vol.37 , pp. 27-45
- Robinson, A.J.¹ Cook, G.D.² Ellis, D.P.W.³ Fosler-Lussier, E.⁴ Renals, S.J.⁵ Williams, D.A.G.⁶

25
- 84858972572
- Making deep belief networks effective for large vocabulary continuous speech recognition
- TN Sainath, B Kingsbury, B Ramabhadran, P Fousek, P Novak, and A Mohamed, "Making deep belief networks effective for large vocabulary continuous speech recognition," in Proc IEEE ASRU, 2011.
- (2011) Proc IEEE ASRU
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³ Fousek, P.⁴ Novak, P.⁵ Mohamed, A.⁶

26
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- GE Dahl, D Yu, L Deng, and A Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans Audio, Speech & Lang. Process., vol. 20, pp. 30-42, 2012.
- (2012) IEEE Trans Audio, Speech & Lang Process , vol.20 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

27
- 84893704659
- Hybrid acoustic models for distant and multichannel large vocabulary speech recognition
- P Swietojanski, A Ghoshal, and S Renals, "Hybrid acoustic models for distant and multichannel large vocabulary speech recognition," in Proc IEEE ASRU, 2013.
- (2013) Proc IEEE ASRU
- Swietojanski, P.¹ Ghoshal, A.² Renals, S.³

28
- 84874282188
- Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
- J Li, D Yu, J-T Huang, and Y Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc IEEE SLT, 2012, pp. 131-136.
- (2012) Proc IEEE SLT , pp. 131-136
- Li, J.¹ Yu, D.² Huang, J.-T.³ Gong, Y.⁴

29
- 84890471125
- On rectified linear units for speech processing
- MD Zeiler, M Ranzato, R Monga, M Mao, K Yang, QV Le, P Nguyen, A Senior, V Vanhoucke, J Dean, and GE Hinton, "On rectified linear units for speech processing," in Proc IEEE ICASSP, 2013.
- (2013) Proc IEEE ICASSP
- Zeiler, M.D.¹ Ranzato, M.² Monga, R.³ Mao, M.⁴ Yang, K.⁵ Le, Q.V.⁶ Nguyen, P.⁷ Senior, A.⁸ Vanhoucke, V.⁹ Dean, J.¹⁰ Hinton, G.E.¹¹

30
- 84893651518
- Deep maxout neural networks for speech recognition
- M Cai, Y Shi, and J Liu, "Deep maxout neural networks for speech recognition," in Proc ASRU, 2013.
- (2013) Proc ASRU
- Cai, M.¹ Shi, Y.² Liu, J.³

31
- 84893701756
- Deep maxout networks for low-resource speech recognition
- Y Miao, F Metze, and S Rawat, "Deep maxout networks for low-resource speech recognition," in Proc. IEEE ASRU, 2013.
- (2013) Proc. IEEE ASRU
- Miao, Y.¹ Metze, F.² Rawat, S.³

32
- 84905270524
- Investigation of maxout networks for speech recognition
- P Swietojanski, J Li, and J-T Huang, "Investigation of maxout networks for speech recognition," in Proc IEEE ICASSP, 2014.
- (2014) Proc IEEE ICASSP
- Swietojanski, P.¹ Li, J.² Huang, J.-T.³

33
- 0002263996
- Convolutional networks for images, speech and time series
- MIT Press
- Y LeCun and Y Bengio, "Convolutional networks for images, speech and time series," in The Handbook of Brain Theory and Neural Networks, pp. 255-258. MIT Press, 1995.
- (1995) The Handbook of Brain Theory and Neural Networks , pp. 255-258
- Lecun, Y.¹ Bengio, Y.²

34
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O Abdel-Hamid, A-R Mohamed, J Hui, and G Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," in Proc IEEE ICASSP, 2012, pp. 4277-4280.
- (2012) Proc IEEE ICASSP , pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.-R.² Hui, J.³ Penn, G.⁴

35
- 0032203257
- Gradient-based learning applied to document recognition
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc IEEE, vol. 86, pp. 2278-2324, 1998.
- (1998) Proc IEEE , vol.86 , pp. 2278-2324
- Lecun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

36
- 84893654379
- Improvements to deep convolutional neural networks for LVCSR
- TN Sainath, B Kingsbury, A Mohamed, GE Dahl, G Saon, H Soltau, T Beran, AY Aravkin, and B Ramabhadran, "Improvements to deep convolutional neural networks for LVCSR," in Proc IEEE ASRU, 2013.
- (2013) Proc IEEE ASRU
- Sainath, T.N.¹ Kingsbury, B.² Mohamed, A.³ Dahl, G.E.⁴ Saon, G.⁵ Soltau, H.⁶ Beran, T.⁷ Aravkin, A.Y.⁸ Ramabhadran, B.⁹

37
- 0025254722
- A time-delay neural network architecture for isolated word recognition
- KJ Lang, AH Waibel, and GE Hinton, "A time-delay neural network architecture for isolated word recognition," Neural Networks, vol. 3, pp. 23-43, 1990.
- (1990) Neural Networks , vol.3 , pp. 23-43
- Lang, K.J.¹ Waibel, A.H.² Hinton, G.E.³

38
- 84990059834
- Rectified linear units improve restricted Boltzmann machines
- V Nair and G Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc ICML, 2010, pp. 131-136.
- (2010) Proc ICML , pp. 131-136
- Nair, V.¹ Hinton, G.²

39
- 84897543523
- Maxout networks
- IJ Goodfellow, D Warde-Farley, M Mirza, A Courville, and Y Bengio, "Maxout networks," in Proc ICML, 2013.
- (2013) Proc ICML
- Goodfellow, I.J.¹ Warde-Farley, D.² Mirza, M.³ Courville, A.⁴ Bengio, Y.⁵

40
- 84901999583
- Convolutional neural networks for distant speech recognition
- To appear
- P Swietojanski, A Ghoshal, and S Renals, "Convolutional neural networks for distant speech recognition," IEEE Signal Process. Letters, 2014, To appear.
- (2014) IEEE Signal Process Letters
- Swietojanski, P.¹ Ghoshal, A.² Renals, S.³

41
- 84903707061
- Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech
- JG Fiscus, J Ajot, N Radde, and C Laprun, "Multiple dimension Levenshtein edit distance calculations for evaluating ASR systems during simultaneous speech," in Proc LREC, 2006.
- (2006) Proc LREC
- Fiscus, J.G.¹ Ajot, J.² Radde, N.³ Laprun, C.⁴

42
- 84874276847
- The Kaldi speech recognition toolkit
- D Povey, A Ghoshal, G Boulianne, L Burget, O Glembek, N Goel, M Hannemann, P Motlicek, Y Qian, P Schwarz, J Silovský, G Stemmer, and K Veselý, "The Kaldi speech recognition toolkit," in Proc IEEE ASRU, 2011.
- (2011) Proc IEEE ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰ Silovsky, J.¹¹ Stemmer, G.¹² Vesely, K.¹³

43
- 84893401626
- ArXiv: 1308.4214
- IJ Goodfellow, D Warde-Farley, P Lamblin, V Dumoulin, M Mirza, R Pascanu, J Bergstra, F Bastien, and Y Bengio, "Pylearn2: a machine learning research library," arXiv:1308.4214, 2013.
- (2013) Pylearn2: A Machine Learning Research Library
- Goodfellow, I.J.¹ Warde-Farley, D.² Lamblin, P.³ Dumoulin, V.⁴ Mirza, M.⁵ Pascanu, R.⁶ Bergstra, J.⁷ Bastien, F.⁸ Bengio, Y.⁹

44
- 79951563340
- Understanding the difficulty of training deep feedforward neural networks
- X Glorot and Y Bengio, "Understanding the difficulty of training deep feedforward neural networks," in Proc AISTATS, 2010.
- (2010) Proc AISTATS
- Glorot, X.¹ Bengio, Y.²

45
- 33745805403
- A fast learning algorithm for deep belief nets
- G Hinton, S Osindero, and Y Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, pp. 1527-1554, 2006.
- (2006) Neural Computation , vol.18 , pp. 1527-1554
- Hinton, G.¹ Osindero, S.² Teh, Y.³

46
- 84863380535
- Unsupervised feature learning for audio classification using convolutional deep belief networks
- H Lee, P Pham, Y Largman, and A Ng, "Unsupervised feature learning for audio classification using convolutional deep belief networks," in Proc NIPS 22, 2009, pp. 1096-1104.
- (2009) Proc NIPS , vol.22 , pp. 1096-1104
- Lee, H.¹ Pham, P.² Largman, Y.³ Ng, A.⁴

47
- 84864073449
- Greedy layerwise training of deep networks
- Y Bengio, P Lamblin, D Popovici, and H Larochelle, "Greedy layerwise training of deep networks," in Proc NIPS 19, 2007, pp. 153-160.
- (2007) Proc NIPS , vol.19 , pp. 153-160
- Bengio, Y.¹ Lamblin, P.² Popovici, D.³ Larochelle, H.⁴

48
- 51449120120
- Boosted MMI for model and feature-space discriminative training
- D Povey, D Kanevsky, B Kingsbury, B Ramabhadran, G Saon, and K Visweswariah, "Boosted MMI for model and feature-space discriminative training," in Proc IEEE ICASSP, 2008, pp. 4057-4060.
- (2008) Proc IEEE ICASSP , pp. 4057-4060
- Povey, D.¹ Kanevsky, D.² Kingsbury, B.³ Ramabhadran, B.⁴ Saon, G.⁵ Visweswariah, K.⁶

49
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- MJF Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans Speech and Audio Process., vol. 7, pp. 272-281, 1999.
- (1999) IEEE Trans Speech and Audio Process , vol.7 , pp. 272-281
- Gales, M.J.F.¹

50
- 50449086237
- Acoustic beamforming for speaker diarization of meetings
- X Anguera, C Wooters, and J Hernando, "Acoustic beamforming for speaker diarization of meetings," IEEE Trans. Audio, Speech, & Lang. Process., vol. 15, pp. 2011-2021, 2007.
- (2007) IEEE Trans. Audio, Speech, & Lang. Process. , vol.15 , pp. 2011-2021
- Anguera, X.¹ Wooters, C.² Hernando, J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.