SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 2449-2453

Combating reverberation in large vocabulary continuous speech recognition

(7) Mitra, Vikramjit (a); Van Hout, Julien (a); McLaren, Mitchell (a); Wang, Wen (a); Graciarena, Martin (a); Vergyri, Dimitra (a); Franco, Horacio (a)

(a) SRI International (United States)

Author keywords

Deep neural networks; Reverberation robustness; Robust features; Robust speech recognition

Indexed keywords

CONTINUOUS SPEECH RECOGNITION; MODELING LANGUAGES; REVERBERATION; SPEECH; SPEECH COMMUNICATION; VOCABULARY CONTROL;

AUTOMATIC SPEECH RECOGNITION SYSTEM; DEEP NEURAL NETWORKS; GAUSSIAN MIXTURE MODEL (GMMS); LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION; REVERBERATION ROBUSTNESS; ROBUST ACOUSTIC FEATURES; ROBUST FEATURES; ROBUST SPEECH RECOGNITION;

SPEECH RECOGNITION;

EID: 84959111702 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (3)

References (33)

1
- 84893622444
- The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech
- K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, E. Habets, R. Haeb-Umbach, V. Leutnant, A. Sehr, W. Kellermann, R. Maas, S. Gannot and B. Raj, "The REVERB Challenge: A Common Evaluation Framework for Dereverberation and Recognition of Reverberant Speech, " Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013.
- (2013) Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- Kinoshita, K.¹ Delcroix, M.² Yoshioka, T.³ Nakatani, T.⁴ Habets, E.⁵ Haeb-Umbach, R.⁶ Leutnant, V.⁷ Sehr, A.⁸ Kellermann, W.⁹ Maas, R.¹⁰ Gannot, S.¹¹ Raj, B.¹²

2
- 0003980102
- New York: Springer Verlag
- M. S. Brandstein and D. B. Ward, Microphone Arrays: Signal Processing Techniques and Applications. New York: Springer Verlag, 2001.
- (2001) Microphone Arrays: Signal Processing Techniques and Applications
- Brandstein, M.S.¹ Ward, D.B.²

3
- 0028478507
- Combined acoustic echo cancellation, dereverberation and noise reduction: A two microphone approach
- R. Martin and P. Vary, "Combined Acoustic Echo Cancellation, Dereverberation and Noise Reduction: A Two Microphone Approach, " Journal of Annales des Telecommunications, Vol. 49, Iss. 7-8, pp. 429-438, 1994.
- (1994) Journal of Annales des Telecommunications , vol.49 , Issue.7-8 , pp. 429-438
- Martin, R.¹ Vary, P.²

4
- 84964511330
- Single channel blind dereverberation based on auto-correlation functions of frame-wise time sequences of frequency components
- K. Ohta and M. Yanagida, "Single Channel Blind Dereverberation Based on Auto-Correlation Functions of Frame-Wise Time Sequences of Frequency Components, " Proc. of IWAENC, pp. 1-4, 2006.
- (2006) Proc. of IWAENC , pp. 1-4
- Ohta, K.¹ Yanagida, M.²

5
- 33745761716
- A two-stage algorithm for one-microphone reverberant speech enhancement
- M. Wu and D. L. Wang, "A Two-Stage Algorithm for One-Microphone Reverberant Speech Enhancement, " IEEE Trans. Aud. Speech & Lang. Process., Vol. 14, No. 3, pp. 774-784, 2006.
- (2006) IEEE Trans. Aud. Speech & Lang. Process. , vol.14 , Issue.3 , pp. 774-784
- Wu, M.¹ Wang, D.L.²

6
- 4544336156
- Robust automatic speech recognition in reverberant environments by model selection
- L. Couvreur and C. Couvreur, "Robust Automatic Speech Recognition in Reverberant Environments by Model Selection, " Proc. of HSC, pp. 147-150, 2001.
- (2001) Proc. of HSC , pp. 147-150
- Couvreur, L.¹ Couvreur, C.²

7
- 34547517494
- A new concept for feature-domain dereverberation for robust distant-talking ASR
- A. Sehr and W. Kellermann, "A New Concept for Feature-Domain Dereverberation for Robust Distant-Talking ASR, " Proc. of ICASSP, pp. 369-372, 2007.
- (2007) Proc. of ICASSP , pp. 369-372
- Sehr, A.¹ Kellermann, W.²

8
- 70350450398
- Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing
- M. Delcroix and S. Watanabe, "Static and Dynamic Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing, " IEEE Trans. on Aud. Speech & Lang. Process., Vol. 17, No. 2, pp. 324-334, 2009.
- (2009) IEEE Trans. on Aud. Speech & Lang. Process. , vol.17 , Issue.2 , pp. 324-334
- Delcroix, M.¹ Watanabe, S.²

9
- 84928158251
- Use of multiple front-ends and i-vector-based speaker adaptation for robust speech recognition
- Md. J. Alam, V. Gupta, P. Kenny, P. Dumouchel, "Use Of Multiple Front-Ends And I-Vector-Based Speaker Adaptation For Robust Speech Recognition, " Proc. of REVERB Challenge, 2014.
- (2014) Proc. of REVERB Challenge
- Alam, M.J.¹ Gupta, V.² Kenny, P.³ Dumouchel, P.⁴

10
- 84933559263
- Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB challenge
- M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, I. Nobutaka, K. Kinoshita, M. Espi, T. Hori and T. Nakatani, "Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB challenge, " Proc. of REVERB Challenge, 2014.
- (2014) Proc. of REVERB Challenge
- Delcroix, M.¹ Yoshioka, T.² Ogawa, A.³ Kubo, Y.⁴ Fujimoto, M.⁵ Nobutaka, I.⁶ Kinoshita, K.⁷ Espi, M.⁸ Hori, T.⁹ Nakatani, T.¹⁰

11
- 84928158249
- Robust features and system fusion for reverberation-robust speech recognition
- V. Mitra, W. Wang, Y. Lei, A. Kathol, G. Sivaraman, C. Espy-Wilson, "Robust features and system fusion for reverberation-robust speech recognition, " Proc. of REVERB Challenge, 2014.
- (2014) Proc. of REVERB Challenge
- Mitra, V.¹ Wang, W.² Lei, Y.³ Kathol, A.⁴ Sivaraman, G.⁵ Espy-Wilson, C.⁶

12
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. E. Dahl and G. Hinton, "Acoustic modeling using deep belief networks, " IEEE Trans. on ASLP, Vol. 20, no. 1, pp. 14-22, 2012.
- (2012) IEEE Trans. on ASLP , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.³

13
- 84858953286
- Tech. Rep. CMU-LTI-97-150. Carnegie Mellon University
- P. Zhan and A. Waibel, "Vocal tract length normalization for LVCSR, " in Tech. Rep. CMU-LTI-97-150. Carnegie Mellon University, 1997.
- (1997) Vocal Tract Length Normalization for LVCSR
- Zhan, P.¹ Waibel, A.²

14
- 84910075252
- Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions
- V. Mitra, W. Wang, H. Franco, Y. Lei, C. Bartels, M. Graciarena, "Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions, " in Proc. of Interspeech, pp. 895-899, 2014.
- (2014) Proc. of Interspeech , pp. 895-899
- Mitra, V.¹ Wang, W.² Franco, H.³ Lei, Y.⁴ Bartels, C.⁵ Graciarena, M.⁶

15
- 84893691530
- Speaker adaptation of neural network acoustic models using ivectors
- G. Saon, H. Soltau, D. Nahamoo and M. Picheny, "Speaker Adaptation of Neural Network Acoustic Models using Ivectors, " Proc. ASRU, 2013.
- (2013) Proc. ASRU
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

16
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, P. Ouellet, "Front-end factor analysis for speaker verification, " IEEE Trans. on Speech and Audio Processing, 2011, 19, 788-798.
- (2011) IEEE Trans. on Speech and Audio Processing , vol.19 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

17
- 0028996854
- WSJCAM0: A british english speech corpus for large vocabulary continuous speech recognition
- T. Robinson, J. Fransen, D. Pye, J. Foote and S. Renals, "WSJCAM0: A British English Speech Corpus for Large Vocabulary Continuous Speech Recognition, " Proc. ICASSP, pp. 81-84, 1995.
- (1995) Proc. ICASSP , pp. 81-84
- Robinson, T.¹ Fransen, J.² Pye, D.³ Foote, J.⁴ Renals, S.⁵

18
- 33846217002
- The multi-channel wall street journal audio visual corpus (MCWSJ-AV): Specification and initial experiments
- M. Lincoln, I. McCowan, J. Vepa and H. K. Maganti, "The Multi-Channel Wall Street Journal Audio Visual Corpus (MCWSJ-AV): Specification and Initial Experiments, " Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, 2005.
- (2005) Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding
- Lincoln, M.¹ McCowan, I.² Vepa, J.³ Maganti, H.K.⁴

19
- 84906260861
- Damped oscillator cepstral coefficients for robust speech recognition
- V. Mitra, H. Franco and M. Graciarena, "Damped Oscillator Cepstral Coefficients for Robust Speech Recognition, " Proc. of Interspeech, pp. 886-890, 2013.
- (2013) Proc. of Interspeech , pp. 886-890
- Mitra, V.¹ Franco, H.² Graciarena, M.³

20
- 84867589420
- Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
- V. Mitra, H. Franco, M. Graciarena, and A. Mandal, "Normalized Amplitude Modulation Features for Large Vocabulary Noise-Robust Speech Recognition, " Proc. of ICASSP, pp. 4117-4120, 2012.
- (2012) Proc. of ICASSP , pp. 4117-4120
- Mitra, V.¹ Franco, H.² Graciarena, M.³ Mandal, A.⁴

21
- 0027676955
- Energy separation in signal modulations with application to speech analysis
- P. Maragos, J. Kaiser and T. Quatieri, "Energy Separation in Signal Modulations with Application to Speech Analysis, " IEEE Trans. Signal Processing, Vol. 41, pp. 3024-3051, 1993.
- (1993) IEEE Trans. Signal Processing , vol.41 , pp. 3024-3051
- Maragos, P.¹ Kaiser, J.² Quatieri, T.³

22
- 84905269267
- Medium duration modulation cepstral feature for robust speech recognition
- Florence
- V. Mitra, H. Franco, M. Graciarena, D. Vergyri, "Medium duration modulation cepstral feature for robust speech recognition, " Proc. of ICASSP, Florence, 2014.
- (2014) Proc. of ICASSP
- Mitra, V.¹ Franco, H.² Graciarena, M.³ Vergyri, D.⁴

23
- 84906246749
- Modulation features for noise robust speaker identification
- V. Mitra, M. McLaren, H. Franco, M. Graciarena and N. Scheffer, "Modulation Features for Noise Robust Speaker Identification, " Proc. of Interspeech, pp. 3703-3707, 2013.
- (2013) Proc. of Interspeech , pp. 3703-3707
- Mitra, V.¹ McLaren, M.² Franco, H.³ Graciarena, M.⁴ Scheffer, N.⁵

24
- 0019075685
- Some observations on oral air flow during phonation
- H. Teager, "Some Observations on Oral Air Flow During Phonation, " in IEEE Trans. ASSP, pp. 599-601, 1980.
- (1980) IEEE Trans. ASSP , pp. 599-601
- Teager, H.¹

25
- 84906248945
- All for one: Feature combination for highly channel-degraded speech activity detection
- Lyon
- M. Graciarena, A. Alwan, D. Ellis, H. Franco, L. Ferrer, J. H. L. Hansen, A. Janin, B-S. Lee, Y. Lei, V. Mitra, N. Morgan, S. O. Sadjadi, T. J. Tsai, N. Scheffer, L. N. Tan and B. Williams, "All for One: Feature Combination for Highly Channel-Degraded Speech Activity Detection, " Proc. of Interspeech, pp. 709-713, Lyon, 2013.
- (2013) Proc. of Interspeech , pp. 709-713
- Graciarena, M.¹ Alwan, A.² Ellis, D.³ Franco, H.⁴ Ferrer, L.⁵ Hansen, J.H.L.⁶ Janin, A.⁷ Lee, B.-S.⁸ Lei, Y.⁹ Mitra, V.¹⁰ Morgan, N.¹¹ Sadjadi, S.O.¹² Tsai, T.J.¹³ Scheffer, N.¹⁴ Tan, L.N.¹⁵ Williams, B.¹⁶

26
- 0030638031
- A post-processing system to yield reduced word error rates: Recognizer output voting error reduction. (ROVER)
- J. G. Fiscus, "A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction. (ROVER), " Proc. of ASRU, pp. 347-354, 1997.
- (1997) Proc. of ASRU , pp. 347-354
- Fiscus, J.G.¹

27
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, " Proc. of ICASSP, pp. 4277-4280, 2012.
- (2012) Proc. of ICASSP , pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.² Jiang, H.³ Penn, G.⁴

28
- 84890525984
- Deep convolutional neural network for LVCSR
- T. Sainath, A. Mohamed, B. Kingsbury and B. Ramabhadran, "Deep convolutional neural network for LVCSR", Proc. of ICASSP, 2013.
- (2013) Proc. of ICASSP
- Sainath, T.¹ Mohamed, A.² Kingsbury, B.³ Ramabhadran, B.⁴

29
- 80051639873
- Gammatone sub-band magnitude-domain dereverberation for ASR
- K. Kumar, R. Singh, B. Raj, R. Stern, "Gammatone sub-band magnitude-domain dereverberation for ASR, " Proc. of ICASSP, pp. 4604-4607, 2011.
- (2011) Proc. of ICASSP , pp. 4604-4607
- Kumar, K.¹ Singh, R.² Raj, B.³ Stern, R.⁴

30
- 84891308106
- SRILM-an extensible language modeling toolkit
- A. Stolcke, "SRILM-An Extensible Language Modeling Toolkit, " Proc. of ICSLP 2002, pp. 901-904, 2002.
- (2002) Proc. of ICSLP 2002 , pp. 901-904
- Stolcke, A.¹

31
- 70349215697
- Data-driven lexicon expansion for Mandarin broadcast news and conversation speech recognition
- X. Lei and W. Wang and A. Stolcke, "Data-driven Lexicon Expansion for Mandarin Broadcast News and Conversation Speech Recognition, " Proc. of ICASSP, 2009.
- (2009) Proc. of ICASSP
- Lei, X.¹ Wang, W.² Stolcke, A.³

32
- 84901784231
- RNNLM-Recurrent neural network language modeling toolkit
- T. Mikolov, S. Kombrink, D. Anoop, L. Burget, and J. Cernocky, "RNNLM-Recurrent neural network language modeling toolkit, " Proc. of ASRU, 2011.
- (2011) Proc. of ASRU
- Mikolov, T.¹ Kombrink, S.² Anoop, D.³ Burget, L.⁴ Cernocky, J.⁵

33
- 0037519295
- The SRI March 2000 hub-5 conversational speech transcription system
- A. Stolcke, H. Bratt, J. Butzberger, H. Franco, V. R. Rao Gadde, M. Plauche, C. Richey, E. Shriberg, K. Sonmez, W. Weng, J. Zheng, "The SRI March 2000 Hub-5 Conversational Speech Transcription System, " Proc. of NIST Speech Transcription Workshop, 2000.
- (2000) Proc. of NIST Speech Transcription Workshop
- Stolcke, A.¹ Bratt, H.² Butzberger, J.³ Franco, H.⁴ Rao Gadde, V.R.⁵ Plauche, M.⁶ Richey, C.⁷ Shriberg, E.⁸ Sonmez, K.⁹ Weng, W.¹⁰ Zheng, J.¹¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.