SCOPUS Information Search Platform

2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings

2014, Pages 548-553

Deep convolutional nets and robust features for reverberation-robust speech recognition

(3) Mitra, Vikramjit (a); Wang, Wen (a); Franco, Horacio (a)

(a) SRI International (United States)

Author keywords

Deep convolutional networks; Feature combination; Reverberation robustness; Robust features; Robust speech recognition

Indexed keywords

BATCH DATA PROCESSING; CONVOLUTION; REVERBERATION; SPEECH;

CONVOLUTIONAL NETWORKS; FEATURE COMBINATION; REVERBERATION ROBUSTNESS; ROBUST FEATURES; ROBUST SPEECH RECOGNITION;

SPEECH RECOGNITION;

EID: 84946693063 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/SLT.2014.7078633 Document Type: Conference Paper

Times cited : (9)

References (28)

1
- 84893622444
- The REVERB challenge: A common evaluation framework for dereverberation and recognition of reverberant speech
- K. Kinoshita, M. Delcroix, T. Yoshioka, T. Nakatani, E. Habets, R. Haeb-Umbach, V. Leutnant, A. Sehr, W. Kellermann, R. Maas, S. Gannot and B. Raj, "The REVERB Challenge: A Common Evaluation Framework for Dereverberation and Recognition of Reverberant Speech, " Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2013.
- (2013) Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)
- Kinoshita, K.¹ Delcroix, M.² Yoshioka, T.³ Nakatani, T.⁴ Habets, E.⁵ Haeb-Umbach, R.⁶ Leutnant, V.⁷ Sehr, A.⁸ Kellermann, W.⁹ Maas, R.¹⁰ Gannot, S.¹¹ Raj, B.¹²

2
- 0003980102
- New York: Springer Verlag
- M. S. Brandstein and D. B. Ward, Microphone Arrays: Signal Processing Techniques and Applications. New York: Springer Verlag, 2001.
- (2001) Microphone Arrays: Signal Processing Techniques and Applications
- Brandstein, M.S.¹ Ward, D.B.²

3
- 0028478507
- Combined acoustic echo cancellation, dereverberation and noise reduction: A two microphone approach
- R. Martin and P. Vary, "Combined Acoustic Echo Cancellation, Dereverberation and Noise Reduction: A Two Microphone Approach, " Journal of Annales des Télécommunications, Vol. 49, Iss. 7-8, pp. 429-438, 1994.
- (1994) Journal of Annales des Télécommunications , vol.49 , Issue.7-8 , pp. 429-438
- Martin, R.¹ Vary, P.²

4
- 84964511330
- Single channel blind dereverberation based on auto-correlation functions of frame-wise time sequences of frequency components
- K. Ohta and M. Yanagida, "Single Channel Blind Dereverberation Based on Auto-Correlation Functions of Frame-Wise Time Sequences of Frequency Components, " Proc. of IWAENC, pp. 1-4, 2006.
- (2006) Proc. of IWAENC , pp. 1-4
- Ohta, K.¹ Yanagida, M.²

5
- 33745761716
- A two-stage algorithm for one-microphone reverberant speech enhancement
- M. Wu and D. L. Wang, "A Two-Stage Algorithm for One-Microphone Reverberant Speech Enhancement, " IEEE Trans. Aud. Speech & Lang. Process, Vol. 14, No. 3, pp. 774-784, 2006.
- (2006) IEEE Trans. Aud. Speech & Lang. Process , vol.14 , Issue.3 , pp. 774-784
- Wu, M.¹ Wang, D.L.²

6
- 4544336156
- Robust automatic speech recognition in reverberant environments by model selection
- L. Couvreur and C. Couvreur, "Robust Automatic Speech Recognition in Reverberant Environments by Model Selection, " Proc. of HSC, pp. 147-150, 2001.
- (2001) Proc. of HSC , pp. 147-150
- Couvreur, L.¹ Couvreur, C.²

7
- 34547517494
- A new concept for feature-domain dereverberation for robust distant-talking ASR
- A. Sehr and W. Kellermann, "A New Concept for Feature-Domain Dereverberation for Robust Distant-Talking ASR, " Proc. of ICASSP, pp. 369-372, 2007.
- (2007) Proc. of ICASSP , pp. 369-372
- Sehr, A.¹ Kellermann, W.²

8
- 70350450398
- Static and dynamic variance compensation for recognition of reverberant speech with dereverberation preprocessing
- M. Delcroix and S. Watanabe, "Static and Dynamic Variance Compensation for Recognition of Reverberant Speech with Dereverberation Preprocessing, " IEEE Trans. on Aud. Speech & Lang. Process, Vol. 17, No. 2, pp. 324-334, 2009.
- (2009) IEEE Trans. on Aud. Speech & Lang. Process , vol.17 , Issue.2 , pp. 324-334
- Delcroix, M.¹ Watanabe, S.²

9
- 84928158251
- Use of multiple front-ends and i-vector-based speaker adaptation for robust speech recognition
- Md. J. Alam, V. Gupta, P. Kenny, P. Dumouchel, "Use Of Multiple Front-Ends And I-Vector-Based Speaker Adaptation For Robust Speech Recognition, " in Proc. of REVERB Challenge, 2014.
- (2014) Proc. of REVERB Challenge
- Alam, M.J.¹ Gupta, V.² Kenny, P.³ Dumouchel, P.⁴

10
- 84933559263
- Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB challenge
- M. Delcroix, T. Yoshioka, A. Ogawa, Y. Kubo, M. Fujimoto, I. Nobutaka, K. Kinoshita, M. Espi, T. Hori, T. Nakatani, "Linear prediction-based dereverberation with advanced speech enhancement and recognition technologies for the REVERB challenge, " in Proc. of REVERB Challenge, 2014.
- (2014) Proc. of REVERB Challenge
- Delcroix, M.¹ Yoshioka, T.² Ogawa, A.³ Kubo, Y.⁴ Fujimoto, M.⁵ Nobutaka, I.⁶ Kinoshita, K.⁷ Espi, M.⁸ Hori, T.⁹ Nakatani, T.¹⁰

11
- 84928158249
- Robust features and system fusion for reverberation-robust speech recognition
- V. Mitra, W. Wang, Y. Lei, A. Kathol, G. Sivaraman, C. Espy-Wilson, "Robust features and system fusion for reverberation-robust speech recognition, " in Proc. of REVERB Challenge, 2014.
- (2014) Proc. of REVERB Challenge
- Mitra, V.¹ Wang, W.² Lei, Y.³ Kathol, A.⁴ Sivaraman, G.⁵ Espy-Wilson, C.⁶

12
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. E. Dahl and G. Hinton, "Acoustic modeling using deep belief networks, " IEEE Trans. on ASLP, Vol. 20, no. 1, pp. 14-22, 2012.
- (2012) IEEE Trans. on ASLP , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.³

13
- 84858953286
- Tech. Rep. CMU-LTI-97-150. Carnegie Mellon University
- P. Zhan and A. Waibel, "Vocal tract length normalization for LVCSR, " in Tech. Rep. CMU-LTI-97-150. Carnegie Mellon University, 1997.
- (1997) Vocal Tract Length Normalization for LVCSR
- Zhan, P.¹ Waibel, A.²

14
- 84910075252
- Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions
- V. Mitra, W. Wang, H. Franco, Y. Lei, C. Bartels, M. Graciarena, "Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions, " in Proc. of Interspeech, 2014.
- (2014) Proc. of Interspeech
- Mitra, V.¹ Wang, W.² Franco, H.³ Lei, Y.⁴ Bartels, C.⁵ Graciarena, M.⁶

15
- 84893691530
- Speaker adaptation of neural network acoustic models using i-vectors
- G. Saon, H. Soltau, D. Nahamoo and M. Picheny, "Speaker Adaptation of Neural Network Acoustic Models using I-vectors, " in Proc. ASRU, 2013.
- (2013) Proc. ASRU
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

16
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, P. Ouellet, "Front-end factor analysis for speaker verification, " IEEE Trans. on Speech and Audio Processing, 2011, 19, 788-798.
- (2011) IEEE Trans. on Speech and Audio Processing , vol.19 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

17
- 0028996854
- WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition
- T. Robinson, J. Fransen, D. Pye, J. Foote and S. Renals, "WSJCAM0: A British English Speech Corpus for Large Vocabulary Continuous Speech Recognition, " Proc. ICASSP, pp. 81-84, 1995.
- (1995) Proc. ICASSP , pp. 81-84
- Robinson, T.¹ Fransen, J.² Pye, D.³ Foote, J.⁴ Renals, S.⁵

18
- 33846217002
- The multi-channel Wall Street Journal audio visual corpus (MCWSJ-AV): Specification and initial experiments
- M. Lincoln, I. McCowan, J. Vepa and H. K. Maganti, "The Multi-Channel Wall Street Journal Audio Visual Corpus (MCWSJ-AV): Specification and Initial Experiments, " Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, 2005.
- (2005) Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding
- Lincoln, M.¹ McCowan, I.² Vepa, J.³ Maganti, H.K.⁴

19
- 84906260861
- Damped oscillator cepstral coefficients for robust speech recognition
- V. Mitra, H. Franco and M. Graciarena, "Damped Oscillator Cepstral Coefficients for Robust Speech Recognition, " Proc. of Interspeech, pp. 886-890, 2013.
- (2013) Proc. of Interspeech , pp. 886-890
- Mitra, V.¹ Franco, H.² Graciarena, M.³

20
- 84867589420
- Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
- V. Mitra, H. Franco, M. Graciarena, and A. Mandal, "Normalized Amplitude Modulation Features for Large Vocabulary Noise-Robust Speech Recognition, " Proc. of ICASSP, pp. 4117-4120, 2012.
- (2012) Proc. of ICASSP , pp. 4117-4120
- Mitra, V.¹ Franco, H.² Graciarena, M.³ Mandal, A.⁴

21
- 0027676955
- Energy separation in signal modulations with application to speech analysis
- P. Maragos, J. Kaiser and T. Quatieri, "Energy Separation in Signal Modulations with Application to Speech Analysis, " IEEE Trans. Signal Processing, Vol. 41, pp. 3024-3051, 1993.
- (1993) IEEE Trans. Signal Processing , vol.41 , pp. 3024-3051
- Maragos, P.¹ Kaiser, J.² Quatieri, T.³

22
- 84905269267
- Medium duration modulation cepstral feature for robust speech recognition
- Florence
- V. Mitra, H. Franco, M. Graciarena, D. Vergyri, "Medium duration modulation cepstral feature for robust speech recognition, " Proc. of ICASSP, Florence, 2014.
- (2014) Proc. of ICASSP
- Mitra, V.¹ Franco, H.² Graciarena, M.³ Vergyri, D.⁴

23
- 84906246749
- Modulation features for noise robust speaker identification
- V. Mitra, M. McLaren, H. Franco, M. Graciarena and N. Scheffer, "Modulation Features for Noise Robust Speaker Identification, " Proc. of Interspeech, pp. 3703-3707, 2013.
- (2013) Proc. of Interspeech , pp. 3703-3707
- Mitra, V.¹ McLaren, M.² Franco, H.³ Graciarena, M.⁴ Scheffer, N.⁵

24
- 0019075685
- Some observations on oral air flow during phonation
- H. Teager, "Some Observations on Oral Air Flow During Phonation, " in IEEE Trans. ASSP, pp. 599-601, 1980.
- (1980) IEEE Trans. ASSP , pp. 599-601
- Teager, H.¹

25
- 84928164944
- The design for the Wall Street Journal-based CSR corpus
- D. B. Paul and J. M. Baker, "The Design for the Wall Street Journal-based CSR Corpus, " Proc. of HLT, pp 3
- Proc. of HLT , pp. 3
- Paul, D.B.¹ Baker, J.M.²

26
- 0030638031
- A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
- J. G. Fiscus, "A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction. (ROVER), " Proc. of ASRU, pp. 347-354, 1997.
- (1997) Proc. of ASRU , pp. 347-354
- Fiscus, J.G.¹

27
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition, " Proc. of ICASSP, pp. 4277-4280, 2012.
- (2012) Proc. of ICASSP , pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.² Jiang, H.³ Penn, G.⁴

28
- 84890525984
- Deep convolutional neural network for LVCSR
- T. Sainath, A. Mohamed, B. Kingsbury and B. Ramabhadran, "Deep convolutional neural network for LVCSR", Proc. of ICASSP, 2013.
- (2013) Proc. of ICASSP
- Sainath, T.¹ Mohamed, A.² Kingsbury, B.³ Ramabhadran, B.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.