SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume: n/a, Issue: n/a, 2014, Pages 895-899

Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions

Authors (6): Mitra, Vikramjit (a); Wang, Wen (a); Franco, Horacio (a); Lei, Yun (a); Bartels, Chris (a); Graciarena, Martin (a)

Author keywords

Continuous speech recognition; Convolutional neural networks; Damped oscillators; Deep neural networks; Modulation features; Noise robust speech recognition

Indexed keywords

ACOUSTIC NOISE; CONTINUOUS SPEECH RECOGNITION; CONVOLUTION; NEURAL NETWORKS; SPEECH PROCESSING; CONVOLUTIONAL NEURAL NETWORK; DAMPED OSCILLATORS; DEEP NEURAL NETWORKS; MODULATION FEATURES; NOISE ROBUST SPEECH RECOGNITION; SPEECH COMMUNICATION

EID: 84910075252 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 43

References (34)

1
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G.E. Dahl and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. on ASLP, Vol. 20, no. 1, pp. 14-22, 2012.
- (2012) IEEE Trans. on ASLP , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.¹ Dahl, G.E.² Hinton, G.³

2
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," Proc. of Interspeech, 2011.
- (2011) Proc. of Interspeech
- Seide, F.¹ Li, G.² Yu, D.³

3
- 84878379108
- Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization
- B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum Bayes risk training of deep neural network acoustic models using distributed Hessian-free optimization," Proc. of Interspeech, 2012.
- (2012) Proc. of Interspeech
- Kingsbury, B.¹ Sainath, T.N.² Soltau, H.³

4
- 0033097443
- Single channel speech enhancement based on masking properties of the human auditory system
- N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system", IEEE Trans. Speech Audio Process., 7(2), pp. 126-137, 1999.
- (1999) IEEE Trans. Speech Audio Process , vol.7 , Issue.2 , pp. 126-137
- Virag, N.¹

5
- 56249136428
- Transforming binary uncertainties for robust speech recognition
- S. Srinivasan and D. L. Wang, "Transforming binary uncertainties for robust speech recognition", IEEE Trans Audio, Speech, Lang. Process., 15(7), pp. 2130-2140, 2007.
- (2007) IEEE Trans Audio, Speech, Lang. Process , vol.15 , Issue.7 , pp. 2130-2140
- Srinivasan, S.¹ Wang, D.L.²

6
- 0442317754
- ETSI ES 202 050 Ver. 1.1.5
- Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Adv. Front-end Feature Extraction Algorithm; Compression Algorithms, ETSI ES 202 050 Ver. 1.1.5, 2007.
- (2007) Speech Processing, Transmission and Quality Aspects (STQ); Distributed Speech Recognition; Adv. Front-end Feature Extraction Algorithm; Compression Algorithms

7
- 78049398950
- Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring
- C. Kim and R. M. Stern, "Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring", in Proc. ICASSP, pp. 4574-4577, 2010.
- (2010) Proc. ICASSP , pp. 4574-4577
- Kim, C.¹ Stern, R.M.²

8
- 84867613224
- Fepstrum features: Design and application to conversational speech recognition
- 11009
- V. Tyagi, "Fepstrum features: Design and application to conversational speech recognition", IBM Research Report, 11009, 2011.
- (2011) IBM Research Report
- Tyagi, V.¹

9
- 84867589420
- Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
- Japan
- V. Mitra, H. Franco, M. Graciarena and A. Mandal, "Normalized amplitude modulation features for large vocabulary noise-robust speech recognition", in Proc. of ICASSP, pp. 4117-4120, Japan, 2012.
- (2012) Proc. of ICASSP , pp. 4117-4120
- Mitra, V.¹ Franco, H.² Graciarena, M.³ Mandal, A.⁴

10
- 0030638031
- A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
- J. G. Fiscus, "A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER)," Proc. of ASRU, pp. 347-354, 1997.
- (1997) Proc. of ASRU , pp. 347-354
- Fiscus, J.G.¹

11
- 17344389852
- Robust speech recognition in noisy environments: The 2001 IBM SPINE Evaluation system
- FL
- B. Kingsbury, G. Saon, L. Mangu, M. Padmanabhan and R. Sarikaya, "Robust speech recognition in noisy environments: The 2001 IBM SPINE Evaluation system", In Proc. of ICASSP, Vol.1, pp. I53-I56, FL, 2002.
- (2002) Proc. of ICASSP , vol.1 , pp. I53-I56
- Kingsbury, B.¹ Saon, G.² Mangu, L.³ Padmanabhan, M.⁴ Sarikaya, R.⁵

12
- 0036291381
- Digit recognition in noisy environments via a sequential GMM/SVM system
- FL
- S. Fine, G. Saon, and R.A. Gopinath, "Digit recognition in noisy environments via a sequential GMM/SVM system", In Proc. of ICASSP, Vol.1, pp. I49-I52, FL, 2002.
- (2002) Proc. of ICASSP , vol.1 , pp. I49-I52
- Fine, S.¹ Saon, G.² Gopinath, R.A.³

13
- 0035342414
- Robust automatic speech recognition with missing and unreliable acoustic data
- M. Cooke, P. Green, L. Josifovski and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data", Speech Comm., 34(3), pp. 267-285, 2001.
- (2001) Speech Comm , vol.34 , Issue.3 , pp. 267-285
- Cooke, M.¹ Green, P.² Josifovski, L.³ Vizinho, A.⁴

14
- 85083953021
- Feature learning in deep neural networks - Studies on speech recognition tasks
- D. Yu, M. Seltzer, J. Li, J-T. Huang and F. Seide, "Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks", ICLR 2013.
- (2013) ICLR
- Yu, D.¹ Seltzer, M.² Li, J.³ Huang, J.-T.⁴ Seide, F.⁵

15
- 84858953286
- Vocal tract length normalization for LVCSR
- Carnegie Mellon University
- P. Zhan and A. Waibel, "Vocal tract length normalization for LVCSR," Tech. Rep. CMU-LTI-97-150, Carnegie Mellon University, 1997.
- (1997) Tech. Rep. CMU-LTI-97-150
- Zhan, P.¹ Waibel, A.²

16
- 78649390043
- Retrieving tract variables from acoustics: A comparison of different machine learning strategies
- V. Mitra, H. Nam, C. Espy-Wilson, E. Saltzman and L. Goldstein, "Retrieving Tract Variables from Acoustics: A comparison of different Machine Learning strategies," IEEE Journal of Selected Topics on Signal Processing, Sp. Iss. on Statistical Learning Methods for Speech and Language Processing, Vol. 4, Iss. 6, pp. 1027-1045, 2010.
- (2010) IEEE Journal of Selected Topics on Signal Processing, Sp. Iss. on Statistical Learning Methods for Speech and Language Processing , vol.4 , Issue.6 , pp. 1027-1045
- Mitra, V.¹ Nam, H.² Espy-Wilson, C.³ Saltzman, E.⁴ Goldstein, L.⁵

17
- 84890492030
- An investigation of deep neural networks for noise robust speech recognition
- M. Seltzer, D. Yu, and Y. Wang, "An Investigation Of Deep Neural Networks For Noise Robust Speech Recognition", Proc of ICASSP, 2013.
- (2013) Proc of ICASSP
- Seltzer, M.¹ Yu, D.² Wang, Y.³

18
- 84867605836
- Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
- O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," Proc. of ICASSP, pp. 4277-4280, 2012.
- (2012) Proc. of ICASSP , pp. 4277-4280
- Abdel-Hamid, O.¹ Mohamed, A.² Jiang, H.³ Penn, G.⁴

19
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath and B. Kingsbury, "Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups," IEEE Signal Proc. Mag., 29(6), pp. 82-97, 2012.
- (2012) IEEE Signal Proc. Mag. , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

20
- 84906214784
- Exploring convolutional neural network structures and optimization techniques for speech recognition
- O. Abdel-Hamid, L. Deng and D. Yu, "Exploring Convolutional Neural Network Structures and Optimization Techniques for Speech Recognition," Proc. of Interspeech, pp. 3366-3370, 2013.
- (2013) Proc. of Interspeech , pp. 3366-3370
- Abdel-Hamid, O.¹ Deng, L.² Yu, D.³

21
- 33646677283
- ETSI STQ-Aurora DSR Working Group, June 4
- G. Hirsch, "Experimental framework for the performance evaluation of speech recognition front-ends on a large vocabulary task", ETSI STQ-Aurora DSR Working Group, June 4, 2001.
- (2001) Experimental Framework for the Performance Evaluation of Speech Recognition Front-ends on a Large Vocabulary Task
- Hirsch, G.¹

22
- 84906260861
- Damped oscillator cepstral coefficients for robust speech recognition
- V. Mitra, H. Franco and M. Graciarena, "Damped Oscillator Cepstral Coefficients for Robust Speech Recognition," Proc. of Interspeech, pp. 886-890, 2013.
- (2013) Proc. of Interspeech , pp. 886-890
- Mitra, V.¹ Franco, H.² Graciarena, M.³

23
- 84867589420
- Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
- V. Mitra, H. Franco, M. Graciarena, and A. Mandal, "Normalized Amplitude Modulation Features for Large Vocabulary Noise-Robust Speech Recognition," Proc. of ICASSP, pp. 4117-4120, 2012.
- (2012) Proc. of ICASSP , pp. 4117-4120
- Mitra, V.¹ Franco, H.² Graciarena, M.³ Mandal, A.⁴

24
- 0028287770
- Effect of reducing slow temporal modulations on speech reception
- R. Drullman, J. M. Festen and R. Plomp, "Effect of Reducing Slow Temporal Modulations on Speech Reception," J. Acoust. Soc. of Am., Vol. 95, No. 5, pp. 2670-2680, 1994.
- (1994) J. Acoust. Soc. of Am , vol.95 , Issue.5 , pp. 2670-2680
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

25
- 0034844903
- On the upper cutoff frequency of auditory critical-band envelope detectors in the context of speech perception
- O. Ghitza, "On the Upper Cutoff Frequency of Auditory Critical-Band Envelope Detectors in the Context of Speech Perception," J. Acoust. Soc. of America, vol. 110, no. 3, pp. 1628-1640, 2001.
- (2001) J. Acoust. Soc. of America , vol.110 , Issue.3 , pp. 1628-1640
- Ghitza, O.¹

26
- 0027676955
- Energy separation in signal modulations with application to speech analysis
- P. Maragos, J. Kaiser and T. Quatieri, "Energy Separation in Signal Modulations with Application to Speech Analysis," IEEE Trans. Signal Processing, Vol. 41, pp. 3024-3051, 1993.
- (1993) IEEE Trans. Signal Processing , vol.41 , pp. 3024-3051
- Maragos, P.¹ Kaiser, J.² Quatieri, T.³

27
- 84906246749
- Modulation features for noise robust speaker identification
- V. Mitra, M. McLaren, H. Franco, M. Graciarena and N. Scheffer, "Modulation Features for Noise Robust Speaker Identification," Proc. of Interspeech, pp. 3703-3707, 2013.
- (2013) Proc. of Interspeech , pp. 3703-3707
- Mitra, V.¹ McLaren, M.² Franco, H.³ Graciarena, M.⁴ Scheffer, N.⁵

28
- 0019075685
- Some observations on oral air flow during phonation
- H. Teager, "Some Observations on Oral Air Flow During Phonation," in IEEE Trans. ASSP, pp. 599-601, 1980.
- (1980) IEEE Trans. ASSP , pp. 599-601
- Teager, H.¹

29
- 84905269267
- Medium duration modulation cepstral feature for robust speech recognition
- Florence
- V. Mitra, H. Franco, M. Graciarena, D. Vergyri, "Medium duration modulation cepstral feature for robust speech recognition," Proc. of ICASSP, Florence, 2014.
- (2014) Proc. of ICASSP
- Mitra, V.¹ Franco, H.² Graciarena, M.³ Vergyri, D.⁴

30
- 84858953642
- The Kaldi speech recognition toolkit
- D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al., "The Kaldi speech recognition toolkit," in Proc. ASRU, 2011.
- (2011) Proc. ASRU
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶ Hannemann, M.⁷ Motlicek, P.⁸ Qian, Y.⁹ Schwarz, P.¹⁰

31
- 84890525984
- Deep convolutional neural network for LVCSR
- T. Sainath, A. Mohamed, B. Kingsbury and B. Ramabhadran, "Deep convolutional neural network for LVCSR", Proc. of ICASSP, 2013.
- (2013) Proc. of ICASSP
- Sainath, T.¹ Mohamed, A.² Kingsbury, B.³ Ramabhadran, B.⁴

32
- 84890526837
- New types of deep neural network learning for speech recognition and related applications: An overview
- L. Deng, G. Hinton, and B. Kingsbury, "New types of deep neural network learning for speech recognition and related applications: An overview," Proc. of ICASSP, 2013.
- (2013) Proc. of ICASSP
- Deng, L.¹ Hinton, G.² Kingsbury, B.³

33
- 0021892216
- Speech enhancement using a minimum mean square error log-spectral amplitude estimator
- Apr
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. on Acoust., Speech, Signal Processing, vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.
- (1985) IEEE Trans. on Acoust., Speech, Signal Processing , vol.ASSP-33 , Issue.2 , pp. 443-445
- Ephraim, Y.¹ Malah, D.²

34
- 51449089990
- A Minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition
- Las Vegas, NV
- D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero, "A Minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition," in Proc. of ICASSP, Las Vegas, NV, 2008.
- (2008) Proc. of ICASSP
- Yu, D.¹ Deng, L.² Droppo, J.³ Wu, J.⁴ Gong, Y.⁵ Acero, A.⁶

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.