2014, Pages 895-899

Evaluating robust features on Deep Neural Networks for speech recognition in noisy and channel mismatched conditions

Author keywords

Continuous speech recognition; Convolutional neural networks; Damped oscillators; Deep neural networks; Modulation features; Noise robust speech recognition

Indexed keywords

ACOUSTIC NOISE; CONTINUOUS SPEECH RECOGNITION; CONVOLUTION; NEURAL NETWORKS; SPEECH PROCESSING;

EID: 84910075252     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited : (43)

References (34)
  • 1
    • 84055211743
    • Acoustic modeling using deep belief networks
    • A. Mohamed, G.E. Dahl and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. on ASLP, Vol. 20, no. 1, pp. 14-22, 2012.
    • (2012) IEEE Trans. on ASLP , vol.20 , Issue.1 , pp. 14-22
    • Mohamed, A.1    Dahl, G.E.2    Hinton, G.3
  • 2
    • 84865801985
    • Conversational speech transcription using context-dependent deep neural networks
    • F. Seide, G. Li and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," Proc. of Interspeech, 2011.
    • (2011) Proc. of Interspeech
    • Seide, F.1    Li, G.2    Yu, D.3
  • 3
    • 84878379108
    • Scalable minimum bayes risk training of deep neural network acoustic models using distributed hessian-free optimization
    • B. Kingsbury, T. N. Sainath, and H. Soltau, "Scalable minimum bayes risk training of deep neural network acoustic models using distributed hessian-free optimization," Proc. of Interspeech, 2012.
    • (2012) Proc. of Interspeech
    • Kingsbury, B.1    Sainath, T.N.2    Soltau, H.3
  • 4
    • 0033097443
    • Single channel speech enhancement based on masking properties of the human auditory system
    • N. Virag, "Single channel speech enhancement based on masking properties of the human auditory system", IEEE Trans. Speech Audio Process., 7(2), pp. 126-137, 1999.
    • (1999) IEEE Trans. Speech Audio Process , vol.7 , Issue.2 , pp. 126-137
    • Virag, N.1
  • 5
    • 56249136428
    • Transforming binary uncertainties for robust speech recognition
    • S. Srinivasan and D. L. Wang, "Transforming binary uncertainties for robust speech recognition", IEEE Trans Audio, Speech, Lang. Process., 15(7), pp. 2130-2140, 2007.
    • (2007) IEEE Trans Audio, Speech, Lang. Process , vol.15 , Issue.7 , pp. 2130-2140
    • Srinivasan, S.1    Wang, D.L.2
  • 7
    • 78049398950
    • Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring
    • C. Kim and R. M. Stern, "Feature extraction for robust speech recognition based on maximizing the sharpness of the power distribution and on power flooring", in Proc. ICASSP, pp. 4574-4577, 2010.
    • (2010) Proc. ICASSP , pp. 4574-4577
    • Kim, C.1    Stern, R.M.2
  • 8
    • 84867613224
    • Fepstrum features: Design and application to conversational speech recognition
    • 11009
    • V. Tyagi, "Fepstrum features: Design and application to conversational speech recognition", IBM Research Report, 11009, 2011.
    • (2011) IBM Research Report
    • Tyagi, V.1
  • 9
    • 84867589420
    • Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
    • Japan
    • V. Mitra, H. Franco, M. Graciarena and A. Mandal, "Normalized amplitude modulation features for large vocabulary noise-robust speech recognition", in Proc. of ICASSP, pp. 4117-4120, Japan, 2012.
    • (2012) Proc. of ICASSP , pp. 4117-4120
    • Mitra, V.1    Franco, H.2    Graciarena, M.3    Mandal, A.4
  • 10
    • 0030638031
    • A post-processing system to yield reduced word error rates: Recognizer output voting error reduction (ROVER)
    • J. G. Fiscus, "A Post-Processing System to Yield Reduced Word Error Rates: Recognizer Output Voting Error Reduction (ROVER)," Proc. of ASRU, pp. 347-354, 1997.
    • (1997) Proc. of ASRU , pp. 347-354
    • Fiscus, J.G.1
  • 11
    • 17344389852
    • Robust speech recognition in noisy environments: The 2001 IBM SPINE evaluation system
    • FL
    • B. Kingsbury, G. Saon, L. Mangu, M. Padmanabhan and R. Sarikaya, "Robust speech recognition in noisy environments: The 2001 IBM SPINE evaluation system", In Proc. of ICASSP, Vol. 1, pp. I53-I56, FL, 2002.
    • (2002) Proc. of ICASSP , vol.1 , pp. I53-I56
    • Kingsbury, B.1    Saon, G.2    Mangu, L.3    Padmanabhan, M.4    Sarikaya, R.5
  • 12
    • 0036291381
    • Digit recognition in noisy environments via a sequential GMM/SVM system
    • FL
    • S. Fine, G. Saon, and R.A. Gopinath, "Digit recognition in noisy environments via a sequential GMM/SVM system", In Proc. of ICASSP, Vol.1, pp.I49-I52, FL, 2002.
    • (2002) Proc. of ICASSP , vol.1 , pp. I49-I52
    • Fine, S.1    Saon, G.2    Gopinath, R.A.3
  • 13
    • 0035342414
    • Robust automatic speech recognition with missing and unreliable acoustic data
    • M. Cooke, P. Green, L. Josifovski and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data", Speech Comm., 34(3), pp.267-285, 2001.
    • (2001) Speech Comm , vol.34 , Issue.3 , pp. 267-285
    • Cooke, M.1    Green, P.2    Josifovski, L.3    Vizinho, A.4
  • 14
    • 85083953021
    • Feature learning in deep neural networks - Studies on speech recognition tasks
    • D. Yu, M. Seltzer, J. Li, J-T. Huang and F. Seide, "Feature Learning in Deep Neural Networks - Studies on Speech Recognition Tasks", ICLR 2013.
    • (2013) ICLR
    • Yu, D.1    Seltzer, M.2    Li, J.3    Huang, J.-T.4    Seide, F.5
  • 15
    • 84858953286
    • Vocal tract length normalization for LVCSR
    • Carnegie Mellon University
    • P. Zhan and A. Waibel, "Vocal tract length normalization for LVCSR," in Tech. Rep. CMU-LTI-97-150. Carnegie Mellon University, 1997.
    • (1997) Tech. Rep. CMU-LTI-97-150
    • Zhan, P.1    Waibel, A.2
  • 17
    • 84890492030
    • An investigation of deep neural networks for noise robust speech recognition
    • M. Seltzer, D. Yu, and Y. Wang, "An Investigation Of Deep Neural Networks For Noise Robust Speech Recognition", Proc of ICASSP, 2013.
    • (2013) Proc of ICASSP
    • Seltzer, M.1    Yu, D.2    Wang, Y.3
  • 18
    • 84867605836
    • Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition
    • O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, "Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition," Proc. of ICASSP, pp. 4277-4280, 2012.
    • (2012) Proc. of ICASSP , pp. 4277-4280
    • Abdel-Hamid, O.1    Mohamed, A.2    Jiang, H.3    Penn, G.4
  • 20
    • 84906214784
    • Exploring convolutional neural network structures and optimization techniques for speech recognition
    • O. Abdel-Hamid, L. Deng and D. Yu, "Exploring Convolutional Neural Network Structures and Optimization Techniques for Speech Recognition," Proc. of Interspeech, pp. 3366-3370, 2013.
    • (2013) Proc. of Interspeech , pp. 3366-3370
    • Abdel-Hamid, O.1    Deng, L.2    Yu, D.3
  • 22
    • 84906260861
    • Damped oscillator cepstral coefficients for robust speech recognition
    • V. Mitra, H. Franco and M. Graciarena, "Damped Oscillator Cepstral Coefficients for Robust Speech Recognition," Proc. of Interspeech, pp. 886-890, 2013.
    • (2013) Proc. of Interspeech , pp. 886-890
    • Mitra, V.1    Franco, H.2    Graciarena, M.3
  • 23
    • 84867589420
    • Normalized amplitude modulation features for large vocabulary noise-robust speech recognition
    • V. Mitra, H. Franco, M. Graciarena, and A. Mandal, "Normalized Amplitude Modulation Features for Large Vocabulary Noise-Robust Speech Recognition," Proc. of ICASSP, pp. 4117-4120, 2012.
    • (2012) Proc. of ICASSP , pp. 4117-4120
    • Mitra, V.1    Franco, H.2    Graciarena, M.3    Mandal, A.4
  • 24
    • 0028287770
    • Effect of reducing slow temporal modulations on speech reception
    • R. Drullman, J. M. Festen and R. Plomp, "Effect of Reducing Slow Temporal Modulations on Speech Reception," J. Acoust. Soc. of Am., Vol. 95, No. 5, pp. 2670-2680, 1994.
    • (1994) J. Acoust. Soc. of Am , vol.95 , Issue.5 , pp. 2670-2680
    • Drullman, R.1    Festen, J.M.2    Plomp, R.3
  • 25
    • 0034844903
    • On the upper cutoff frequency of auditory critical-band envelope detectors in the context of speech perception
    • O. Ghitza, "On the Upper Cutoff Frequency of Auditory Critical-Band Envelope Detectors in the Context of Speech Perception," J. Acoust. Soc. of America, vol. 110, no. 3, pp. 1628-1640, 2001.
    • (2001) J. Acoust. Soc. of America , vol.110 , Issue.3 , pp. 1628-1640
    • Ghitza, O.1
  • 26
    • 0027676955
    • Energy separation in signal modulations with application to speech analysis
    • P. Maragos, J. Kaiser and T. Quatieri, "Energy Separation in Signal Modulations with Application to Speech Analysis," IEEE Trans. Signal Processing, Vol. 41, pp. 3024-3051, 1993.
    • (1993) IEEE Trans. Signal Processing , vol.41 , pp. 3024-3051
    • Maragos, P.1    Kaiser, J.2    Quatieri, T.3
  • 28
    • 0019075685
    • Some observations on oral air flow during phonation
    • H. Teager, "Some Observations on Oral Air Flow During Phonation," in IEEE Trans. ASSP, pp. 599-601, 1980.
    • (1980) IEEE Trans. ASSP , pp. 599-601
    • Teager, H.1
  • 29
    • 84905269267
    • Medium duration modulation cepstral feature for robust speech recognition
    • Florence
    • V. Mitra, H. Franco, M. Graciarena, D. Vergyri, "Medium duration modulation cepstral feature for robust speech recognition," Proc. of ICASSP, Florence, 2014.
    • (2014) Proc. of ICASSP
    • Mitra, V.1    Franco, H.2    Graciarena, M.3    Vergyri, D.4
  • 32
    • 84890526837
    • New types of deep neural network learning for speech recognition and related applications: An overview
    • L. Deng, G. Hinton, and B. Kingsbury, "New types of deep neural network learning for speech recognition and related applications: An overview," Proc. of ICASSP, 2013.
    • (2013) Proc. of ICASSP
    • Deng, L.1    Hinton, G.2    Kingsbury, B.3
  • 33
    • 0021892216
    • Speech enhancement using a minimum mean square error log-spectral amplitude estimator
    • Apr
    • Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean square error log-spectral amplitude estimator," IEEE Trans. on Acoust., Speech, Signal Processing, vol. ASSP-33, no. 2, pp. 443-445, Apr. 1985.
    • (1985) IEEE Trans. on Acoust., Speech, Signal Processing , vol.ASSP-33 , Issue.2 , pp. 443-445
    • Ephraim, Y.1    Malah, D.2
  • 34
    • 51449089990
    • A Minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition
    • Las Vegas, NV
    • D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero, "A Minimum-mean-square-error noise reduction algorithm on mel-frequency cepstra for robust speech recognition," in Proc. of ICASSP, Las Vegas, NV, 2008.
    • (2008) Proc. of ICASSP
    • Yu, D.1    Deng, L.2    Droppo, J.3    Wu, J.4    Gong, Y.5    Acero, A.6


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.