Volume 131, Issue 5, 2012, Pages 4134-4151

Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition

Author keywords

[No Author keywords available]

Indexed keywords

ADDITIVE NOISE; BANDPASS FILTERS; FEATURE EXTRACTION; FILTER BANKS; GABOR FILTERS; MODULATION; OBJECT RECOGNITION;

EID: 84863799482     PISSN: 00014966     EISSN: None     Source Type: Journal    
DOI: 10.1121/1.3699200     Document Type: Article
Times cited: 122

References (37)
  • 2
    • 51449089975
    • Localized spectro-temporal cepstral analysis of speech
    • Bouvrie, J., Ezzat, T., and Poggio, T. (2008). "Localized spectro-temporal cepstral analysis of speech," in Proceedings of ICASSP 2008, pp. 4733-4736.
    • (2008) Proceedings of ICASSP 2008 , pp. 4733-4736
    • Bouvrie, J.1    Ezzat, T.2    Poggio, T.3
  • 3
    • 23744508888
    • Multiresolution spectrotemporal analysis of complex sounds
    • Chi, T., Ru, P., and Shamma, S. (2005). "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am. 118, 887. 10.1121/1.1945807
    • (2005) J. Acoust. Soc. Am. , vol.118 , pp. 887
    • Chi, T.1    Ru, P.2    Shamma, S.3
  • 6
    • 0019053271
    • Comparison of parametric representations for mono-syllabic word recognition in continuously spoken sentences
    • Davis, S., and Mermelstein, P. (1980). "Comparison of parametric representations for mono-syllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust. Speech Signal Process. 28, 357-366. 10.1109/TASSP.1980.1163420
    • (1980) IEEE Trans. Acoust. Speech Signal Process. , vol.28 , pp. 357-366
    • Davis, S.1    Mermelstein, P.2
  • 7
    • 51449087857
    • Hierarchical spectro-temporal features for robust speech recognition
    • Domont, X., Heckmann, M., Joublin, F., and Goerick, C. (2008). "Hierarchical spectro-temporal features for robust speech recognition," in Proceedings of ICASSP 2008, pp. 4417-4420.
    • (2008) Proceedings of ICASSP 2008 , pp. 4417-4420
    • Domont, X.1    Heckmann, M.2    Joublin, F.3    Goerick, C.4
  • 9
    • 84987770945
    • ETSI Standard 201 108 v1.1.3. It is available at the ETSI website
    • ETSI Standard 201 108 v1.1.3 (2003). It is available at the ETSI website: http://www.etsi.org/WebSite/Technologies/DistributedSpeechRecognition.aspx.
    • (2003)
  • 10
    • 34547552785
    • AM-FM demodulation of spectrograms using localized 2D max-Gabor analysis
    • Ezzat, T., Bouvrie, J., and Poggio, T. (2007a). "AM-FM demodulation of spectrograms using localized 2D max-Gabor analysis," in Proceedings of ICASSP 2007, Vol. 4, pp. 1061-1064.
    • (2007) Proceedings of ICASSP 2007 , vol.4 , pp. 1061-1064
    • Ezzat, T.1    Bouvrie, J.2    Poggio, T.3
  • 11
    • 67651044226
    • Spectro-temporal analysis of speech using 2-D Gabor filters
    • Ezzat, T., Bouvrie, J., and Poggio, T. (2007b). "Spectro-temporal analysis of speech using 2-D Gabor filters," in Proceedings of Interspeech 2007, pp. 506-509.
    • (2007) Proceedings of Interspeech 2007 , pp. 506-509
    • Ezzat, T.1    Bouvrie, J.2    Poggio, T.3
  • 12
    • 0024909979
    • Some statistical issues in the comparison of speech recognition algorithms
    • Gillick, L., and Cox, S. (1989). "Some statistical issues in the comparison of speech recognition algorithms," in Proceedings of ICASSP 1989, Vol. 1, pp. 532-535.
    • (1989) Proceedings of ICASSP 1989 , vol.1 , pp. 532-535
    • Gillick, L.1    Cox, S.2
  • 15
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • Hermansky, H. (1990). "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Am. 87, 1738-1752. 10.1121/1.399423
    • (1990) J. Acoust. Soc. Am. , vol.87 , pp. 1738-1752
    • Hermansky, H.1
  • 16
    • 0033709098
    • Tandem connectionist feature extraction for conventional HMM systems
    • Hermansky, H., Ellis, D. P. W., and Sharma, S. (2000). "Tandem connectionist feature extraction for conventional HMM systems," in Proceedings of ICASSP 2000, Vol. 3, pp. 1635-1638.
    • (2000) Proceedings of ICASSP 2000 , vol.3 , pp. 1635-1638
    • Hermansky, H.1    Ellis, D.P.W.2    Sharma, S.3
  • 18
    • 0032658253
    • Temporal patterns (TRAPS) in ASR of noisy speech
    • Hermansky, H., and Sharma, S. (1999). "Temporal patterns (TRAPS) in ASR of noisy speech," in Proceedings of ICASSP 1999, Vol. 1, pp. 289-292.
    • (1999) Proceedings of ICASSP 1999 , vol.1 , pp. 289-292
    • Hermansky, H.1    Sharma, S.2
  • 19
    • 0032676337
    • On the relative importance of various components of the modulation spectrum for automatic speech recognition
    • Kanedera, N., Arai, T., Hermansky, H., and Pavel, M. (1999). "On the relative importance of various components of the modulation spectrum for automatic speech recognition," Speech Commun. 28, 43-55. 10.1016/S0167-6393(99)00002-3
    • (1999) Speech Commun. , vol.28 , pp. 43-55
    • Kanedera, N.1    Arai, T.2    Hermansky, H.3    Pavel, M.4
  • 20
    • 85009227802
    • Localized spectro-temporal features for automatic speech recognition
    • Kleinschmidt, M. (2003). "Localized spectro-temporal features for automatic speech recognition," in Proceedings of Eurospeech 2003, pp. 2573-2576.
    • (2003) Proceedings of Eurospeech 2003 , pp. 2573-2576
    • Kleinschmidt, M.1
  • 22
    • 0031187171
    • Speech recognition by machines and humans
    • Lippmann, R. (1997). "Speech recognition by machines and humans," Speech Commun. 22, 1-15. 10.1016/S0167-6393(97)00021-6
    • (1997) Speech Commun. , vol.22 , pp. 1-15
    • Lippmann, R.1
  • 23
    • 38849119808
    • Phoneme representation and classification in primary auditory cortex
    • Mesgarani, N., David, S., Fritz, J., and Shamma, S. (2008). "Phoneme representation and classification in primary auditory cortex," J. Acoust. Soc. Am. 123, 899-909. 10.1121/1.2816572
    • (2008) J. Acoust. Soc. Am. , vol.123 , pp. 899-909
    • Mesgarani, N.1    David, S.2    Fritz, J.3    Shamma, S.4
  • 24
    • 34047272330
    • Discrimination of speech from non-speech based on multiscale spectro-temporal modulations
    • Mesgarani, N., Slaney, M., and Shamma, S. (2006). "Discrimination of speech from non-speech based on multiscale spectro-temporal modulations," IEEE Trans. Audio Speech Lang. Proc. 14, 920-930. 10.1109/TSA.2005.858055
    • (2006) IEEE Trans. Audio Speech Lang. Proc. , vol.14 , pp. 920-930
    • Mesgarani, N.1    Slaney, M.2    Shamma, S.3
  • 25
  • 26
    • 79551679242
    • Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes
    • Meyer, B. T., Brand, T., and Kollmeier, B. (2011b). "Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes," J. Acoust. Soc. Am. 129, 388-403. 10.1121/1.3514525
    • (2011) J. Acoust. Soc. Am. , vol.129 , pp. 388-403
    • Meyer, B.T.1    Brand, T.2    Kollmeier, B.3
  • 27
    • 79953659090
    • Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition
    • Meyer, B., and Kollmeier, B. (2011a). "Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition," Speech Commun. 53, 753-767. 10.1016/j.specom.2010.07.002
    • (2011) Speech Commun. , vol.53 , pp. 753-767
    • Meyer, B.1    Kollmeier, B.2
  • 28
    • 84987754323
    • Multiresolution spectrotemporal analysis of complex sounds
    • Nemala, S. K., and Elhilali, M. (2010). "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am. 127, 1817. 10.1121/1.3384192
    • (2010) J. Acoust. Soc. Am. , vol.127 , pp. 1817
    • Nemala, S.K.1    Elhilali, M.2
  • 29
    • 84987702417
    • The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
    • Pearce, D., and Hirsch, H. (2000). "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proceedings of ICSLP 2000, Vol. 4, pp. 29-32.
    • (2000) Proceedings of ICSLP 2000 , vol.4 , pp. 29-32
    • Pearce, D.1    Hirsch, H.2
  • 30
    • 0037824480
    • Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition
    • Qiu, A., Schreiner, C., and Escabi, M. (2003). "Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition," J. Neurophysiol. 90, 456-476. 10.1152/jn.00851.2002
    • (2003) J. Neurophysiol. , vol.90 , pp. 456-476
    • Qiu, A.1    Schreiner, C.2    Escabi, M.3
  • 33
    • 0027623210
    • Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
    • Varga, A., and Steeneken, H. J. M. (1993). "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun. 12, 247-251. 10.1016/0167-6393(93)90095-3
    • (1993) Speech Commun. , vol.12 , pp. 247-251
    • Varga, A.1    Steeneken, H.J.M.2
  • 36
    • 84867220821
    • Multi-stream spectro-temporal features for robust speech recognition
    • Zhao, S., and Morgan, N. (2008). "Multi-stream spectro-temporal features for robust speech recognition," in Proceedings of Interspeech 2008, pp. 898-901.
    • (2008) Proceedings of Interspeech 2008 , pp. 898-901
    • Zhao, S.1    Morgan, N.2
  • 37
    • 70450216114
    • Multi-stream to many-stream: Using spectro-temporal features for ASR
    • Zhao, S., Ravuri, S., and Morgan, N. (2009). "Multi-stream to many-stream: Using spectro-temporal features for ASR," in Proceedings of Interspeech 2009, pp. 2951-2954.
    • (2009) Proceedings of Interspeech 2009 , pp. 2951-2954
    • Zhao, S.1    Ravuri, S.2    Morgan, N.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.