SCOPUS Information Search Platform

Journal of the Acoustical Society of America

Volume 131, Issue 5, 2012, Pages 4134-4151

Spectro-temporal modulation subspace-spanning filter bank features for robust automatic speech recognition

(3) Schädler, Marc René (a); Meyer, Bernd T. (a); Kollmeier, Birger (a)

(a) UNIVERSITY OF OLDENBURG (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

ADDITIVE NOISE; BANDPASS FILTERS; FEATURE EXTRACTION; FILTER BANKS; GABOR FILTERS; MODULATION; OBJECT RECOGNITION;

AUTOMATIC SPEECH RECOGNITION SYSTEM; CEPSTRAL MEAN SUBTRACTION; INTRINSIC VARIABILITIES; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; ROBUST AUTOMATIC SPEECH RECOGNITION; SPECTRO-TEMPORAL MODULATIONS; STATE-OF-THE-ART SYSTEM; TWO-DIMENSIONAL FILTERS;

SPEECH RECOGNITION;

ALGORITHM; ARTICLE; AUTOMATIC SPEECH RECOGNITION; NOISE; SOUND DETECTION; SPEECH; STANDARD;

ALGORITHMS; NOISE; SOUND SPECTROGRAPHY; SPEECH ACOUSTICS; SPEECH RECOGNITION SOFTWARE;

EID: 84863799482 PISSN: 00014966 EISSN: None Source Type: Journal
DOI: 10.1121/1.3699200 Document Type: Article

Times cited : (122)

References (37)

1
- 34547941599
- Automatic speech recognition and speech variability: A review
- Benzeghiba, M., De Mori, R., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., and Wellekens, C. (2007). "Automatic speech recognition and speech variability: A review," Speech Commun. 49, 763-786. 10.1016/j.specom.2007.02.006
- (2007) Speech Commun. , vol.49 , pp. 763-786
- Benzeghiba, M.¹ De Mori, R.² Deroo, O.³ Dupont, S.⁴ Erbes, T.⁵ Jouvet, D.⁶ Fissore, L.⁷ Laface, P.⁸ Mertins, A.⁹ Ris, C.¹⁰ Rose, R.¹¹ Tyagi, V.¹² Wellekens, C.¹³

2
- 51449089975
- Localized spectro-temporal cepstral analysis of speech
- Bouvrie, J., Ezzat, T., and Poggio, T. (2008). "Localized spectro-temporal cepstral analysis of speech," in Proceedings of ICASSP 2008, pp. 4733-4736.
- (2008) Proceedings of ICASSP 2008 , pp. 4733-4736
- Bouvrie, J.¹ Ezzat, T.² Poggio, T.³

3
- 23744508888
- Multiresolution spectrotemporal analysis of complex sounds
- Chi, T., Ru, P., and Shamma, S. (2005). "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am. 118, 887. 10.1121/1.1945807
- (2005) J. Acoust. Soc. Am. , vol.118 , pp. 887
- Chi, T.¹ Ru, P.² Shamma, S.³

4
- 0000747781
- New telephone speech corpora at CSLU
- Cole, R. A., Noel, M., Lander, T., and Durham, T. (1995). "New telephone speech corpora at CSLU," in Proceedings of Eurospeech 1995, p. 95.
- (1995) Proceedings of Eurospeech 1995 , pp. 95
- Cole, R.A.¹ Noel, M.² Lander, T.³ Durham, T.⁴

5
- 70450178921
- The Interspeech 2008 consonant challenge
- Cooke, M., and Scharenborg, O. (2008). "The Interspeech 2008 consonant challenge," in Proceedings of Interspeech 2008, pp. 1781-1784.
- (2008) Proceedings of Interspeech 2008 , pp. 1781-1784
- Cooke, M.¹ Scharenborg, O.²

6
- 0019053271
- Comparison of parametric representations for mono-syllabic word recognition in continuously spoken sentences
- Davis, S., and Mermelstein, P. (1980). "Comparison of parametric representations for mono-syllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust. Speech Signal Process. 28, 357-366. 10.1109/TASSP.1980.1163420
- (1980) IEEE Trans. Acoust. Speech Signal Process. , vol.28 , pp. 357-366
- Davis, S.¹ Mermelstein, P.²

7
- 51449087857
- Hierarchical spectro-temporal features for robust speech recognition
- Domont, X., Heckmann, M., Joublin, F., and Goerick, C. (2008). "Hierarchical spectro-temporal features for robust speech recognition," in Proceedings of ICASSP 2008, pp. 4417-4420.
- (2008) Proceedings of ICASSP 2008 , pp. 4417-4420
- Domont, X.¹ Heckmann, M.² Joublin, F.³ Goerick, C.⁴

8
- 36049044257
- (Last visited February 27, 2012)
- Ellis, D. (2005). "PLP and RASTA (and MFCC, and inversion) in MATLAB," available at http://www.ee.columbia.edu/~dpwe/resources/matlab/rastamat/ (Last visited February 27, 2012).
- (2005) PLP and RASTA (and MFCC, and inversion) in MATLAB
- Ellis, D.¹

9
- 84987770945
- ETSI Standard 201 108 v1.1.3. It is available at the ETSI website
- ETSI Standard 201 108 v1.1.3 (2003). It is available at the ETSI website: http://www.etsi.org/WebSite/Technologies/DistributedSpeechRecognition.aspx.
- (2003)

10
- 34547552785
- AM-FM demodulation of spectrograms using localized 2D max-Gabor analysis
- Ezzat, T., Bouvrie, J., and Poggio, T. (2007a). "AM-FM demodulation of spectrograms using localized 2D max-Gabor analysis," in Proceedings of ICASSP 2007, Vol. 4, pp. 1061-1064.
- (2007) Proceedings of ICASSP 2007 , vol.4 , pp. 1061-1064
- Ezzat, T.¹ Bouvrie, J.² Poggio, T.³

11
- 67651044226
- Spectro-temporal analysis of speech using 2-D gabor filters
- Ezzat, T., Bouvrie, J., and Poggio, T. (2007b). "Spectro-temporal analysis of speech using 2-D gabor filters," in Proceedings of Interspeech 2007, pp. 506-509.
- (2007) Proceedings of Interspeech 2007 , pp. 506-509
- Ezzat, T.¹ Bouvrie, J.² Poggio, T.³

12
- 0024909979
- Some statistical issues in the comparison of speech recognition algorithms
- Gillick, L., and Cox, S. (1989). "Some statistical issues in the comparison of speech recognition algorithms," in Proceedings of ICASSP 1989, Vol. 1, pp. 532-535.
- (1989) Proceedings of ICASSP 1989 , vol.1 , pp. 532-535
- Gillick, L.¹ Cox, S.²

13
- 0026292099
- Word recognition with the feature finding neural network (FFNN)
- Gramss, T. (1991). "Word recognition with the feature finding neural network (FFNN)," in Proceedings of the International Workshop Neural Networks Signal Process., pp. 289-298.
- (1991) Proceedings of the International Workshop Neural Networks Signal Process. , pp. 289-298
- Gramss, T.¹

14
- 84867227177
- A closer look on hierarchical spectro-temporal features (HIST)
- Heckmann, M., Domont, X., Joublin, F., and Goerick, C. (2008). "A closer look on hierarchical spectro-temporal features (HIST)," in Proceedings of Interspeech 2008, pp. 894-897.
- (2008) Proceedings of Interspeech 2008 , pp. 894-897
- Heckmann, M.¹ Domont, X.² Joublin, F.³ Goerick, C.⁴

15
- 0025041264
- Perceptual linear predictive (PLP) analysis of speech
- Hermansky, H. (1990). "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Am. 87, 1738-1752. 10.1121/1.399423
- (1990) J. Acoust. Soc. Am. , vol.87 , pp. 1738-1752
- Hermansky, H.¹

16
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- Hermansky, H., Ellis, D. P. W., and Sharma, S. (2000). "Tandem connectionist feature extraction for conventional HMM systems," in Proceedings of ICASSP 2000, Vol. 3, pp. 1635-1638.
- (2000) Proceedings of ICASSP 2000 , vol.3 , pp. 1635-1638
- Hermansky, H.¹ Ellis, D.P.W.² Sharma, S.³

17
- 0028517164
- RASTA processing of speech
- Hermansky, H., and Morgan, N. (1994). "RASTA processing of speech," IEEE Trans. Speech, Audio Process. 2, 578-589. 10.1109/89.326616
- (1994) IEEE Trans. Speech, Audio Process. , vol.2 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

18
- 0032658253
- Temporal patterns (TRAPS) in ASR of noisy speech
- Hermansky, H., and Sharma, S. (1999). "Temporal patterns (TRAPS) in ASR of noisy speech," in Proceedings of ICASSP 1999, Vol. 1, pp. 289-292.
- (1999) Proceedings of ICASSP 1999 , vol.1 , pp. 289-292
- Hermansky, H.¹ Sharma, S.²

19
- 0032676337
- On the relative importance of various components of the modulation spectrum for automatic speech recognition
- Kanedera, N., Arai, T., Hermansky, H., and Pavel, M. (1999). "On the relative importance of various components of the modulation spectrum for automatic speech recognition," Speech Commun. 28, 43-55. 10.1016/S0167-6393(99)00002-3
- (1999) Speech Commun. , vol.28 , pp. 43-55
- Kanedera, N.¹ Arai, T.² Hermansky, H.³ Pavel, M.⁴

20
- 85009227802
- Localized spectro-temporal features for automatic speech recognition
- Kleinschmidt, M. (2003). "Localized spectro-temporal features for automatic speech recognition," in Proceedings of Eurospeech 2003, pp. 2573-2576.
- (2003) Proceedings of Eurospeech 2003 , pp. 2573-2576
- Kleinschmidt, M.¹

21
- 85009233038
- Improving word accuracy with Gabor feature extraction
- Kleinschmidt, M., and Gelbart, D. (2002). "Improving word accuracy with Gabor feature extraction," in Proceedings of Interspeech 2002, pp. 25-28.
- (2002) Proceedings of Interspeech 2002 , pp. 25-28
- Kleinschmidt, M.¹ Gelbart, D.²

22
- 0031187171
- Speech recognition by machines and humans
- Lippmann, R. (1997). "Speech recognition by machines and humans," Speech Commun. 22, 1-15. 10.1016/S0167-6393(97)00021-6
- (1997) Speech Commun. , vol.22 , pp. 1-15
- Lippmann, R.¹

23
- 38849119808
- Phoneme representation and classification in primary auditory cortex
- Mesgarani, N., David, S., Fritz, J., and Shamma, S. (2008). "Phoneme representation and classification in primary auditory cortex," J. Acoust. Soc. Am. 123, 899-909. 10.1121/1.2816572
- (2008) J. Acoust. Soc. Am. , vol.123 , pp. 899-909
- Mesgarani, N.¹ David, S.² Fritz, J.³ Shamma, S.⁴

24
- 34047272330
- Discrimination of speech from non-speech based on multiscale spectro-temporal modulations
- Mesgarani, N., Slaney, M., and Shamma, S. (2006). "Discrimination of speech from non-speech based on multiscale spectro-temporal modulations," IEEE Trans. Audio Speech Lang. Proc. 14, 920-930. 10.1109/TSA.2005.858055
- (2006) IEEE Trans. Audio Speech Lang. Proc. , vol.14 , pp. 920-930
- Mesgarani, N.¹ Slaney, M.² Shamma, S.³

25
- 79959816304
- A multistream multiresolution framework for phoneme recognition
- Mesgarani, N., Thomas, S., and Hermansky, H. (2010). "A multistream multiresolution framework for phoneme recognition," in Proceedings of Interspeech 2010, pp. 318-321.
- (2010) Proceedings of Interspeech 2010 , pp. 318-321
- Mesgarani, N.¹ Thomas, S.² Hermansky, H.³

26
- 79551679242
- Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes
- Meyer, B. T., Brand, T., and Kollmeier, B. (2011b). "Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes," J. Acoust. Soc. Am. 129, 388-403. 10.1121/1.3514525
- (2011) J. Acoust. Soc. Am. , vol.129 , pp. 388-403
- Meyer, B.T.¹ Brand, T.² Kollmeier, B.³

27
- 79953659090
- Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition
- Meyer, B., and Kollmeier, B. (2011a). "Robustness of spectro-temporal features against intrinsic and extrinsic variations in automatic speech recognition," Speech Commun. 53, 753-767. 10.1016/j.specom.2010.07.002
- (2011) Speech Commun. , vol.53 , pp. 753-767
- Meyer, B.¹ Kollmeier, B.²

28
- 84987754323
- Multiresolution spectrotemporal analysis of complex sounds
- Nemala, S. K., and Elhilali, M. (2010). "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Am. 127, 1817. 10.1121/1.3384192
- (2010) J. Acoust. Soc. Am. , vol.127 , pp. 1817
- Nemala, S.K.¹ Elhilali, M.²

29
- 84987702417
- The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
- Pearce, D., and Hirsch, H. (2000). "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proceedings of ICSLP 2000, Vol. 4, pp. 29-32.
- (2000) Proceedings of ICSLP 2000 , vol.4 , pp. 29-32
- Pearce, D.¹ Hirsch, H.²

30
- 0037824480
- Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition
- Qiu, A., Schreiner, C., and Escabi, M. (2003). "Gabor analysis of auditory midbrain receptive fields: Spectro-temporal and binaural composition," J. Neurophysiol. 90, 456-476. 10.1152/jn.00851.2002
- (2003) J. Neurophysiol. , vol.90 , pp. 456-476
- Qiu, A.¹ Schreiner, C.² Escabi, M.³

31
- 84906262880
- (Last visited February 27, 2012)
- Schädler, M. R. (2011). "Gabor filter bank (GBFB) feature extraction reference implementation in MATLAB," available at http://medi.uni-oldenburg.de/GBFB (Last visited February 27, 2012).
- (2011) Gabor filter bank (GBFB) feature extraction reference implementation in MATLAB
- Schädler, M.R.¹

32
- 84912114311
- Comparative experiments on large vocabulary speech recognition
- Schwarz, R., Anastasakos, T., Kubala, F., Makhoul, J., Nguyen, L., and Zavaliagkos, G. (1993). "Comparative experiments on large vocabulary speech recognition," in Proceedings of the Workshop on Human Language Technology, pp. 75-80.
- (1993) Proceedings of the Workshop on Human Language Technology , pp. 75-80
- Schwarz, R.¹ Anastasakos, T.² Kubala, F.³ Makhoul, J.⁴ Nguyen, L.⁵ Zavaliagkos, G.⁶

33
- 0027623210
- Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- Varga, A., and Steeneken, H. J. M. (1993). "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun. 12, 247-251. 10.1016/0167-6393(93)90095-3
- (1993) Speech Commun. , vol.12 , pp. 247-251
- Varga, A.¹ Steeneken, H.J.M.²

34
- 33745183789
- Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines
- Wesker, T., Meyer, B., Wagener, K., Anemüller, J., Mertins, A., and Kollmeier, B. (2005). "Oldenburg logatome speech corpus (OLLO) for speech recognition experiments with humans and machines," in Proceedings of Eurospeech/Interspeech 2005, pp. 1273-1276.
- (2005) Proceedings of Eurospeech/Interspeech 2005 , pp. 1273-1276
- Wesker, T.¹ Meyer, B.² Wagener, K.³ Anemüller, J.⁴ Mertins, A.⁵ Kollmeier, B.⁶

35
- 4544219816
- (Cambridge University Engineering Department, Cambridge, UK)
- Young, S., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., and Woodland, P. (2001). The HTK Book, version 3.1 (Cambridge University Engineering Department, Cambridge, UK), pp. 1-271.
- (2001) The HTK Book, Version 3.1 , pp. 1-271
- Young, S.¹ Kershaw, D.² Odell, J.³ Ollason, D.⁴ Valtchev, V.⁵ Woodland, P.⁶

36
- 84867220821
- Multi-stream spectro-temporal features for robust speech recognition
- Zhao, S., and Morgan, N. (2008). "Multi-stream spectro-temporal features for robust speech recognition," in Proceedings of Interspeech 2008, pp. 898-901.
- (2008) Proceedings of Interspeech 2008 , pp. 898-901
- Zhao, S.¹ Morgan, N.²

37
- 70450216114
- Multi-stream to many-stream: Using spectro-temporal features for ASR
- Zhao, S., Ravuri, S., and Morgan, N. (2009). "Multi-stream to many-stream: Using spectro-temporal features for ASR," in Proceedings of Interspeech 2009, pp. 2951-2954.
- (2009) Proceedings of Interspeech 2009 , pp. 2951-2954
- Zhao, S.¹ Ravuri, S.² Morgan, N.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.