Volume 50, Issue 10, 2008, Pages 797-809

Combined speech enhancement and auditory modelling for robust distributed speech recognition

Author keywords

Auditory front end; Robust speech recognition; Speech enhancement

Indexed keywords

ADDITIVE NOISE; ALGORITHMS; CONVOLUTION; DATA COMPRESSION; DATABASE SYSTEMS; FEATURE EXTRACTION; FOOD ADDITIVES; LITHIUM; REMELTING; SPEECH ANALYSIS; SPEECH COMMUNICATION; SPEECH ENHANCEMENT; SPEECH PROCESSING; SPEECH RECOGNITION; SPEECH TRANSMISSION; SPURIOUS SIGNAL NOISE; SULFATE MINERALS

EID: 52949093125     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2008.05.004     Document Type: Article
Times cited: 19

References (31)
  • 1
    • 52949124909
    • Agarwal, A., Cheng, Y.M., 1999. Two-stage Mel-warped Wiener filter for robust speech recognition. In: Proc. Automatic Speech Recognition and Understanding Workshop, Keystone, Colorado, USA, pp. 67-70.
  • 2
    • 0029952425
    • A quantitative model of the 'effective' signal processing in the auditory system: I. Model structure
    • Dau T., Püschel D., and Kohlrausch A. A quantitative model of the 'effective' signal processing in the auditory system: I. Model structure. J. Acoust. Soc. Amer. 99 6 (1996) 3615-3622
    • (1996) J. Acoust. Soc. Amer. , vol.99 , Issue.6 , pp. 3615-3622
    • Dau, T.1    Püschel, D.2    Kohlrausch, A.3
  • 3
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • Davis S., and Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. 28 4 (1980) 357-366
    • (1980) IEEE Trans. Acoust. Speech Signal Process. , vol.28 , Issue.4 , pp. 357-366
    • Davis, S.1    Mermelstein, P.2
  • 5
    • 0021645331
    • Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator
    • Ephraim Y., and Malah D. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. ASSP-32 6 (1984) 1109-1121
    • (1984) IEEE Trans. Acoust. Speech Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
    • Ephraim, Y.1    Malah, D.2
  • 6
    • 52949106557
    • ETSI ES 201 108 Ver. 1.1.3, 2003. Speech processing, transmission and quality aspects (STQ); distributed speech recognition; front-end feature extraction algorithm; compression algorithms.
  • 7
    • 52949145356
    • ETSI ES 202 050 Ver. 1.1.5, 2007. Speech processing, transmission and quality aspects (STQ); distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithms.
  • 8
    • 52949128833
    • Flynn, R., Jones, E., 2006. A comparative study of auditory-based front-ends for robust speech recognition using the Aurora 2 database. In: Proc. IET Irish Signals and Systems Conf., Dublin, Ireland, pp. 111-116.
  • 9
    • 84928838192
    • Temporal non-place information in the auditory-nerve firing patterns as a front-end for speech recognition in a noisy environment
    • Ghitza O. Temporal non-place information in the auditory-nerve firing patterns as a front-end for speech recognition in a noisy environment. J. Phonetics 16 (1988) 109-123
    • (1988) J. Phonetics , vol.16 , pp. 109-123
    • Ghitza, O.1
  • 10
    • 0025041264
    • Perceptual linear prediction (PLP) analysis of speech
    • Hermansky H. Perceptual linear prediction (PLP) analysis of speech. J. Acoust. Soc. Amer. 87 4 (1990) 1738-1752
    • (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.4 , pp. 1738-1752
    • Hermansky, H.1
  • 11
    • 52949104135
    • Hirsch, H.G., Pearce, D., 2000. The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions. In: Proc. ISCA ITRW ASR2000, Paris, France, pp. 181-188.
  • 12
    • 52949103776
    • HTK Speech Recognition Toolkit. Available from: (accessed April 2008)
  • 13
    • 35248891610
    • A comparative intelligibility study of single-microphone noise reduction algorithms
    • Hu Y., and Loizou P. A comparative intelligibility study of single-microphone noise reduction algorithms. J. Acoust. Soc. Amer. 122 3 (2007) 1777-1786
    • (2007) J. Acoust. Soc. Amer. , vol.122 , Issue.3 , pp. 1777-1786
    • Hu, Y.1    Loizou, P.2
  • 14
    • 52949108548
    • ITU-T Rec. P.862, 2001. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs.
  • 15
    • 0034824912
    • Combining speech enhancement and auditory feature extraction for robust speech recognition
    • Kleinschmidt M., Tchorz J., and Kollmeier B. Combining speech enhancement and auditory feature extraction for robust speech recognition. Speech Comm. 34 (2001) 75-91
    • (2001) Speech Comm. , vol.34 , pp. 75-91
    • Kleinschmidt, M.1    Tchorz, J.2    Kollmeier, B.3
  • 16
    • 85009154411
    • Li, Q., Soong, F.K., Siohan, O., 2000. A high-performance auditory feature for robust speech recognition. In: Proc. 6th Internat. Conf. on Spoken Language Processing (ICSLP), Vol. III, pp. 51-54.
  • 17
    • 85009115888
    • Li, Q., Soong, F.K., Siohan, O., 2001. An auditory system-based feature for robust speech recognition. In: Proc. Eurospeech, 2001, Vol. 1, pp. 619-622.
  • 18
    • 4544303834
    • Li, J., Liu, B., Wang, R., Dai, L., 2004. A complexity reduction of ETSI advanced front-end for DSR. In: Proc. IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP'04), Vol. 1, pp. 61-64.
  • 19
    • 0034848706
    • Macho, D., Cheng, Y.M., 2001. SNR-dependent waveform processing for robust speech recognition. In: Proc. IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP'01), pp. 305-308.
  • 20
    • 85009242725
    • Macho, D. et al., 2002. Evaluation of a noise-robust DSR front-end on Aurora databases. In: Proc. Internat. Conf. on Speech and Language Processing, Denver, Colorado, USA, pp. 17-20.
  • 21
    • 0742272653
    • Discriminative auditory-based features for robust speech recognition
    • Mak B., Tam Y., and Li P. Discriminative auditory-based features for robust speech recognition. IEEE Trans. Speech Audio Process. 12 1 (2004) 27-36
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.1 , pp. 27-36
    • Mak, B.1    Tam, Y.2    Li, P.3
  • 22
    • 52949083282
    • Martin, R., 1994. Spectral subtraction based on minimum statistics. In: Proc. Eur. Signal Processing Conf., pp. 1182-1185.
  • 23
    • 0035396555
    • Noise power spectral density estimation based on optimal smoothing and minimum statistics
    • Martin R. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. Speech Audio Process. 9 5 (2001) 504-512
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.5 , pp. 504-512
    • Martin, R.1
  • 24
    • 52949130718
    • Mauuary, L., 1998. Blind equalization in the cepstral domain for robust telephone based speech recognition. In: Proc. EUSIPCO'98, Vol. 1, pp. 359-363.
  • 25
    • 0036293863
    • Milner, B., 2002. A comparison of front-end configurations for robust speech recognition. In: Proc. IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP'02), Vol. I, pp. 797-800.
  • 27
    • 29444448046
    • A noise-estimation algorithm for highly non-stationary environments
    • Rangachari S., and Loizou P. A noise-estimation algorithm for highly non-stationary environments. Speech Comm. 48 (2006) 220-231
    • (2006) Speech Comm. , vol.48 , pp. 220-231
    • Rangachari, S.1    Loizou, P.2
  • 28
    • 84928837806
    • A joint synchrony/mean-rate model of auditory speech processing
    • Seneff S. A joint synchrony/mean-rate model of auditory speech processing. J. Phonetics 16 (1988) 55-76
    • (1988) J. Phonetics , vol.16 , pp. 55-76
    • Seneff, S.1
  • 29
    • 0032828464
    • A model of auditory perception as front end for automatic speech recognition
    • Tchorz J., and Kollmeier B. A model of auditory perception as front end for automatic speech recognition. J. Acoust. Soc. Amer. 106 4 (1999) 2040-2050
    • (1999) J. Acoust. Soc. Amer. , vol.106 , Issue.4 , pp. 2040-2050
    • Tchorz, J.1    Kollmeier, B.2
  • 30
    • 34547500379
    • Tsontzos, G., Diakoloukas, V., Koniaris, C., Digalakis, V., 2007. Estimation of general identifiable linear dynamic models with an application in speech recognition. In: Proc. IEEE Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP'07), Vol. IV, pp. 453-456.
  • 31
    • 17644398420
    • Speech enhancement for personal communication using an adaptive gain equalizer
    • Westerlund N., Dahl M., and Claesson I. Speech enhancement for personal communication using an adaptive gain equalizer. Signal Process. 85 (2005) 1089-1101
    • (2005) Signal Process. , vol.85 , pp. 1089-1101
    • Westerlund, N.1    Dahl, M.2    Claesson, I.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.