



Volume 7, Issue 1, 1999, Pages 55-68

Auditory processing of speech signals for robust speech recognition in real-world noisy environments

Author keywords

Auditory model; Noise robustness; Speech recognition; Zero crossing

Indexed keywords

ACOUSTIC NOISE; ACOUSTIC SURFACE WAVE FILTERS; MATHEMATICAL MODELS; SPEECH ANALYSIS;

EID: 0032785783     PISSN: 10636676     EISSN: None     Source Type: Journal    
DOI: 10.1109/89.736331     Document Type: Article
Times cited : (209)

References (45)
  • 1
    • 0021124460
    • Pitch and spectral estimation of speech based on auditory synchrony model
    • S. Seneff, "Pitch and spectral estimation of speech based on auditory synchrony model," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1984, pp. 36.2.1-36.2.4.
    • (1984) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 36.2.1-36.2.4
    • Seneff, S.1
  • 2
    • 84928837806
    • "A joint synchrony/mean-rate model of auditory processing,"
    • S. Seneff, "A joint synchrony/mean-rate model of auditory processing," J. Phonet., vol. 16, pp. 55-76, 1988.
    • (1988) J. Phonet. , vol.16 , pp. 55-76
  • 4
    • 0025041264
    • "Perceptual linear predictive (PLP) analysis of speech,"
    • H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, pp. 1738-1752, 1990.
    • (1990) J. Acoust. Soc. Amer. , vol.87 , pp. 1738-1752
    • Hermansky, H.1
  • 5
  • 7
    • 0028497508
    • "Speech analysis and speech recognition using subband-autocorrelation analysis,"
    • S. Kajita and F. Itakura, "Speech analysis and speech recognition using subband-autocorrelation analysis," J. Acoust. Soc. Jpn., vol. 15, pp. 329-338, 1994.
    • (1994) J. Acoust. Soc. Jpn. , vol.15 , pp. 329-338
    • Kajita, S.1    Itakura, F.2
  • 9
    • 0023167345
    • "Speech recognition using an auditory model with pitch-synchronous analysis,"
    • M. Hunt and C. Lefebvre, "Speech recognition using an auditory model with pitch-synchronous analysis," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1987, pp. 20.5.1-20.5.4.
    • (1987) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 20.5.1-20.5.4
    • Hunt, M.1    Lefebvre, C.2
  • 10
    • 0026626445
    • "Auditory representations of acoustic signals,"
    • X. Yang, K. Wang, and S. A. Shamma, "Auditory representations of acoustic signals," IEEE Trans. Inform. Theory, vol. 38, pp. 824-839, 1992.
    • (1992) IEEE Trans. Inform. Theory , vol.38 , pp. 824-839
    • Yang, X.1    Wang, K.2    Shamma, S.A.3
  • 11
    • 0028462212
    • "Self-normalization and noise-robustness in early auditory representations,"
    • K. Wang and S. A. Shamma, "Self-normalization and noise-robustness in early auditory representations," IEEE Trans. Speech Audio Processing, vol. 2, pp. 421-435, 1994.
    • (1994) IEEE Trans. Speech Audio Processing , vol.2 , pp. 421-435
    • Wang, K.1    Shamma, S.A.2
  • 12
    • 0027574303
    • "An information theoretic investigation into the distribution of phonetic information across the auditory spectrogram,"
    • A. Morris, J. L. Schwartz, and P. Escudier, "An information theoretic investigation into the distribution of phonetic information across the auditory spectrogram," Comput. Speech Lang., vol. 2, pp. 121-136, 1993.
    • (1993) Comput. Speech Lang. , vol.2 , pp. 121-136
    • Morris, A.1    Schwartz, J.L.2    Escudier, P.3
  • 14
    • 0029765806
    • "Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments,"
    • D.-S. Kim, J.-H. Jeong, J.-W. Kim, and S.-Y. Lee, "Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Atlanta, GA, May 1996, pp. 61-64.
    • (1996) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 61-64
    • Kim, D.-S.1    Jeong, J.-H.2    Kim, J.-W.3    Lee, S.-Y.4
  • 15
    • 0026400728
    • "A time-domain digital cochlear model,"
    • J. M. Kates, "A time-domain digital cochlear model," IEEE Trans. Signal Processing, vol. 39, pp. 2573-2592, 1991.
    • (1991) IEEE Trans. Signal Processing , vol.39 , pp. 2573-2592
    • Kates, J.M.1
  • 17
    • 0025126556
    • "A cochlear frequency-position function for several species-29 years later,"
    • D. Greenwood, "A cochlear frequency-position function for several species-29 years later," J. Acoust. Soc. Amer., vol. 87, pp. 2592-2605, 1990.
    • (1990) J. Acoust. Soc. Amer. , vol.87 , pp. 2592-2605
    • Greenwood, D.1
  • 18
    • 84928841665
    • "Rate-place and temporal-place representations of vowels in the auditory nerve and anteroventral cochlear nucleus,"
    • M. B. Sachs, C. C. Blackburn, and E. D. Young, "Rate-place and temporal-place representations of vowels in the auditory nerve and anteroventral cochlear nucleus," J. Phonet., vol. 16, pp. 37-53, 1988.
    • (1988) J. Phonet. , vol.16 , pp. 37-53
    • Sachs, M.B.1    Blackburn, C.C.2    Young, E.D.3
  • 19
    • 0018617277
    • "Encoding of steady state vowels in the auditory-nerve: Representation in terms of discharge rate,"
    • M. B. Sachs and E. D. Young, "Encoding of steady state vowels in the auditory-nerve: Representation in terms of discharge rate," J. Acoust. Soc. Amer., vol. 66, pp. 470-479, 1979.
    • (1979) J. Acoust. Soc. Amer. , vol.66 , pp. 470-479
    • Sachs, M.B.1    Young, E.D.2
  • 20
    • 0018606571
    • "Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory nerve fibers,"
    • E. D. Young and M. B. Sachs, "Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory nerve fibers," J. Acoust. Soc. Amer., vol. 66, pp. 1381-1403, 1979.
    • (1979) J. Acoust. Soc. Amer. , vol.66 , pp. 1381-1403
    • Young, E.D.1    Sachs, M.B.2
  • 21
    • 0021403669
    • "Speech coding in the auditory nerve: I,"
    • B. Delgutte and N. Y. S. Kiang, "Speech coding in the auditory nerve: I," J. Acoust. Soc. Amer., vol. 75, pp. 866-878, 1984.
    • (1984) J. Acoust. Soc. Amer. , vol.75 , pp. 866-878
    • Delgutte, B.1    Kiang, N.Y.S.2
  • 22
    • 84912495580
    • "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency,"
    • E. Zwicker and E. Terhardt, "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency," J. Acoust. Soc. Amer., vol. 68, pp. 1523-1525, 1980.
    • (1980) J. Acoust. Soc. Amer. , vol.68 , pp. 1523-1525
    • Zwicker, E.1    Terhardt, E.2
  • 23
    • 0028312802
    • "Auditory models and human performances in tasks related to speech coding and speech recognition,"
    • O. Ghitza, "Auditory models and human performances in tasks related to speech coding and speech recognition," IEEE Trans. Speech Audio Processing, vol. 2, pt. II, pp. 115-132, 1994.
    • (1994) IEEE Trans. Speech Audio Processing , vol.2 , pp. 115-132
    • Ghitza, O.1
  • 25
    • 0026819492
    • "Zero-crossing based spectral analysis and SVD spectral analysis for formant frequency estimation in noise,"
    • T. V. Sreenivas and R. J. Niederjohn, "Zero-crossing based spectral analysis and SVD spectral analysis for formant frequency estimation in noise," IEEE Trans. Signal Processing, vol. 40, pp. 282-293, 1992.
    • (1992) IEEE Trans. Signal Processing , vol.40 , pp. 282-293
    • Sreenivas, T.V.1    Niederjohn, R.J.2
  • 26
    • 0000030810
    • "Auditory nerve representation as a basis for speech processing,"
    • O. Ghitza, "Auditory nerve representation as a basis for speech processing," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds. New York: Marcel Dekker, 1992, pp. 453-485.
    • (1992) Advances in Speech Signal Processing , pp. 453-485
    • Ghitza, O.1
  • 27
    • 0022806994
    • "Spectral analysis and discrimination by zero-crossings,"
    • B. Kedem, "Spectral analysis and discrimination by zero-crossings," Proc. IEEE, vol. 74, pp. 1477-1493, Nov. 1986.
    • (1986) Proc. IEEE , vol.74 , pp. 1477-1493
    • Kedem, B.1
  • 30
    • 0006671437
    • "Intelligent judge neural network for speech recognition,"
    • D.-S. Kim and S.-Y. Lee, "Intelligent judge neural network for speech recognition," Neural Process. Lett., vol. 1, pp. 17-20, 1994.
    • (1994) Neural Process. Lett. , vol.1 , pp. 17-20
    • Kim, D.-S.1    Lee, S.-Y.2
  • 31
    • 0342606430
    • "Voice command: A digital neuro-chip for robust speech recognition in real-world noisy environments (Invited talk),"
    • S.-Y. Lee et al., "Voice command: A digital neuro-chip for robust speech recognition in real-world noisy environments (Invited talk)," in Proc. Int. Conf. Neural Information Processing, Hong Kong, Sept. 1996, pp. 283-287.
    • (1996) Proc. Int. Conf. Neural Information Processing , pp. 283-287
    • Lee, S.-Y.1
  • 32
    • 0024610919
    • "A tutorial on hidden Markov models and selected applications in speech recognition,"
    • L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, pp. 257-286, 1989.
    • (1989) Proc. IEEE , vol.77 , pp. 257-286
    • Rabiner, L.R.1
  • 33
    • 0027623210
    • "Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems,"
    • A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, pp. 247-251, 1993.
    • (1993) Speech Commun. , vol.12 , pp. 247-251
    • Varga, A.1    Steeneken, H.2
  • 34
    • 0020816083
    • "Suggested formula for calculating auditory-filter bandwidth and excitation patterns,"
    • B. C. J. Moore and B. R. Glasberg, "Suggested formula for calculating auditory-filter bandwidth and excitation patterns," J. Acoust. Soc. Amer., vol. 74, pp. 750-753, 1983.
    • (1983) J. Acoust. Soc. Amer. , vol.74 , pp. 750-753
    • Moore, B.C.J.1    Glasberg, B.R.2
  • 35
    • 0011872351
    • "Time derivatives, cepstral normalization, and spectral parameter filtering for continuously spelled names over the telephone,"
    • J.-C. Junqua et al., "Time derivatives, cepstral normalization, and spectral parameter filtering for continuously spelled names over the telephone," in Proc. Europ. Conf. Speech Communication and Technology, 1995.
    • (1995) Proc. Europ. Conf. Speech Communication and Technology
    • Junqua, J.-C.1
  • 36
    • 85135377175
    • "Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP),"
    • H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, "Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP)," in Proc. Europ. Conf. Speech Communication and Technology, 1991, pp. 1367-1370.
    • (1991) Proc. Europ. Conf. Speech Communication and Technology , pp. 1367-1370
    • Hermansky, H.1    Morgan, N.2    Bayya, A.3    Kohn, P.4
  • 38
    • 0029239090
    • "A comparative study of mel cepstra and EIH's for phone classification under adverse conditions,"
    • S. Sandhu and O. Ghitza, "A comparative study of mel cepstra and EIH's for phone classification under adverse conditions," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Detroit, MI, 1995, pp. 409-412.
    • (1995) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 409-412
    • Sandhu, S.1    Ghitza, O.2
  • 39
  • 40
    • 0027167185
    • "A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition,"
    • K. Aikawa, H. Singer, H. Kawahara, and Y. Tohkura, "A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1993, vol. II, pp. 668-671.
    • (1993) Proc. Int. Conf. Acoustics, Speech, Signal Processing , vol.2 , pp. 668-671
    • Aikawa, K.1    Singer, H.2    Kawahara, H.3    Tohkura, Y.4
  • 41
    • 0031546280
    • "Auditory model for robust speech recognition in real world noisy environments,"
    • D.-S. Kim, S.-Y. Lee, R. M. Kil, and X. Zhu, "Auditory model for robust speech recognition in real world noisy environments," Electron. Lett., vol. 33, p. 12, 1997.
    • (1997) Electron. Lett. , vol.33 , p. 12
    • Kim, D.-S.1    Lee, S.-Y.2    Kil, R.M.3    Zhu, X.4
  • 42
    • 0022667694
    • "Speaker-independent isolated word recognition using dynamic features of speech spectrum,"
    • S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 52-59, 1986.
    • (1986) IEEE Trans. Acoust., Speech, Signal Processing , vol.ASSP-34 , pp. 52-59
    • Furui, S.1
  • 43
    • 0343685350
    • "Spectral dynamics for speech recognition under adverse conditions,"
    • B. Hanson, T. Applebaum, and J.-C. Junqua, "Spectral dynamics for speech recognition under adverse conditions," in Advanced Topics in Automatic Speech and Speaker Recognition, C.-H. Lee, K. Paliwal, and F. Soong, Eds. Boston, MA: Kluwer, 1995.
    • (1995) Advanced Topics in Automatic Speech and Speaker Recognition
    • Hanson, B.1    Applebaum, T.2    Junqua, J.-C.3
  • 44
    • 0003234444
    • "Multiple approaches to robust speech recognition,"
    • R. M. Stern et al., "Multiple approaches to robust speech recognition," in Proc. DARPA Speech and Natural Language Workshop, Harriman, NY, 1992, pp. 274-279.
    • (1992) Proc. DARPA Speech and Natural Language Workshop , pp. 274-279
    • Stern, R.M.1
  • 45


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.