SCOPUS Information Retrieval Platform

IEEE Transactions on Speech and Audio Processing

Volume 7, Issue 1, 1999, Pages 55-68

Auditory processing of speech signals for robust speech recognition in real-world noisy environments

(3) Kim, Doh Suk a,c Lee, Soo Young b Kil, Rhee M c

a LUCENT TECHNOLOGIES (United States)

b Samsung Advanced Institute of Technology (SAIT) (South Korea)

c Korea Advanced Institute of Science and Technology (KAIST) (South Korea)

Author keywords

Auditory model; Noise robustness; Speech recognition; Zero crossing

Indexed keywords

ACOUSTIC NOISE; ACOUSTIC SURFACE WAVE FILTERS; MATHEMATICAL MODELS; SPEECH ANALYSIS;

AUTOMATIC SPEECH RECOGNITION (ASR);

SPEECH RECOGNITION;

EID: 0032785783 PISSN: 10636676 EISSN: None Source Type: Journal
DOI: 10.1109/89.736331 Document Type: Article

Times cited : (209)

References (45)

1
- 0021124460
- Pitch and spectral estimation of speech based on auditory synchrony model
- S. Seneff, "Pitch and spectral estimation of speech based on auditory synchrony model," in Proc. Int. Conf. Acoustics, Speech, Signal Processing. 1984, pp. 36.2.1-36.2.4.
- (1984) Proc. Int. Conf. Acoustics, Speech, Signal Processing. , pp. 3621-3624
- Seneff, S.¹

2
- 84928837806
- "A joint synchrony/mean-rate model of auditory processing,"
- _, "A joint synchrony/mean-rate model of auditory processing," J. Phonet., vol. 16, pp. 55-76, 1988.
- (1988) J. Phonet. , vol.16 , pp. 55-76

3
- 0022873929
- "Speech recognition using a cochlear model,"
- M. Hunt and C. Lefebvre, "Speech recognition using a cochlear model," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1986, pp. 37.7.1-37.7.4.
- (1986) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 3771-3774
- Hunt, M.¹ Lefebvre, C.²

4
- 0025041264
- "Perceptual linear predictive (PLP) analysis of speech,"
- H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, pp. 1738-1752, 1990.
- (1990) J. Acoust. Soc. Amer. , vol.87 , pp. 1738-1752
- Hermansky, H.¹

5
- 0003071809
- "Evaluation and optimization of perceptually-based ASR front-end,"
- J. C. Junqua, H. Wakita, and H. Hermansky, "Evaluation and optimization of perceptually-based ASR front-end," IEEE Trans. Speech Audio Processing, vol. 1, pp. 329-338, 1993.
- (1993) IEEE Trans. Speech Audio Processing , vol.1 , pp. 329-338
- Junqua, J.C.¹ Wakita, H.² Hermansky, H.³

6
- 4143055851
- Ph.D. dissertation, Univ. Nancy I, Nancy, France
- J. C. Junqua, "Toward robustness in isolated-word automatic speech recognition," Ph.D. dissertation, Univ. Nancy I, Nancy, France, 1989.
- (1989) Toward Robustness in Isolated-word Automatic Speech Recognition
- Junqua, J.C.¹

7
- 0028497508
- "Speech analysis and speech recognition using subband-autocorrelation analysis,"
- S. Kajita and F. Itakura, "Speech analysis and speech recognition using subband-autocorrelation analysis," J. Acoust. Soc. Jpn., vol. 15, pp. 329-338, 1994.
- (1994) J. Acoust. Soc. Jpn. , vol.15 , pp. 329-338
- Kajita, S.¹ Itakura, F.²

8
- 0028996914
- "Robust feature extraction using SBCOR analysis,"
- _, "Robust feature extraction using SBCOR analysis," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1995, pp. 421-424.
- (1995) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 421-424

9
- 0023167345
- "Speech recognition using an auditory model with pitch-synchronous analysis,"
- M. Hunt and C. Lefebvre, "Speech recognition using an auditory model with pitch-synchronous analysis," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1987, pp. 20.5.1-20.5.4.
- (1987) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 2051-2054
- Hunt, M.¹ Lefebvre, C.²

10
- 0026626445
- "Auditory representations of acoustic signals,"
- X. Yang, K. Wang, and S. A. Shamma, "Auditory representations of acoustic signals," IEEE Trans. Inform. Theory, vol. 38, pp. 824-839, 1992.
- (1992) IEEE Trans. Inform. Theory , vol.38 , pp. 824-839
- Yang, X.¹ Wang, K.² Shamma, S.A.³

11
- 0028462212
- "Self-normalization and noise-robustness in early auditory representations,"
- K. Wang and S. A. Shamma, "Self-normalization and noise-robustness in early auditory representations," IEEE Trans. Speech Audio Processing, vol. 2, pp. 421-435, 1994.
- (1994) IEEE Trans. Speech Audio Processing , vol.2 , pp. 421-435
- Wang, K.¹ Shamma, S.A.²

12
- 0027574303
- "An information theoretic investigation into the distribution of phonetic information across the auditory spectrogram,"
- A. Morris, J. L. Schwartz, and P. Escudier, "An information theoretic investigation into the distribution of phonetic information across the auditory spectrogram," Comput. Speech Lang., vol. 2, pp. 121-136, 1993.
- (1993) Comput. Speech Lang. , vol.2 , pp. 121-136
- Morris, A.¹ Schwartz, J.L.² Escudier, P.³

13
- 84936526542
- Cambridge, MA: MIT Press
- S. Handel, Listening: An Introduction to the Perception of Auditory Events. Cambridge, MA: MIT Press, 1993.
- (1993) Listening: an Introduction to the Perception of Auditory Events.
- Handel, S.¹

14
- 0029765806
- "Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments,"
- Atlanta, GA, May
- D.-S. Kim, J.-H. Jeong, J.-W. Kim, and S.-Y. Lee, "Feature extraction based on zero-crossings with peak amplitudes for robust speech recognition in noisy environments," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Atlanta, GA, May 1996, pp. 61-64.
- (1996) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 61-64
- Kim, D.-S.¹ Jeong, J.-H.² Kim, J.-W.³ Lee, S.-Y.⁴

15
- 0026400728
- "A time-domain digital cochlear model,"
- J. M. Kates, "A time-domain digital cochlear model," IEEE Trans. Signal Processing, vol. 39, pp. 2573-2592, 1991.
- (1991) IEEE Trans. Signal Processing , vol.39 , pp. 2573-2592
- Kates, J.M.¹

16
- 0024048578
- "An analog electronic cochlea,"
- R. F. Lyon and C. Mead, "An analog electronic cochlea," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1119-1134, 1988.
- (1988) IEEE Trans. Acoust., Speech, Signal Processing , vol.36 , pp. 1119-1134
- Lyon, R.F.¹ Mead, C.²

17
- 0025126556
- "A cochlear frequency-position function for several species-29 years later,"
- D. Greenwood, "A cochlear frequency-position function for several species-29 years later," J. Acoust. Soc. Amer., vol. 87, pp. 2592-2650, 1990.
- (1990) J. Acoust. Soc. Amer. , vol.87 , pp. 2592-2650
- Greenwood, D.¹

18
- 84928841665
- "Rate-place and temporal-place representations of vowels in the auditory nerve and anteroventral cochlear nucleus,"
- M. B. Sachs, C. C. Blackburn, and E. D. Young, "Rate-place and temporal-place representations of vowels in the auditory nerve and anteroventral cochlear nucleus," J. Phonet., vol. 16, pp. 37-53, 1988.
- (1988) J. Phonet. , vol.16 , pp. 37-53
- Sachs, M.B.¹ Blackburn, C.C.² Young, E.D.³

19
- 0018617277
- "Encoding of steady state vowels in the auditory-nerve: Representation in terms of discharge rate,"
- M. B. Sachs and E. D. Young, "Encoding of steady state vowels in the auditory-nerve: Representation in terms of discharge rate," J. Acoust. Soc. Amer., vol. 66, pp. 470-479, 1979.
- (1979) J. Acoust. Soc. Amer. , vol.66 , pp. 470-479
- Sachs, M.B.¹ Young, E.D.²

20
- 0018606571
- "Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory nerve fibers,"
- E. D. Young and M. B. Sachs, "Representation of steady-state vowels in the temporal aspects of the discharge patterns of populations of auditory nerve fibers," J. Acoust. Soc. Amer., vol. 66, pp. 1381-1403, 1979.
- (1979) J. Acoust. Soc. Amer. , vol.66 , pp. 1381-1403
- Young, E.D.¹ Sachs, M.B.²

21
- 0021403669
- "Speech coding in the auditory nerve: I,"
- B. Delgutte and N. Y. S. Kiang, "Speech coding in the auditory nerve: I," J. Acoust. Soc. Amer., vol. 75, pp. 866-878, 1984.
- (1984) J. Acoust. Soc. Amer. , vol.75 , pp. 866-878
- Delgutte, B.¹ Kiang, N.Y.S.²

22
- 84912495580
- "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency,"
- E. Zwicker and E. Terhardt, "Analytical expressions for critical-band rate and critical bandwidth as a function of frequency," J. Acoust. Soc. Amer., vol. 68, pp. 1523-1525, 1980.
- (1980) J. Acoust. Soc. Amer. , vol.68 , pp. 1523-1525
- Zwicker, E.¹ Terhardt, E.²

23
- 0028312802
- "Auditory models and human performances in tasks related to speech coding and speech recognition,"
- pt. II
- O. Ghitza, "Auditory models and human performances in tasks related to speech coding and speech recognition," IEEE Trans. Speech Audio Processing, vol. 2, pt. II, pp. 115-132, 1994.
- (1994) IEEE Trans. Speech Audio Processing , vol.2 , pp. 115-132
- Ghitza, O.¹

24
- 0022667695
- "A zero crossing-based spectrum analyzer,"
- Feb.
- S. M. Kay and R. Sudhakar, "A zero crossing-based spectrum analyzer," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 96-104, Feb. 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Processing , vol.ASSP-34 , pp. 96-104
- Kay, S.M.¹ Sudhakar, R.²

25
- 0026819492
- "Zero-crossing based spectral analysis and SVD spectral analysis for formant frequency estimation in noise,"
- T. V. Sreenivas and R. J. Niederjohn, "Zero-crossing based spectral analysis and SVD spectral analysis for formant frequency estimation in noise," IEEE Trans. Signal Processing, vol. 40, pp. 282-293, 1992.
- (1992) IEEE Trans. Signal Processing , vol.40 , pp. 282-293
- Sreenivas, T.V.¹ Niederjohn, R.J.²

26
- 0000030810
- "Auditory nerve representation as a basis for speech processing,"
- S. Furui and M. M. Sondhi, Eds. New York: Marcel Dekker
- O. Ghitza, "Auditory nerve representation as a basis for speech processing," in Advances in Speech Signal Processing, S. Furui and M. M. Sondhi, Eds. New York: Marcel Dekker, 1992, pp. 453-485.
- (1992) Advances in Speech Signal Processing , pp. 453-485
- Ghitza, O.¹

27
- 0022806994
- "Spectral analysis and discrimination by zero-crossings,"
- Nov
- B. Kedem, "Spectral analysis and discrimination by zero-crossings," Proc. IEEE, vol. 74, pp. 1477-1493, Nov. 1986.
- (1986) Proc. IEEE , vol.74 , pp. 1477-1493
- Kedem, B.¹

28
- 33646924923
- "A Korean speech database for use in automatic translation,"
- in Korean.
- I. J. Choi et al., "A Korean speech database for use in automatic translation," in Proc. 11th Workshop on Speech Communication and Signal Processing, 1994, pp. 287-290, in Korean.
- (1994) Proc. 11th Workshop on Speech Communication and Signal Processing , pp. 287-290
- Choi, I.J.¹

29
- 0019220773
- "State constrained dynamic programming (SCDP) for discrete utterance recognition,"
- H. F. Silverman and N. R. Dixon, "State constrained dynamic programming (SCDP) for discrete utterance recognition," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1980, pp. 169-172.
- (1980) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 169-172
- Silverman, H.F.¹ Dixon, N.R.²

30
- 0006671437
- "Intelligent judge neural network for speech recognition,"
- D.-S. Kim and S.-Y. Lee, "Intelligent judge neural network for speech recognition," Neural Process. Lett., vol. 1, pp. 17-20, 1994.
- (1994) Neural Process. Lett. , vol.1 , pp. 17-20
- Kim, D.-S.¹ Lee, S.-Y.²

31
- 0342606430
- "Voice command: A digital neuro-chip for robust speech recognition in real-world noisy environments (Invited talk),"
- Hong Kong, Sept.
- S.-Y. Lee et al., "Voice command: A digital neuro-chip for robust speech recognition in real-world noisy environments (Invited talk)," in Proc. Int. Conf. Neural Information Processing, Hong Kong, Sept. 1996, pp. 283-287.
- (1996) Proc. Int. Conf. Neural Information Processing , pp. 283-287
- Lee, S.-Y.¹

32
- 0024610919
- "A tutorial on hidden Markov models and selected applications in speech recognition,"
- L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, pp. 257-286, 1989.
- (1989) Proc. IEEE , vol.77 , pp. 257-286
- Rabiner, L.R.¹

33
- 0027623210
- "Assessment for automatic speech recognition: II. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems,"
- A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, pp. 247-251, 1993.
- (1993) Speech Commun. , vol.12 , pp. 247-251
- Varga, A.¹ Steeneken, H.²

34
- 0020816083
- "Suggested formula for calculating auditory-filter bandwidth and excitation patterns,"
- B. C. J. Moore and B. R. Glasberg, "Suggested formula for calculating auditory-filter bandwidth and excitation patterns," J. Acoust. Soc. Amer., vol. 74, pp. 750-753, 1983.
- (1983) J. Acoust. Soc. Amer. , vol.74 , pp. 750-753
- Moore, B.C.J.¹ Glasberg, B.R.²

35
- 0011872351
- "Time derivatives, cepstral normalization, and spectral parameter filtering for continuously spelled names over the telephone,"
- J.-C. Junqua et al., "Time derivatives, cepstral normalization, and spectral parameter filtering for continuously spelled names over the telephone," in Proc. Europ. Conf. Speech Communication and Technology, 1995.
- (1995) Proc. Europ. Conf. Speech Communication and Technology
- Junqua, J.-C.¹

36
- 85135377175
- "Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP),"
- H. Hermansky, N. Morgan, A. Bayya, and P. Kohn, "Compensation for the effect of the communication channel in auditory-like analysis of speech (RASTA-PLP)," in Proc. Europ. Conf. Speech Communication and Technology, 1991, pp. 1367-1370.
- (1991) Proc. Europ. Conf. Speech Communication and Technology , pp. 1367-1370
- Hermansky, H.¹ Morgan, N.² Bayya, A.³ Kohn, P.⁴

37
- 0028996922
- "Speech enhancement based on temporal processing,"
- H. Hermansky, E. Wan, and C. Avendano, "Speech enhancement based on temporal processing," in Proc. Int. Conf. Acoust., Speech, Signal Processing, 1995, pp. 405-408.
- (1995) Proc. Int. Conf. Acoust., Speech, Signal Processing , pp. 405-408
- Hermansky, H.¹ Wan, E.² Avendano, C.³

38
- 0029239090
- "A comparative study of mel cepstra and EIH's for phone classification under adverse conditions,"
- Detroit, MI
- S. Sandhu and O. Ghitza, "A comparative study of mel cepstra and EIH's for phone classification under adverse conditions," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, Detroit, MI, 1995, pp. 409-412.
- (1995) Proc. Int. Conf. Acoustics, Speech, Signal Processing , pp. 409-412
- Sandhu, S.¹ Ghitza, O.²

39
- 0027152841
- "In search for the relevant parameters for speaker independent speech recognition,"
- J. Smolders and D. V. Compernolle, "In search for the relevant parameters for speaker independent speech recognition," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1993, vol. II, pp. 684-687.
- (1993) Proc. Int. Conf. Acoustics, Speech, Signal Processing , vol.2 , pp. 684-687
- Smolders, J.¹ Compernolle, D.V.²

40
- 0027167185
- "A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition,"
- K. Aikawa, H. Singer, H. Kawahara, and Y. Tohkura, "A dynamic cepstrum incorporating time-frequency masking and its application to continuous speech recognition," in Proc. Int. Conf. Acoustics, Speech, Signal Processing, 1993, vol. II, pp. 668-671.
- (1993) Proc. Int. Conf. Acoustics, Speech, Signal Processing , vol.2 , pp. 668-671
- Aikawa, K.¹ Singer, H.² Kawahara, H.³ Tohkura, Y.⁴

41
- 0031546280
- "Auditory model for robust speech recognition in real world noisy environments,"
- D.-S. Kim, S.-Y. Lee, R. M. Kil, and X. Zhu, "Auditory model for robust speech recognition in real world noisy environments," Electron. Lett., vol. 33, p. 12, 1997.
- (1997) Electron. Lett. , vol.33 , pp. 12
- Kim, D.-S.¹ Lee, S.-Y.² Kil, R.M.³ Zhu, X.⁴

42
- 0022667694
- "Speaker-independent isolated word recognition using dynamic features of speech spectrum,"
- S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 52-59, 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Processing , vol.ASSP-34 , pp. 52-59
- Furui, S.¹

43
- 0343685350
- "Spectral dynamics for speech recognition under adverse conditions,"
- C.-H. Lee, K. Paliwal, and F. Soong, Eds. Boston, MA: Kluwer
- B. Hanson, T. Applebaum, and J.-C. Junqua, "Spectral dynamics for speech recognition under adverse conditions," in Advanced Topics in Automatic Speech and Speaker Recognition, C.-H. Lee, K. Paliwal, and F. Soong, Eds. Boston, MA: Kluwer, 1995.
- (1995) Advanced Topics in Automatic Speech and Speaker Recognition
- Hanson, B.¹ Applebaum, T.² Junqua, J.-C.³

44
- 0003234444
- "Multiple approaches to robust speech recognition,"
- Harriman, NY
- R. M. Stern et al., "Multiple approaches to robust speech recognition," in Proc. DARPA Speech and Natural Language Workshop, Harriman, NY, 1992, pp. 274-279.
- (1992) Proc. DARPA Speech and Natural Language Workshop , pp. 274-279
- Stern, R.M.¹

45
- 0029345416
- "A comparison of signal processing front ends for automatic word recognition,"
- C. R. Jankowski, Jr., H.-D. H. Vo, and R. P. Lippmann, "A comparison of signal processing front ends for automatic word recognition," IEEE Trans. Speech Audio Processing, vol. 3, pp. 286-293, 1995.
- (1995) IEEE Trans. Speech Audio Processing , vol.3 , pp. 286-293
- Jankowski Jr., C.R.¹ Vo, H.-D.H.² Lippmann, R.P.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.