Volume 29, Issue 6, 2012, Pages 34-43

Hearing is believing: Biologically inspired methods for robust automatic speech recognition

Author keywords

[No Author keywords available]

Indexed keywords

ACOUSTICS; AUDIO SYSTEMS; AUDITION; DEEP NEURAL NETWORKS; FEATURE EXTRACTION;

EID: 85032751341     PISSN: 10535888     EISSN: None     Source Type: Journal    
DOI: 10.1109/MSP.2012.2207989     Document Type: Article
Times cited : (49)

References (59)
  • 1
    • 3442876970
    • Phase-based dual-microphone robust speech enhancement
    • P. Aarabi and G. Shi, "Phase-based dual-microphone robust speech enhancement," IEEE Trans. Syst., Man, Cybern., B, vol. 34, no. 4, pp. 1763-1773, 2004.
    • (2004) IEEE Trans. Syst., Man, Cybern., B , vol.34 , Issue.4 , pp. 1763-1773
    • Aarabi, P.1    Shi, G.2
  • 2
    • 0028516073
    • How do humans process and recognize speech?
    • J. B. Allen, "How do humans process and recognize speech?," IEEE Trans. Speech Audio, vol. 2, no. 4, pp. 567-577, 1994.
    • (1994) IEEE Trans. Speech Audio , vol.2 , Issue.4 , pp. 567-577
    • Allen, J.B.1
  • 3
    • 84863773378
    • Frequency-domain linear prediction for temporal features
    • M. Athineos and D. Ellis, "Frequency-domain linear prediction for temporal features," in Proc. IEEE ASRU Workshop, 2003, pp. 261-266.
    • (2003) Proc. IEEE ASRU Workshop , pp. 261-266
    • Athineos, M.1    Ellis, D.2
  • 4
    • 23744508888
    • Multiresolution spectrotemporal analysis of complex sounds
    • DOI 10.1121/1.1945807
    • T. Chi, P. Ru, and S. A. Shamma, "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Amer., vol. 118, no. 2, pp. 887-906, 2005. (Pubitemid 41129224)
    • (2005) Journal of the Acoustical Society of America , vol.118 , Issue.2 , pp. 887-906
    • Chi, T.1    Ru, P.2    Shamma, S.A.3
  • 5
    • 0024392496
    • Application of an auditory model to speech recognition
    • DOI 10.1121/1.397756
    • J. R. Cohen, "Application of an auditory model to speech recognition," J. Acoust. Soc. Amer., vol. 85, no. 6, pp. 2623-2629, 1989. (Pubitemid 19160389)
    • (1989) Journal of the Acoustical Society of America , vol.85 , Issue.6 , pp. 2623-2629
    • Cohen, J.R.1
  • 6
    • 33745224873
    • Vocal tract normalization in speech recognition: Compensating for systematic speaker variability
    • J. R. Cohen, T. Kamm, and A. G. Andreou, "Vocal tract normalization in speech recognition: Compensating for systematic speaker variability," J. Acoust. Soc. Amer., vol. 97, no. 5, pp. 3246-3247, 1995.
    • (1995) J. Acoust. Soc. Amer. , vol.97 , Issue.5 , pp. 3246-3247
    • Cohen, J.R.1    Kamm, T.2    Andreou, A.G.3
  • 7
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • S. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust. Speech Signal Processing, vol. 28, no. 4, pp. 357-366, 1980. (Pubitemid 11464930)
    • (1980) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-28 , Issue.4 , pp. 357-366
    • Davis, S.B.1    Mermelstein, P.2
  • 8
    • 0023516708
    • A composite auditory model for processing speech sounds
    • L. Deng and D. C. Geisler, "A composite auditory model for processing speech sounds," J. Acoust. Soc. Amer., vol. 82, no. 6, pp. 2001-2012, 1987.
    • (1987) J. Acoust. Soc. Amer. , vol.82 , Issue.6 , pp. 2001-2012
    • Deng, L.1    Geisler, D.C.2
  • 9
    • 0035097825
    • Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex
    • D. A. Depireux, J. Z. Simon, D. J. Klein, and S. A. Shamma, "Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex," J. Neurophysiol., vol. 85, no. 3, pp. 1220-1234, 2001. (Pubitemid 32209608)
    • (2001) Journal of Neurophysiology , vol.85 , Issue.3 , pp. 1220-1234
    • Depireux, D.A.1    Simon, J.Z.2    Klein, D.J.3    Shamma, S.A.4
  • 10
    • 0002439510
    • Auditory patterns
    • H. Fletcher, "Auditory patterns," Rev. Mod. Phys., vol. 12, no. 1, pp. 47-65, 1940.
    • (1940) Rev. Mod. Phys. , vol.12 , Issue.1 , pp. 47-65
    • Fletcher, H.1
  • 11
    • 84955013022
    • Loudness, its definition, measurement, and calculation
    • H. Fletcher and W. A. Munson, "Loudness, its definition, measurement, and calculation," J. Acoust. Soc. Amer., vol. 5, no. 2, pp. 82-108, 1933.
    • (1933) J. Acoust. Soc. Amer. , vol.5 , Issue.2 , pp. 82-108
    • Fletcher, H.1    Munson, W.A.2
  • 12
    • 84953657538
    • Factors governing the intelligibility of speech sounds
    • N. R. French and J. C. Steinberg, "Factors governing the intelligibility of speech sounds," J. Acoust. Soc. Amer., vol. 19, no. 1, pp. 90-119, 1947.
    • (1947) J. Acoust. Soc. Amer. , vol.19 , Issue.1 , pp. 90-119
    • French, N.R.1    Steinberg, J.C.2
  • 13
    • 0022548705
    • On the role of spectral transition for speech perception
    • S. Furui, "On the role of spectral transition for speech perception," J. Acoust. Soc. Amer., vol. 80, no. 4, pp. 1016-1025, 1986. (Pubitemid 16023317)
    • (1986) Journal of the Acoustical Society of America , vol.80 , Issue.4 , pp. 1016-1025
    • Furui, S.1
  • 14
    • 84991416125
    • Auditory nerve representation as a front-end for speech recognition in a noisy environment
    • O. Ghitza, "Auditory nerve representation as a front-end for speech recognition in a noisy environment," Comput. Speech Lang., vol. 1, no. 2, pp. 109-130, 1986.
    • (1986) Comput. Speech Lang. , vol.1 , Issue.2 , pp. 109-130
    • Ghitza, O.1
  • 16
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • DOI 10.1121/1.399423
    • H. Hermansky, "Perceptual linear predictive (PLP) analysis of speech," J. Acoust. Soc. Amer., vol. 87, no. 4, pp. 1738-1752, 1990. (Pubitemid 20256470)
    • (1990) Journal of the Acoustical Society of America , vol.87 , Issue.4 , pp. 1738-1752
    • Hermansky, H.1
  • 17
    • 0033709098
    • Tandem connectionist feature extraction for conventional HMM systems
    • H. Hermansky, D. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems," in Proc. IEEE ICASSP, Istanbul, Turkey, 2000, pp. 1635-1638.
    • (2000) Proc. IEEE ICASSP, Istanbul, Turkey , pp. 1635-1638
    • Hermansky, H.1    Ellis, D.2    Sharma, S.3
  • 19
    • 0032658253
    • Temporal patterns (TRAPS) in ASR of noisy speech
    • H. Hermansky and S. Sharma, "Temporal patterns (TRAPS) in ASR of noisy speech," in Proc. IEEE ICASSP, Phoenix, AZ, 1999, pp. 1255-1258.
    • (1999) Proc. IEEE ICASSP, Phoenix, AZ , pp. 1255-1258
    • Hermansky, H.1    Sharma, S.2
  • 21
    • 0001887874
    • A place theory of sound localization
    • L. A. Jeffress, "A place theory of sound localization," J. Comp. Physiol. Psychol., vol. 41, no. 1, pp. 35-39, 1948.
    • (1948) J. Comp. Physiol. Psychol. , vol.41 , Issue.1 , pp. 35-39
    • Jeffress, L.A.1
  • 22
    • 1642342844
    • Neural Processing of Amplitude-Modulated Sounds
    • DOI 10.1152/physrev.00029.2003
    • P. X. Joris, C. E. Schreiner, and A. Rees, "Neural processing of amplitude-modulated sounds," Physiol. Rev., vol. 84, no. 2, pp. 541-577, 2004. (Pubitemid 38365492)
    • (2004) Physiological reviews , vol.84 , Issue.2 , pp. 541-577
    • Joris, P.X.1    Schreiner, C.E.2    Rees, A.3
  • 23
    • 79959824321
    • Nonlinear enhancement of onset for robust speech recognition
    • C. Kim and R. M. Stern, "Nonlinear enhancement of onset for robust speech recognition," in Proc. Interspeech, Makuhari, Japan, 2010.
    • (2010) Proc. Interspeech, Makuhari, Japan
    • Kim, C.1    Stern, R.M.2
  • 24
    • 79959834164
    • Automatic selection of thresholds for signal separation algorithms based on interaural delay
    • C. Kim, R. M. Stern, K. Eom, and J. Lee, "Automatic selection of thresholds for signal separation algorithms based on interaural delay," in Proc. Interspeech, Makuhari, Japan, 2010.
    • (2010) Proc. Interspeech, Makuhari, Japan
    • Kim, C.1    Stern, R.M.2    Eom, K.3    Lee, J.4
  • 25
    • 84910064377
    • Power-normalized cepstral coefficients (PNCC) for robust speech recognition
    • to be published
    • C. Kim and R. M. Stern, "Power-normalized cepstral coefficients (PNCC) for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Processing, to be published.
    • IEEE Trans. Audio, Speech, Lang. Processing
    • Kim, C.1    Stern, R.M.2
  • 26
    • 0032785783
    • Auditory processing of speech signals for robust speech recognition in real world noisy environments
    • D. Kim, S. Lee, and R. M. Kil, "Auditory processing of speech signals for robust speech recognition in real world noisy environments," IEEE Trans. Speech Audio Processing, vol. 7, no. 1, pp. 55-69, 1999.
    • (1999) IEEE Trans. Speech Audio Processing , vol.7 , Issue.1 , pp. 55-69
    • Kim, D.1    Lee, S.2    Kil, R.M.3
  • 28
    • 85009227802
    • Localized spectro-temporal features for automatic speech recognition
    • M. Kleinschmidt, "Localized spectro-temporal features for automatic speech recognition," in Proc. Eurospeech, 2003, pp. 2573-2576.
    • (2003) Proc. Eurospeech , pp. 2573-2576
    • Kleinschmidt, M.1
  • 29
    • 0001463644
    • A duplex theory of pitch perception
    • J. C. R. Licklider, "A duplex theory of pitch perception," Experientia, vol. 7, no. 4, pp. 128-134, 1951.
    • (1951) Experientia , vol.7 , Issue.4 , pp. 128-134
    • Licklider, J.C.R.1
  • 30
    • 70450168923
    • Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments
    • X. Lu, M. Unoki, and S. Nakamura, "Subband temporal modulation spectrum normalization for automatic speech recognition in reverberant environments," in Proc. Interspeech, 2009.
    • (2009) Proc. Interspeech
    • Lu, X.1    Unoki, M.2    Nakamura, S.3
  • 31
    • 79251542316
    • A computational model of filtering, detection and compression in the cochlea
    • R. F. Lyon, "A computational model of filtering, detection and compression in the cochlea," in Proc. ICASSP, Paris, France, 1982, pp. 1282-1285.
    • (1982) Proc. ICASSP, Paris, France , pp. 1282-1285
    • Lyon, R.F.1
  • 32
    • 0020497765
    • A computational model of binaural localization and separation
    • R. F. Lyon, "A computational model of binaural localization and separation," in Proc. ICASSP, Boston, MA, 1983, pp. 1148-1151.
    • (1983) Proc. ICASSP, Boston, MA , pp. 1148-1151
    • Lyon, R.F.1
  • 34
    • 0034296009
    • Finding consensus in speech recognition: Word error minimization and other applications of confusion networks
    • L. Mangu, E. Brill, and A. Stolcke, "Finding consensus in speech recognition: Word error minimization and other applications of confusion networks," Comput. Speech Lang., vol. 14, no. 4, pp. 373-400, 2000.
    • (2000) Comput. Speech Lang. , vol.14 , Issue.4 , pp. 373-400
    • Mangu, L.1    Brill, E.2    Stolcke, A.3
  • 36
    • 34047272330
    • Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations
    • N. Mesgarani, M. Slaney, and S. Shamma, "Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations," IEEE Trans. Audio, Speech, Lang. Processing, vol. 14, no. 3, pp. 920-929, 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Processing , vol.14 , Issue.3 , pp. 920-929
    • Mesgarani, N.1    Slaney, M.2    Shamma, S.3
  • 38
    • 0020816083
    • Suggested formulae for calculating auditory-filter bandwidths and excitation patterns
    • B. C. J. Moore and B. R. Glasberg, "Suggested formulae for calculating auditory-filter bandwidths and excitation patterns," J. Acoust. Soc. Amer., vol. 74, no. 3, pp. 750-753, 1983. (Pubitemid 13019047)
    • (1983) Journal of the Acoustical Society of America , vol.74 , Issue.3 , pp. 750-753
    • Moore, B.C.J.1    Glasberg, B.R.2
  • 40
    • 44949110409
    • Environmental robustness in automatic speech recognition using physiologically motivated signal processing
    • Y. Ohshima and R. M. Stern, "Environmental robustness in automatic speech recognition using physiologically motivated signal processing," in Proc. ICSLP, 1994, pp. 1619-1622.
    • (1994) Proc. ICSLP , pp. 1619-1622
    • Ohshima, Y.1    Stern, R.M.2
  • 41
    • 0031187171
    • Speech recognition by machines and humans
    • PII S0167639397000216
    • R. P. Lippmann, "Speech recognition by machines and humans," Speech Commun., vol. 22, no. 1, pp. 1-15, 1997. (Pubitemid 127403436)
    • (1997) Speech Communication , vol.22 , Issue.1 , pp. 1-15
    • Lippmann, R.P.1
  • 43
    • 0023841401
    • Vowel processing by a model of the auditory periphery: A comparison to eighth-nerve responses
    • K. L. Payton, "Vowel processing by a model of the auditory periphery: A comparison to eighth-nerve responses," J. Acoust. Soc. Amer., vol. 83, no. 1, pp. 145-162, 1988. (Pubitemid 18036631)
    • (1988) Journal of the Acoustical Society of America , vol.83 , Issue.1 , pp. 145-162
    • Payton, K.L.1
  • 46
    • 0142026377
    • Speech segregation based on sound localization
    • DOI 10.1121/1.1610463
    • N. Roman, DeL. Wang, and G. J. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, no. 4, pp. 2236-2252, 2003. (Pubitemid 37266649)
    • (2003) Journal of the Acoustical Society of America , vol.114 , Issue.4 , pp. 2236-2252
    • Roman, N.1    Wang, D.2    Brown, G.J.3
  • 48
    • 17344367464
    • Recognition of complex acoustic signals
    • T. H. Bullock, Ed. Abakon Verlag
    • M. R. Schroeder, "Recognition of complex acoustic signals," in Life Sciences Research Report 5, T. H. Bullock, Ed. Abakon Verlag, 1977.
    • (1977) Life Sciences Research Report 5
    • Schroeder, M.R.1
  • 49
    • 84928837806
    • A joint synchrony/mean-rate model of auditory speech processing
    • S. Seneff, "A joint synchrony/mean-rate model of auditory speech processing," J. Phonet., vol. 15, no. 1, pp. 55-76, 1988.
    • (1988) J. Phonet. , vol.15 , Issue.1 , pp. 55-76
    • Seneff, S.1
  • 50
    • 0031647650
    • Speech analysis and recognition using interval statistics generated from a composite auditory model
    • H. Sheikhzadeh and L. Deng, "Speech analysis and recognition using interval statistics generated from a composite auditory model," IEEE Trans. Speech Audio Processing, vol. 6, no. 1, pp. 50-54, 1998.
    • (1998) IEEE Trans. Speech Audio Processing , vol.6 , Issue.1 , pp. 50-54
    • Sheikhzadeh, H.1    Deng, L.2
  • 51
    • 84868663836
    • Binaural sound localization
    • D. Wang and G. J. Brown, Eds. New York: IEEE Press
    • R. M. Stern, G. J. Brown, and D. Wang, "Binaural sound localization," in Computational Auditory Scene Analysis, D. Wang and G. J. Brown, Eds. New York: IEEE Press, 2006, pp. 147-185.
    • (2006) Computational Auditory Scene Analysis , pp. 147-185
    • Stern, R.M.1    Brown, G.J.2    Wang, D.3
  • 52
    • 56149126779
    • 'Polyaural' array processing for automatic speech recognition in degraded environments
    • R. M. Stern, E. Gouvêa, and G. Thattai, "'Polyaural' array processing for automatic speech recognition in degraded environments," in Proc. Interspeech, 2007.
    • (2007) Proc. Interspeech
    • Stern, R.M.1    Gouvêa, E.2    Thattai, G.3
  • 53
    • 34447546202
    • On the psychophysical law
    • S. S. Stevens, "On the psychophysical law," Psychol. Rev., vol. 64, pp. 153-181, 1957.
    • (1957) Psychol. Rev. , vol.64 , pp. 153-181
    • Stevens, S.S.1
  • 54
    • 84955035459
    • A scale for the measurement of the psychological magnitude pitch
    • S. S. Stevens, J. Volkmann, and E. Newman, "A scale for the measurement of the psychological magnitude pitch," J. Acoust. Soc. Amer., vol. 8, no. 3, pp. 185-190, 1937.
    • (1937) J. Acoust. Soc. Amer. , vol.8 , Issue.3 , pp. 185-190
    • Stevens, S.S.1    Volkmann, J.2    Newman, E.3
  • 55
    • 0032828464
    • A model of auditory perception as front end for automatic speech recognition
    • J. Tchorz and B. Kollmeier, "A model of auditory perception as front end for automatic speech recognition," J. Acoust. Soc. Amer., vol. 106, no. 4, pp. 2040-2060, 1999.
    • (1999) J. Acoust. Soc. Amer. , vol.106 , Issue.4 , pp. 2040-2060
    • Tchorz, J.1    Kollmeier, B.2
  • 57
    • 0035122055
    • A phenomenological model for the responses of auditory-nerve fibers: I. Nonlinear tuning with compression and suppression
    • DOI 10.1121/1.1336503
    • X. Zhang, M. G. Heinz, I. C. Bruce, and L. H. Carney, "A phenomenological model for the responses of auditory-nerve fibers: I. Nonlinear tuning with compression and suppression," J. Acoust. Soc. Amer., vol. 109, no. 2, pp. 648-670, 2001. (Pubitemid 32144001)
    • (2001) Journal of the Acoustical Society of America , vol.109 , Issue.2 , pp. 648-670
    • Zhang, X.1    Heinz, M.G.2    Bruce, I.C.3    Carney, L.H.4
  • 58
    • 84953656445
    • Subdivision of the audible frequency range into critical bands (Frequenzgruppen)
    • E. Zwicker, "Subdivision of the audible frequency range into critical bands (Frequenzgruppen)," J. Acoust. Soc. Amer., vol. 33, no. 2, p. 248, 1961.
    • (1961) J. Acoust. Soc. Amer. , vol.33 , Issue.2 , pp. 248
    • Zwicker, E.1
  • 59
    • 0022976531
    • Extension of a binaural cross-correlation model by contralateral inhibition: I. Simulation of lateralization for stationary signals
    • W. Lindemann, "Extension of a binaural cross-correlation model by contralateral inhibition: I. Simulation of lateralization for stationary signals," J. Acoust. Soc. Amer., vol. 80, no. 6, pp. 1608-1622, 1986.
    • (1986) J. Acoust. Soc. Amer. , vol.80 , Issue.6 , pp. 1608-1622
    • Lindemann, W.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.