Volume 21, Issue 2, 2013, Pages 416-426

A multistream feature framework based on bandpass modulation filtering for robust speech recognition

Author keywords

Auditory cortex; automatic speech recognition (ASR); modulation; multistream; speech parameterization

Indexed keywords

AUDITORY CORTEX; AUTOMATIC SPEECH RECOGNITION; BAND PASS; BANDPASS OPERATION; CHANNEL DISTORTIONS; LOCALIZED FEATURES; MODULATION FILTERING; MULTI-STREAM; NONSTATIONARY NOISE; PARALLEL PATH; PARALLEL STREAMS; PHONEME RECOGNITION; PROPOSED ARCHITECTURES; ROBUST SPEECH RECOGNITION; SIGNAL DYNAMICS; SPECTRAL MODULATION; SPEECH SIGNALS; SPEECH SOUNDS; TEMPORAL DIMENSIONS;

EID: 84871829474     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2219526     Document Type: Article
Times cited : (38)

References (50)
  • 2
    • 0032139768
    • Should recognizers have ears?
    • PII S0167639398000272
    • H. Hermansky, "Should recognizers have ears?," Speech Commun., vol. 25, pp. 3-27, 1998. (Pubitemid 128413632)
    • (1998) Speech Communication , vol.25 , Issue.1-3 , pp. 3-27
    • Hermansky, H.1
  • 3
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979. (Pubitemid 9467471)
    • (1979) IEEE Trans Acoust Speech Signal Process , vol.ASSP-27 , Issue.2 , pp. 113-120
    • Boll, S.F.1
  • 4
    • 0019555090
    • Cepstral analysis technique for automatic speaker verification
    • S. Furui, "Cepstral analysis technique for automatic speaker verification," IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 2, pp. 254-272, Apr. 1981. (Pubitemid 11495877)
    • (1981) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-29 , Issue.2 , pp. 254-272
    • Furui, S.1
  • 10
    • 0141697346
    • Ph.D. dissertation, Lab. Intell. Artif. Perceptive, École Polytechnique Fédérale, Lausanne, Switzerland
    • A. Hagen, "Robust speech recognition based on multi-stream processing," Ph.D. dissertation, Lab. Intell. Artif. Perceptive, École Polytechnique Fédérale, Lausanne, Switzerland, 2001.
    • (2001) Robust Speech Recognition Based on Multi-stream Processing
    • Hagen, A.1
  • 11
    • 73649085443
    • Multi-stream speech recognition based on Dempster-Shafer combination rule
    • F. Valente, "Multi-stream speech recognition based on Dempster-Shafer combination rule," Speech Commun., vol. 52, no. 3, pp. 213-222, 2010.
    • (2010) Speech Commun. , vol.52 , Issue.3 , pp. 213-222
    • Valente, F.1
  • 12
    • 70450216114
    • Multi-stream to many-stream: Using spectro-temporal features for ASR
    • R. S. and M. N.
    • S. Y. Zhao, R. S., and M. N., "Multi-stream to many-stream: Using spectro-temporal features for ASR," in Proc. INTERSPEECH, 2009, pp. 2951-2954.
    • (2009) Proc. INTERSPEECH , pp. 2951-2954
    • Zhao, S.Y.1
  • 13
    • 79959816304
    • A multistream multiresolution framework for phoneme recognition
    • N. Mesgarani, S. Thomas, and H. Hermansky, "A multistream multiresolution framework for phoneme recognition," in Proc. INTERSPEECH, 2010, pp. 318-321.
    • (2010) Proc. INTERSPEECH , pp. 318-321
    • Mesgarani, N.1    Thomas, S.2    Hermansky, H.3
  • 14
    • 0034825241
    • Multi-stream adaptive evidence combination for noise robust ASR
    • DOI 10.1016/S0167-6393(00)00044-3
    • A. Morris, A. Hagen, H. Glotin, and H. Bourlard, "Multi-stream adaptive evidence combination for noise robust ASR," Speech Commun., vol. 34, pp. 25-40, 2001. (Pubitemid 32874681)
    • (2001) Speech Communication , vol.34 , Issue.1-2 , pp. 25-40
    • Morris, A.1    Hagen, A.2    Glotin, H.3    Bourlard, H.4
  • 15
    • 45549100188
    • Speech analysis in a model of the central auditory system
    • J. Woojay and B. Juang, "Speech analysis in a model of the central auditory system," IEEE Trans. Speech Audio Process., vol. 15, no. 6, pp. 1802-1817, Aug. 2007.
    • (2007) IEEE Trans. Speech Audio Process. , vol.15 , Issue.6 , pp. 1802-1817
    • Woojay, J.1    Juang, B.2
  • 16
    • 84871848126
    • Spectro-temporal Gabor features as a front end for automatic speech recognition
    • M. Kleinschmidt, "Spectro-temporal Gabor features as a front end for automatic speech recognition," in Forum Acusticum, 2002.
    • (2002) Forum Acusticum
    • Kleinschmidt, M.1
  • 17
    • 84865769808
    • Comparing different flavors of spectro-temporal features for ASR
    • B. Meyer, S. Ravuri, M. Schädler, and N. Morgan, "Comparing different flavors of spectro-temporal features for ASR," Proc. INTERSPEECH, vol. 1, pp. 1269-1272, 2011.
    • (2011) Proc. INTERSPEECH , vol.1 , pp. 1269-1272
    • Meyer, B.1    Ravuri, S.2    Schädler, M.3    Morgan, N.4
  • 19
    • 84865738978
    • Multistream bandpass modulation features for robust speech recognition
    • S. Nemala, K. Patil, and M. Elhilali, "Multistream bandpass modulation features for robust speech recognition," in Proc. ISCA, 2011, pp. 1277-1280.
    • (2011) Proc. ISCA , pp. 1277-1280
    • Nemala, S.1    Patil, K.2    Elhilali, M.3
  • 20
    • 0026626445
    • Auditory representations of acoustic signals
    • X. Yang, K. Wang, and S. A. Shamma, "Auditory representations of acoustic signals," IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 824-839, Mar. 1992.
    • (1992) IEEE Trans. Inf. Theory , vol.38 , Issue.2 , pp. 824-839
    • Yang, X.1    Wang, K.2    Shamma, S.A.3
  • 21
    • 23744508888
    • Multiresolution spectrotemporal analysis of complex sounds
    • DOI 10.1121/1.1945807
    • T. Chi, P. Ru, and S. Shamma, "Multiresolution spectrotemporal analysis of complex sounds," J. Acoust. Soc. Amer., vol. 118, pp. 887-906, 2005. (Pubitemid 41129224)
    • (2005) Journal of the Acoustical Society of America , vol.118 , Issue.2 , pp. 887-906
    • Chi, T.1    Ru, P.2    Shamma, S.A.3
  • 22
    • 34247487053
    • The cortical organization of speech processing
    • DOI 10.1038/nrn2113, PII NRN2113
    • G. Hickok and D. Poeppel, "The cortical organization of speech processing," Nature Rev. Neurosci., vol. 8, pp. 393-402, 2007. (Pubitemid 46652465)
    • (2007) Nature Reviews Neuroscience , vol.8 , Issue.5 , pp. 393-402
    • Hickok, G.1    Poeppel, D.2
  • 23
    • 0032142971
    • Cortical processing of complex sounds
    • DOI 10.1016/S0959-4388(98)80040-8
    • J. P. Rauschecker, "Cortical processing of complex sounds," Curr. Opin. Neurobiol., vol. 8, pp. 516-521, 1998. (Pubitemid 28431742)
    • (1998) Current Opinion in Neurobiology , vol.8 , Issue.4 , pp. 516-521
    • Rauschecker, J.P.1
  • 24
    • 0018564438
    • Temporal modulation transfer functions based upon modulation thresholds
    • N. F. Viemeister, "Temporal modulation transfer functions based upon modulation thresholds," J. Acoust. Soc. Amer., vol. 66, no. 5, pp. 1364-1380, Nov. 1979. (Pubitemid 10098323)
    • (1979) Journal of the Acoustical Society of America , vol.66 , Issue.5 , pp. 1364-1380
    • Viemeister, N.F.1
  • 25
    • 0039816305
    • Cambridge, MA: Plenum, ch. Frequency and the detection of spectral shape change
    • D. Green, Auditory Frequency Selectivity. Cambridge, MA: Plenum, 1986, ch. Frequency and the detection of spectral shape change, pp. 351-359.
    • (1986) Auditory Frequency Selectivity , pp. 351-359
    • Green, D.1
  • 26
    • 0040290402
    • Spectrotemporal modulation transfer functions and speech intelligibility
    • T. Chi, Y. Gao, M. C. Guyton, P. Ru, and S. A. Shamma, "Spectrotemporal modulation transfer functions and speech intelligibility," J. Acoust. Soc. Amer., vol. 106, pp. 2719-2732, 1999.
    • (1999) J. Acoust. Soc. Amer. , vol.106 , pp. 2719-2732
    • Chi, T.1    Gao, Y.2    Guyton, M.C.3    Ru, P.4    Shamma, S.A.5
  • 28
    • 0038711696
    • A spectro-temporal modulation index (STMI) for assessment of speech intelligibility
    • M. Elhilali, T. Chi, and S. A. Shamma, "A spectro-temporal modulation index (STMI) for assessment of speech intelligibility," Speech Commun., vol. 41, pp. 331-348, 2003.
    • (2003) Speech Commun. , vol.41 , pp. 331-348
    • Elhilali, M.1    Chi, T.2    Shamma, S.A.3
  • 30
    • 63549114783
    • The modulation transfer function for speech intelligibility
    • T. Elliott and F. Theunissen, "The modulation transfer function for speech intelligibility," PLoS Comput. Biol., vol. 5, p. e1000302, 2009.
    • (2009) PLoS Comput. Biol. , vol.5
    • Elliott, T.1    Theunissen, F.2
  • 33
    • 0024768209
    • Speaker-independent phone recognition using hidden Markov models
    • K. F. Lee and H. W. Hon, "Speaker-independent phone recognition using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 11, pp. 1641-1648, Nov. 1989.
    • (1989) IEEE Trans. Acoust. , Speech, Signal Process. , vol.37 , Issue.11 , pp. 1641-1648
    • Lee, K.F.1    Hon, H.W.2
  • 34
    • 77957731201
    • Data-driven and feedback based spectro-temporal features for speech recognition
    • S. Garimella, S. Nemala, N. Mesgarani, and H. Hermansky, "Data-driven and feedback based spectro-temporal features for speech recognition," IEEE Signal Process. Lett., vol. 17, no. 11, pp. 957-960, Nov. 2010.
    • (2010) IEEE Signal Process. Lett. , vol.17 , Issue.11 , pp. 957-960
    • Garimella, S.1    Nemala, S.2    Mesgarani, N.3    Hermansky, H.4
  • 35
    • 11144222882
    • Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system
    • DOI 10.1109/TSA.2004.834466
    • P. Pujol, S. Pol, C. Nadeu, A. Hagen, and H. Bourlard, "Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system," IEEE Trans. Speech Audio Process., vol. 13, no. 1, pp. 14-22, Jan. 2005. (Pubitemid 40049936)
    • (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.1 , pp. 14-22
    • Pujol, P.1    Pol, S.2    Nadeu, C.3    Hagen, A.4    Bourlard, H.5
  • 36
    • 0001595997
    • Neural network classifiers estimate Bayesian a posteriori probabilities
    • M. Richard and R. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities," Neural Computation, vol. 3, no. 4, pp. 461-483, 1991.
    • (1991) Neural Computation , vol.3 , Issue.4 , pp. 461-483
    • Richard, M.1    Lippmann, R.2
  • 40
    • 79551573428
    • [Online]. Available (date last viewed 11/25/2011)
    • H. Hirsch, FaNT: Filtering and Noise Adding Tool. [Online]. Available: http://dnt.kr.hsnr.de/download.html (date last viewed 11/25/2011), 2005
    • (2005) FaNT: Filtering and Noise Adding Tool
    • Hirsch, H.1
  • 42
    • 84871840545
    • Philadelphia, PA: Linguistic Data Consortium
    • D. Reynolds, HTIMIT. Philadelphia, PA: Linguistic Data Consortium, 1998, p. LDC98S67.
    • (1998) HTIMIT
    • Reynolds, D.1
  • 45
    • 79952171347
    • Temporal envelope compensation for robust phoneme recognition using modulation spectrum
    • S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal envelope compensation for robust phoneme recognition using modulation spectrum," J. Acoust. Soc. Amer., vol. 128, pp. 3769-3780, 2010.
    • (2010) J. Acoust. Soc. Amer. , vol.128 , pp. 3769-3780
    • Ganapathy, S.1    Thomas, S.2    Hermansky, H.3
  • 46
    • 0037828299
    • Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers
    • DOI 10.1121/1.1579009
    • M. K. Qin and A. J. Oxenham, "Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers," J. Acoust. Soc. Amer., vol. 114, no. 1, pp. 446-454, Jul. 2003. (Pubitemid 36835514)
    • (2003) Journal of the Acoustical Society of America , vol.114 , Issue.1 , pp. 446-454
    • Qin, M.K.1    Oxenham, A.J.2
  • 48
    • 79960669709
    • Toward optimizing stream fusion in multistream recognition of speech
    • N. Mesgarani, S. Thomas, and H. Hermansky, "Toward optimizing stream fusion in multistream recognition of speech," J. Acoust. Soc. Amer., vol. 130, pp. 14-18, 2011.
    • (2011) J. Acoust. Soc. Amer. , vol.130 , pp. 14-18
    • Mesgarani, N.1    Thomas, S.2    Hermansky, H.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.