SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 2, 2013, Pages 416-426

A multistream feature framework based on bandpass modulation filtering for robust speech recognition

(3) Nemala, Sridhar Krishna a; Patil, Kailash a; Elhilali, Mounya a

a Johns Hopkins University (United States)

Author keywords

Auditory cortex; automatic speech recognition (ASR); modulation; multistream; speech parameterization

Indexed keywords

AUDITORY CORTEX; AUTOMATIC SPEECH RECOGNITION; BAND PASS; BANDPASS OPERATION; CHANNEL DISTORTIONS; LOCALIZED FEATURES; MODULATION FILTERING; MULTI-STREAM; NONSTATIONARY NOISE; PARALLEL PATH; PARALLEL STREAMS; PHONEME RECOGNITION; PROPOSED ARCHITECTURES; ROBUST SPEECH RECOGNITION; SIGNAL DYNAMICS; SPECTRAL MODULATION; SPEECH SIGNALS; SPEECH SOUNDS; TEMPORAL DIMENSIONS;

SPEECH PROCESSING; SPEECH RECOGNITION;

MODULATION;

EID: 84871829474 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2219526 Document Type: Article

Times cited: (38)

References (50)

1
- 33947655538
- Berlin, Germany Springer
- S. Greenberg, A. Popper, and W. Ainsworth, Speech Processing in the Auditory System. Berlin, Germany: Springer, 2004.
- (2004) Speech Processing in the Auditory System
- Greenberg, S.¹ Popper, A.² Ainsworth, W.³

2
- 0032139768
- Should recognizers have ears?
- PII S0167639398000272
- H. Hermansky, "Should recognizers have ears?, " Speech Commun. , vol. 25, pp. 3-27, 1998. (Pubitemid 128413632)
- (1998) Speech Communication , vol.25 , Issue.1-3 , pp. 3-27
- Hermansky, H.¹

3
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- S. Boll, "Suppression of acoustic noise in speech using spectral subtraction, " IEEE Trans. Acoustic, Speech, Signal Process. , vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979. (Pubitemid 9467471)
- (1979) IEEE Trans Acoust Speech Signal Process , vol.ASSP-27 , Issue.2 , pp. 113-120
- Boll Steven, F.¹

4
- 0019555090
- Cepstral analysis technique for automatic speaker verification
- S. Furui, "Cepstral analysis technique for automatic speaker verification, " IEEE Trans. Acoust. , Speech, Signal Process. , vol. 29, no. 2, pp. 254-272, Apr. 1981. (Pubitemid 11495877)
- (1981) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-29 , Issue.2 , pp. 254-272
- Furui Sadaoki¹

5
- 0030711157
- Transcription of broadcast television and radio news: The 1996 ABBOT system
- G. Cook, D. Kershaw, J. Christie, C. Seymour, and S. Waterhouse, "Transcription of broadcast television and radio news: The 1996 ABBOT system, " Proc. Int. Acoustics Speech Signal Process. , pp. 723-726, 1997.
- (1997) Proc. Int. Acoustics Speech Signal Process. , pp. 723-726
- Cook, G.¹ Kershaw, D.² Christie, J.³ Seymour, C.⁴ Waterhouse, S.⁵

6
- 42549139762
- MVA processing of speech features
- Jan
- C. Chen and J. Bilmes, "MVA processing of speech features, " IEEE Trans. Audio, Speech, Lang. Process. , vol. 15, no. 1, pp. 257-270, Jan. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 257-270
- Chen, C.¹ Bilmes, J.²

7
- 84871822913
- ETSI ES 202 050 v1.1.1 STQ Compression Algorithms, ETSI ES 202 050 v1.1.1 STQ, ETSI
- ETSI ES 202 050 v1.1.1 STQ; Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm; Compression Algorithms, ETSI ES 202 050 v1.1.1 STQ, ETSI, 2002.
- (2002) Distributed Speech Recognition; Advanced Front-End Feature Extraction Algorithm

8
- 34447100796
- 1st ed Boca Raton, FL: CRC
- P. Loizou, Speech Enhancement: Theory and Practice, 1st ed. Boca Raton, FL: CRC, 2007.
- (2007) Speech Enhancement: Theory and Practice
- Loizou, P.¹

9
- 9644298434
- Ph. D. dissertation, Oregon Graduate Inst. of Sci. Technol. , Portland, OR
- S. Sharma, "Multi-stream approach to robust speech recognition, " Ph. D. dissertation, Oregon Graduate Inst. of Sci. Technol. , Portland, OR, 1999.
- (1999) Multi-stream Approach to Robust Speech Recognition
- Sharma, S.¹

10
- 0141697346
- Ph. D. dissertation, Lab. Intell. Artif. Perceptive, École Polytechnique Fédérale, Lausanne, Switzerland
- A. Hagen, "Robust speech recognition based on multi-stream processing, " Ph. D. dissertation, Lab. Intell. Artif. Perceptive, École Polytechnique Fédérale, Lausanne, Switzerland, 2001.
- (2001) Robust Speech Recognition Based on Multi-stream Processing
- Hagen, A.¹

11
- 73649085443
- Multi-stream speech recognition based on Dempster-Shafer combination rule
- F. Valente, "Multi-stream speech recognition based on Dempster-Shafer combination rule, " Speech Commun. , vol. 52, no. 3, pp. 213-222, 2010.
- (2010) Speech Commun. , vol.52 , Issue.3 , pp. 213-222
- Valente, F.¹

12
- 70450216114
- Multi-stream to many-stream: Using spectro-temporal features for ASR
- S. Ravuri and N. Morgan
- S. Y. Zhao, S. Ravuri, and N. Morgan, "Multi-stream to many-stream: Using spectro-temporal features for ASR, " in Proc. INTERSPEECH, 2009, pp. 2951-2954.
- (2009) Proc. INTERSPEECH , pp. 2951-2954
- Zhao, S.Y.¹

13
- 79959816304
- A multistream multiresolution framework for phoneme recognition
- N. Mesgarani, S. Thomas, and H. Hermansky, "A multistream multiresolution framework for phoneme recognition, " in Proc. INTERSPEECH, 2010, pp. 318-321.
- (2010) Proc. INTERSPEECH , pp. 318-321
- Mesgarani, N.¹ Thomas, S.² Hermansky, H.³

14
- 0034825241
- Multi-stream adaptive evidence combination for noise robust ASR
- DOI 10.1016/S0167-6393(00)00044-3
- A. Morris, A. Hagen, H. Glotin, and H. Bourlard, "Multi-stream adaptive evidence combination for noise robust ASR, " Speech Commun. , vol. 34, pp. 25-40, 2001. (Pubitemid 32874681)
- (2001) Speech Communication , vol.34 , Issue.1-2 , pp. 25-40
- Morris, A.¹ Hagen, A.² Glotin, H.³ Bourlard, H.⁴

15
- 45549100188
- Speech analysis in a model of the central auditory system
- Aug
- J. Woojay and B. Juang, "Speech analysis in a model of the central auditory system, " IEEE Trans. Speech Audio Process. , vol. 15, no. 6, pp. 1802-1817, Aug. 2007.
- (2007) IEEE Trans. Speech Audio Process. , vol.15 , Issue.6 , pp. 1802-1817
- Woojay, J.¹ Juang, B.²

16
- 84871848126
- Spectro-temporal Gabor features as a front end for automatic speech recognition
- M. Kleinschmidt, "Spectro-temporal Gabor features as a front end for automatic speech recognition, " in Forum Acusticum, 2002.
- (2002) Forum Acusticum
- Kleinschmidt, M.¹

17
- 84865769808
- Comparing different flavors of spectro-temporal features for ASR
- B. Meyer, S. Ravuri, M. Schädler, and N. Morgan, "Comparing different flavors of spectro-temporal features for ASR, " Proc. INTERSPEECH, vol. 1, pp. 1269-1272, 2011.
- (2011) Proc. INTERSPEECH , vol.1 , pp. 1269-1272
- Meyer, B.¹ Ravuri, S.² Schädler, M.³ Morgan, N.⁴

18
- 84867619222
- Spectro-temporal Gabor features for speaker recognition
- H. Lei, B. Meyer, and N. Mirghafori, "Spectro-temporal Gabor features for speaker recognition, " in Proc. IEEE Conf. Acoust. , Speech, Signal Process. , 2012, pp. 4241-4244.
- (2012) Proc IEEE Conf. Acoust. Speech, Signal Process. , pp. 4241-4244
- Lei, H.¹ Meyer, B.² Mirghafori, N.³

19
- 84865738978
- Multistream bandpass modulation features for robust speech recognition
- S. Nemala, K. Patil, and M. Elhilali, "Multistream bandpass modulation features for robust speech recognition, " in Proc. ISCA, 2011, pp. 1277-1280.
- (2011) Proc. ISCA , pp. 1277-1280
- Nemala, S.¹ Patil, K.² Elhilali, M.³

20
- 0026626445
- Auditory representations of acoustic signals
- Mar
- X. Yang, K. Wang, and S. A. Shamma, "Auditory representations of acoustic signals, " IEEE Trans. Inf. Theory, vol. 38, no. 2, pp. 824-839, Mar. 1992.
- (1992) IEEE Trans. Inf. Theory , vol.38 , Issue.2 , pp. 824-839
- Yang, X.¹ Wang, K.² Shamma, S.A.³

21
- 23744508888
- Multiresolution spectrotemporal analysis of complex sounds
- DOI 10.1121/1.1945807
- T. Chi, P. Ru, and S. Shamma, "Multiresolution spectrotemporal analysis of complex sounds, " J. Acoust. Soc. Amer. , vol. 118, pp. 887-906, 2005. (Pubitemid 41129224)
- (2005) Journal of the Acoustical Society of America , vol.118 , Issue.2 , pp. 887-906
- Chi, T.¹ Ru, P.² Shamma, S.A.³

22
- 34247487053
- The cortical organization of speech processing
- DOI 10.1038/nrn2113, PII NRN2113
- G. Hickok and D. Poeppel, "The cortical organization of speech processing, " Nature Rev. Neurosci. , vol. 8, pp. 393-402, 2007. (Pubitemid 46652465)
- (2007) Nature Reviews Neuroscience , vol.8 , Issue.5 , pp. 393-402
- Hickok, G.¹ Poeppel, D.²

23
- 0032142971
- Cortical processing of complex sounds
- DOI 10.1016/S0959-4388(98)80040-8
- J. P. Rauschecker, "Cortical processing of complex sounds, " Curr. Opin. Neurobiol. , vol. 8, pp. 516-521, 1998. (Pubitemid 28431742)
- (1998) Current Opinion in Neurobiology , vol.8 , Issue.4 , pp. 516-521
- Rauschecker, J.P.¹

24
- 0018564438
- Temporal modulation transfer functions based upon modulation thresholds
- N. F. Viemeister, "Temporal modulation transfer functions based upon modulation thresholds, " J. Acoust. Soc. Amer. , vol. 66, no. 5, pp. 1364-1380, Nov. 1979. (Pubitemid 10098323)
- (1979) Journal of the Acoustical Society of America , vol.66 , Issue.5 , pp. 1364-1380
- Viemeister, N.F.¹

25
- 0039816305
- Cambridge MA Plenum ch. Frequency and the detection of spectral shape change
- D. Green, Auditory Frequency Selectivity. Cambridge, MA: Plenum, 1986, ch. Frequency and the detection of spectral shape change, pp. 351-359.
- (1986) Auditory Frequency Selectivity , pp. 351-359
- Green, D.¹

26
- 0040290402
- Spectrotemporal modulation transfer functions and speech intelligibility
- T. Chi, Y. Gao, M. C. Guyton, P. Ru, and S. A. Shamma, "Spectrotemporal modulation transfer functions and speech intelligibility, " J. Acoust. Soc. Amer. , vol. 106, pp. 2719-2732, 1999.
- (1999) J. Acoust. Soc. Amer. , vol.106 , pp. 2719-2732
- Chi, T.¹ Gao, Y.² Guyton, M.C.³ Ru, P.⁴ Shamma, S.A.⁵

27
- 0027957839
- Effect of temporal envelope smearing on speech reception
- R. Drullman, J. Festen, and R. Plomp, "Effect of temporal envelope smearing on speech reception, " J. Acoust. Soc. Amer. , vol. 95, pp. 1053-1064, 1994. (Pubitemid 24056370)
- (1994) Journal of the Acoustical Society of America , vol.95 , Issue.2 , pp. 1053-1064
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

28
- 0038711696
- A spectro-temporal modulation index (STMI) for assessment of speech intelligibility
- M. Elhilali, T. Chi, and S. A. Shamma, "A spectro-temporal modulation index (STMI) for assessment of speech intelligibility, " Speech Commun. , vol. 41, pp. 331-348, 2003.
- (2003) Speech Commun. , vol.41 , pp. 331-348
- Elhilali, M.¹ Chi, T.² Shamma, S.A.³

29
- 14044252930
- Speech recognition with amplitude and frequency modulations
- DOI 10.1073/pnas.0406460102
- F.-G. Zeng, K. Nie, G. S. Stickney, Y.-Y. Kong, M. Vongphoe, A. Bhargave, C. Wei, and K. Cao, "Speech recognition with amplitude and frequency modulations, " Proc. National Acad. Sci. , USA, vol. 102, no. 7, pp. 2293-2298, Feb. 2005. (Pubitemid 40279369)
- (2005) Proceedings of the National Academy of Sciences of the United States of America , vol.102 , Issue.7 , pp. 2293-2298
- Zeng, F.-G.¹ Nie, K.² Stickney, G.S.³ Kong, Y.-Y.⁴ Vongphoe, M.⁵ Bhargave, A.⁶ Wei, C.⁷ Cao, K.⁸

30
- 63549114783
- The modulation transfer function for speech intelligibility
- T. Elliott and F. Theunissen, "The modulation transfer function for speech intelligibility, " PLoS Comput. Biol. , vol. 5, p. e1000302, 2009.
- (2009) PLoS Comput. Biol. , vol.5
- Elliott, T.¹ Theunissen, F.²

31
- 0003548585
- Philadelphia, PA: Linguistic Data Consortium
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, D. S. Pallett, and N. L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus. Philadelphia, PA: Linguistic Data Consortium, 1993, p. LDC93S1.
- (1993) DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus
- Garofolo, J.S.¹ Lamel, L.F.² Fisher, W.M.³ Fiscus, J.G.⁴ Pallett, D.S.⁵ Dahlgren, N.L.⁶

32
- 0003573244
- Dordrecht The Netherlands Kluwer
- H. Bourlard and N. Morgan, Connectionist Speech Recognition: A Hybrid Approach. Dordrecht, The Netherlands: Kluwer, 1994, p. 348.
- (1994) Connectionist Speech Recognition: A Hybrid Approach , pp. 348
- Bourlard, H.¹ Morgan, N.²

33
- 0024768209
- Speaker-independent phone recognition using hidden Markov models
- Nov
- K. F. Lee and H. W. Hon, "Speaker-independent phone recognition using hidden Markov models, " IEEE Trans. Acoust. , Speech, Signal Process. , vol. 37, no. 11, pp. 1641-1648, Nov. 1989.
- (1989) IEEE Trans. Acoust. , Speech, Signal Process. , vol.37 , Issue.11 , pp. 1641-1648
- Lee, K.F.¹ Hon, H.W.²

34
- 77957731201
- Data-driven and feedback based spectro-temporal features for speech recognition
- Nov.
- S. Garimella, S. Nemala, N. Mesgarani, and H. Hermansky, "Data-driven and feedback based spectro-temporal features for speech recognition, " IEEE Signal Process. Lett. , vol. 17, no. 11, pp. 957-960, Nov. 2010.
- (2010) IEEE Signal Process. Lett. , vol.17 , Issue.11 , pp. 957-960
- Garimella, S.¹ Nemala, S.² Mesgarani, N.³ Hermansky, H.⁴

35
- 11144222882
- Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system
- DOI 10.1109/TSA.2004.834466
- P. Pujol, S. Pol, C. Nadeu, A. Hagen, and H. Bourlard, "Comparison and combination of features in a hybrid HMM/MLP and a HMM/GMM speech recognition system, " IEEE Trans. Speech Audio Process. , vol. 13, no. 1, pp. 14-22, Jan. 2005. (Pubitemid 40049936)
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.1 , pp. 14-22
- Pujol, P.¹ Pol, S.² Nadeu, C.³ Hagen, A.⁴ Bourlard, H.⁵

36
- 0001595997
- Neural network classifiers estimate Bayesian a posteriori probabilities
- M. Richard and R. Lippmann, "Neural network classifiers estimate Bayesian a posteriori probabilities, " Neural Computation, vol. 3, no. 4, pp. 461-483, 1991.
- (1991) Neural Computation , vol.3 , Issue.4 , pp. 461-483
- Richard, M.¹ Lippmann, R.²

37
- 78049251448
- Analyzing MLP-based hierarchical phoneme posterior probability estimator
- J. Pinto, S. Garimella, M. Magimai-Doss, H. Hermansky, and H. Bourlard, "Analyzing MLP-based hierarchical phoneme posterior probability estimator, " IEEE Trans. Speech and Audio Process. , vol. 19, pp. 225-241, 2011.
- (2011) IEEE Trans. Speech and Audio Process. , vol.19 , pp. 225-241
- Pinto, J.¹ Garimella, S.² Magimai-Doss, M.³ Hermansky, H.⁴ Bourlard, H.⁵

38
- 70450144093
- Ph. D. dissertation, UC Berkeley, Berkeley, CA
- D. Gelbart, "Ensemble feature selection for multi-stream automatic speech recognition, " Ph. D. dissertation, UC Berkeley, Berkeley, CA, 2008.
- (2008) Ensemble Feature Selection for Multi-stream Automatic Speech Recognition
- Gelbart, D.¹

39
- 0004319968
- Defense Research Agency, Malvern, U. K. Tech. Rep.
- A. Varga, H. Steeneken, M. Tomlinson, and D. Jones, The Noisex-92 study on the effect of additive noise on automatic speech recognition, Speech Research Unit, Defense Research Agency, Malvern, U. K. , Tech. Rep. , 1992.
- (1992) The Noisex-92 Study on the Effect of Additive Noise on Automatic Speech Recognition
- Varga, A.¹ Steeneken, H.² Tomlinson, M.³ Jones, D.⁴

40
- 79551573428
- [Online]. Available date last viewed 11/25/2011
- H. Hirsch, FaNT: Filtering and Noise Adding Tool. [Online]. Available: http://dnt.kr.hsnr.de/download.html (date last viewed 11/25/2011), 2005
- (2005) FaNT: Filtering and Noise Adding Tool
- Hirsch, H.¹

41
- 84962920708
- Evaluating long-term spectral subtraction for reverberant ASR
- ASRU
- D. Gelbart and N. Morgan, "Evaluating long-term spectral subtraction for reverberant ASR, " in Proc. IEEE Workshop Autom. Speech Recognit. Understanding (ASRU), 2001, pp. 103-106.
- (2001) Proc IEEE Workshop Autom. Speech Recognit. Understanding , pp. 103-106
- Gelbart, D.¹ Morgan, N.²

42
- 84871840545
- Philadelphia PA: Linguistic Data Consortium
- D. Reynolds, HTIMIT. Philadelphia, PA: Linguistic Data Consortium, 1998, p. LDC98S67.
- (1998) HTIMIT
- Reynolds, D.¹

43
- 0042033109
- Burlington, MA: Morgan Kaufmann
- Readings in Speech Recognition, A. Waibel and K. Lee, Eds. Burlington, MA: Morgan Kaufmann, 1990, p. 680.
- (1990) Readings in Speech Recognition , pp. 680
- Waibel, A.¹ Lee, K.²

44
- 0028517164
- RASTA processing of speech
- Oct
- H. Hermansky and N. Morgan, "RASTA processing of speech, " IEEE Trans. Speech Audio Process. , vol. 2, no. 4, pp. 382-395, Oct. 1994.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 382-395
- Hermansky, H.¹ Morgan, N.²

45
- 79952171347
- Temporal envelope compensation for robust phoneme recognition using modulation spectrum
- S. Ganapathy, S. Thomas, and H. Hermansky, "Temporal envelope compensation for robust phoneme recognition using modulation spectrum, " J. Acoust. Soc. Amer. , vol. 128, pp. 3769-3780, 2010.
- (2010) J. Acoust. Soc. Amer. , vol.128 , pp. 3769-3780
- Ganapathy, S.¹ Thomas, S.² Hermansky, H.³

46
- 0037828299
- Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers
- DOI 10.1121/1.1579009
- M. K. Qin and A. J. Oxenham, "Effects of simulated cochlear-implant processing on speech reception in fluctuating maskers, " J. Acoust. Soc. Amer. , vol. 114, no. 1, pp. 446-454, Jul. 2003. (Pubitemid 36835514)
- (2003) Journal of the Acoustical Society of America , vol.114 , Issue.1 , pp. 446-454
- Qin, M.K.¹ Oxenham, A.J.²

47
- 84871829391
- Two time scales in speech processing
- New York
- M. Chait, S. Greenberg, T. Arai, J. Simon, and D. Poeppel, "Two time scales in speech processing, " in Proc. Annu. Meeting Cognitive Neurosci. Soc. , New York, 2005.
- (2005) Proc. Annu. Meeting Cognitive Neurosci. Soc.
- Chait, M.¹ Greenberg, S.² Arai, T.³ Simon, J.⁴ Poeppel, D.⁵

48
- 79960669709
- Toward optimizing stream fusion in multistream recognition of speech
- N. Mesgarani, S. Thomas, and H. Hermansky, "Toward optimizing stream fusion in multistream recognition of speech, " J. Acoust. Soc. Amer. , vol. 130, pp. 14-18, 2011.
- (2011) J. Acoust. Soc. Amer. , vol.130 , pp. 14-18
- Mesgarani, N.¹ Thomas, S.² Hermansky, H.³

49
- 84878421888
- Robust phoneme recognition based on biomimetic speech contours
- INTERSPEECH
- M. Carlin, K. Patil, S. Nemala, and M. Elhilali, "Robust phoneme recognition based on biomimetic speech contours, " in Proc. 13th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2012.
- (2012) Proc. 13th Annu. Conf. Int. Speech Commun. Assoc.
- Carlin, M.¹ Patil, K.² Nemala, S.³ Elhilali, M.⁴

50
- 0033709098
- Tandem connectionist feature extraction for conventional HMM systems
- H. Hermansky, D. P. Ellis, and S. Sharma, "Tandem connectionist feature extraction for conventional HMM systems, " in Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , 2000, pp. 1635-1638.
- (2000) Proc IEEE Int. Conf. Acoust. , Speech, Signal Process. , pp. 1635-1638
- Hermansky, H.¹ Ellis, D.P.² Sharma, S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.