IEEE Signal Processing Magazine, Volume 29, Issue 6, 2012, Pages 114-126

Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition

Author keywords

[No Author keywords available]

Indexed keywords

ARCHITECTURAL ACOUSTICS; CONSUMER PRODUCTS; DEEP NEURAL NETWORKS; DIGITAL TELEVISION; INFORMATION SERVICES; INTERACTIVE TELEVISION; MICROPHONES; RESEARCH LABORATORIES; REVERBERATION; SPEECH;

EID: 85032751613     PISSN: 10535888     EISSN: None     Source Type: Journal    
DOI: 10.1109/MSP.2012.2205029     Document Type: Article
Times cited: 235

References (48)
  • 2
    • EID: 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291-298, 1994.
    • (1994) IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291-298
    • Gauvain, J.-L.; Lee, C.-H.
  • 3
    • EID: 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Language, vol. 9, no. 2, pp. 171-185, 1995.
    • (1995) Comput. Speech Language, vol. 9, no. 2, pp. 171-185
    • Leggetter, C.J.; Woodland, P.C.
  • 4
    • EID: 0030245128
    • Robust continuous speech recognition using parallel model combination
    • M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 352-359, 1996.
    • (1996) IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352-359
    • Gales, M.J.F.; Young, S.J.
  • 6
    • EID: 84901773892
    • Environmental robustness
    • J. Droppo and A. Acero, "Environmental robustness," in Springer Handbook of Speech Processing, J. Benesty, M. M. Sondhi, and Y. Huang, Eds. Berlin: Springer-Verlag, 2008, pp. 653-679.
    • (2008) Springer Handbook of Speech Processing, pp. 653-679
    • Droppo, J.; Acero, A.
  • 8
    • EID: 0032136330
    • Robust speech recognition using the modulation spectrogram
    • B. E. D. Kingsbury, N. Morgan, and S. Greenberg, "Robust speech recognition using the modulation spectrogram," Speech Commun., vol. 25, no. 1-3, pp. 117-132, 1998.
    • (1998) Speech Communication, vol. 25, no. 1-3, pp. 117-132
    • Kingsbury, B.E.D.; Morgan, N.; Greenberg, S.
  • 9
    • EID: 85009252959
    • Double the trouble: Handling noise and reverberation in far-field automatic speech recognition
    • D. Gelbart and N. Morgan, "Double the trouble: handling noise and reverberation in far-field automatic speech recognition," in Proc. Int. Conf. Spoken Language Process., 2002, pp. 2185-2188.
    • (2002) Proc. Int. Conf. Spoken Language Process., pp. 2185-2188
    • Gelbart, D.; Morgan, N.
  • 10
    • EID: 4344607755
    • Likelihood-maximizing beamforming for robust hands-free speech recognition
    • M. L. Seltzer, B. Raj, and R. M. Stern, "Likelihood-maximizing beamforming for robust hands-free speech recognition," IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 489-498, 2004.
    • (2004) IEEE Trans. Speech Audio Process., vol. 12, no. 5, pp. 489-498
    • Seltzer, M.L.; Raj, B.; Stern, R.M.
  • 11
    • EID: 84971352567
    • Room Acoustics
    • H. Kuttruff, Room Acoustics, 5th ed. Abingdon, Oxon: Spon Press, 2009.
    • (2009) Room Acoustics
    • Kuttruff, H.
  • 12
    • EID: 0018455820
    • Image method for efficiently simulating small-room acoustics
    • J. B. Allen and D. A. Berkley, "Image method for efficiently simulating small-room acoustics," J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943-950, 1979.
    • (1979) J. Acoust. Soc. Amer., vol. 65, no. 4, pp. 943-950
    • Allen, J.B.; Berkley, D.A.
  • 13
    • EID: 83455165201
    • Investigations into early and late reflections on distant-talking speech recognition toward suitable reverberation criteria
    • T. Nishiura, Y. Hirano, Y. Denda, and M. Nakayama, "Investigations into early and late reflections on distant-talking speech recognition toward suitable reverberation criteria," in Proc. Interspeech, 2007, pp. 1082-1085.
    • (2007) Proc. Interspeech, pp. 1082-1085
    • Nishiura, T.; Hirano, Y.; Denda, Y.; Nakayama, M.
  • 15
    • EID: 0022667694
    • Speaker-independent isolated word recognition using dynamic features of speech spectrum
    • S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Process., vol. 34, no. 1, pp. 52-59, 1986.
    • (1986) IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-34, no. 1, pp. 52-59
    • Furui, S.
  • 18
    • EID: 70449360175
    • Modulation spectral features for robust far-field speaker identification
    • T. H. Falk and W.-Y. Chan, "Modulation spectral features for robust far-field speaker identification," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 1, pp. 90-100, 2010.
    • (2010) IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 1, pp. 90-100
    • Falk, T.H.; Chan, W.-Y.
  • 22
    • EID: 80051612150
    • A model-based approach to joint compensation of noise and reverberation for speech recognition
    • A. Krueger and R. Haeb-Umbach, "A model-based approach to joint compensation of noise and reverberation for speech recognition," in Robust Speech Recognition of Uncertain or Missing Data: Theory and Applications, D. Kolossa and R. Haeb-Umbach, Eds. Berlin: Springer-Verlag, 2011, pp. 257-290.
    • (2011) Robust Speech Recognition of Uncertain or Missing Data: Theory and Applications, pp. 257-290
    • Krueger, A.; Haeb-Umbach, R.
  • 23
    • EID: 0003807773
    • Adaptive Filter Theory
    • S. Haykin, Adaptive Filter Theory, 4th ed. Englewood Cliffs, NJ: Prentice Hall, 2001.
    • (2001) Adaptive Filter Theory
    • Haykin, S.
  • 24
    • EID: 0141479055
    • Strategies for improving audible quality and speech recognition accuracy of reverberant speech
    • B. W. Gillespie and L. E. Atlas, "Strategies for improving audible quality and speech recognition accuracy of reverberant speech," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2003, pp. 676-679.
    • (2003) Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 676-679
    • Gillespie, B.W.; Atlas, L.E.
  • 25
    • EID: 0042362199
    • Blind single channel deconvolution using nonstationary signal processing
    • J. R. Hopgood and P. J. W. Rayner, "Blind single channel deconvolution using nonstationary signal processing," IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 476-488, 2003.
    • (2003) IEEE Trans. Speech Audio Process., vol. 11, no. 5, pp. 476-488
    • Hopgood, J.R.; Rayner, P.J.W.
  • 26
    • EID: 33947616910
    • Delay and predict equalization for blind speech dereverberation
    • M. Triki and D. T. M. Slock, "Delay and predict equalization for blind speech dereverberation," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2006, pp. V-97-V-100.
    • (2006) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 5
    • Triki, M.; Slock, D.T.M.
  • 27
    • EID: 85091450770
    • TRINICON for dereverberation of speech and audio signals
    • H. Buchner and W. Kellermann, "TRINICON for dereverberation of speech and audio signals," in Speech Dereverberation, P. A. Naylor and N. D. Gaubitch, Eds. Berlin: Springer-Verlag, 2010, pp. 311-385.
    • (2010) Speech Dereverberation, pp. 311-385
    • Buchner, H.; Kellermann, W.
  • 28
    • EID: 65249167097
    • Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction
    • K. Kinoshita, M. Delcroix, T. Nakatani, and M. Miyoshi, "Suppression of late reverberation effect on speech signal using long-term multiple-step linear prediction," IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 4, pp. 534-545, 2009.
    • (2009) IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 4, pp. 534-545
    • Kinoshita, K.; Delcroix, M.; Nakatani, T.; Miyoshi, M.
  • 30
    • EID: 84867831468
    • Variance compensation for recognition of reverberant speech with dereverberation preprocessing
    • M. Delcroix, S. Watanabe, and T. Nakatani, "Variance compensation for recognition of reverberant speech with dereverberation preprocessing," in Robust Speech Recognition of Uncertain or Missing Data: Theory and Applications, D. Kolossa and R. Haeb-Umbach, Eds. Berlin: Springer-Verlag, 2011, pp. 225-256.
    • (2011) Robust Speech Recognition of Uncertain or Missing Data: Theory and Applications, pp. 225-256
    • Delcroix, M.; Watanabe, S.; Nakatani, T.
  • 33
    • EID: 77955680097
    • Correlation-based and model-based blind single-channel late-reverberation suppression in noisy time-varying acoustical environments
    • J. S. Erkelens and R. Heusdens, "Correlation-based and model-based blind single-channel late-reverberation suppression in noisy time-varying acoustical environments," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 7, pp. 1746-1765, 2010.
    • (2010) IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 7, pp. 1746-1765
    • Erkelens, J.S.; Heusdens, R.
  • 34
    • EID: 70349452200
    • Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms
    • H. Kameoka, T. Nakatani, and T. Yoshioka, "Robust speech dereverberation based on non-negativity and sparse nature of speech spectrograms," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2009, pp. 45-48.
    • (2009) Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 45-48
    • Kameoka, H.; Nakatani, T.; Yoshioka, T.
  • 36
    • EID: 14344274593
    • A new method based on spectral subtraction for speech dereverberation
    • K. Lebart, J. M. Boucher, and P. N. Denbigh, "A new method based on spectral subtraction for speech dereverberation," Acta Acustica United with Acustica, vol. 87, no. 3, pp. 359-366, 2001.
    • (2001) Acta Acustica united with Acustica, vol. 87, no. 3, pp. 359-366
    • Lebart, K.; Boucher, J.M.; Denbigh, P.N.
  • 37
    • EID: 70350439261
    • Enhanced speech features by single-channel joint compensation of noise and reverberation
    • M. Wölfel, "Enhanced speech features by single-channel joint compensation of noise and reverberation," IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 2, pp. 312-323, 2009.
    • (2009) IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 2, pp. 312-323
    • Wölfel, M.
  • 38
    • EID: 18744401086
    • Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
    • DOI: 10.1109/TSA.2005.845814
    • L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 412-421, 2005.
    • (2005) IEEE Transactions on Speech and Audio Processing, vol. 13, no. 3, pp. 412-421
    • Deng, L.; Droppo, J.; Acero, A.
  • 39
    • EID: 85032752225
    • Missing-feature approaches in speech recognition
    • DOI: 10.1109/MSP.2005.1511828
    • B. Raj and R. M. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Processing Mag., vol. 22, no. 5, pp. 101-116, 2005.
    • (2005) IEEE Signal Processing Magazine, vol. 22, no. 5, pp. 101-116
    • Raj, B.; Stern, R.M.
  • 40
    • EID: 2942539074
    • Techniques for handling convolutional distortion with 'missing data' automatic speech recognition
    • K. J. Palomäki, G. J. Brown, and J. P. Barker, "Techniques for handling convolutional distortion with 'missing data' automatic speech recognition," Speech Commun., vol. 43, no. 1-2, pp. 123-142, 2004.
    • (2004) Speech Commun., vol. 43, no. 1-2, pp. 123-142
    • Palomäki, K.J.; Brown, G.J.; Barker, J.P.
  • 41
    • EID: 33947694706
    • Model adaptation for long convolutional distortion by maximum likelihood based state filtering approach
    • C. K. Raut, T. Nishimoto, and S. Sagayama, "Model adaptation for long convolutional distortion by maximum likelihood based state filtering approach," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2006, pp. I-1133-I-1136.
    • (2006) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 1
    • Raut, C.K.; Nishimoto, T.; Sagayama, S.
  • 42
    • EID: 38649115063
    • A new approach for the adaptation of HMMs to reverberation and background noise
    • DOI: 10.1016/j.specom.2007.09.004
    • H.-G. Hirsch and H. Finster, "A new approach for the adaptation of HMMs to reverberation and background noise," Speech Commun., vol. 50, no. 3, pp. 244-263, 2008.
    • (2008) Speech Communication, vol. 50, no. 3, pp. 244-263
    • Hirsch, H.-G.; Finster, H.
  • 43
    • EID: 84863760759
    • Adapting HMMs of distant-talking ASR systems using feature-domain reverberation models
    • A. Sehr, M. Gardill, and W. Kellermann, "Adapting HMMs of distant-talking ASR systems using feature-domain reverberation models," in Proc. European Signal Process. Conf., 2009, pp. 540-543.
    • (2009) Proc. European Signal Process. Conf., pp. 540-543
    • Sehr, A.; Gardill, M.; Kellermann, W.
  • 45
    • EID: 33645784228
    • Acoustic model adaptation using first-order linear prediction for reverberant speech
    • T. Takiguchi, M. Nishimura, and Y. Ariki, "Acoustic model adaptation using first-order linear prediction for reverberant speech," IEICE Trans. Inform. Syst., vol. E89-D, no. 3, pp. 908-914, 2006.
    • (2006) IEICE Trans. Inform. Syst., vol. E89-D, no. 3, pp. 908-914
    • Takiguchi, T.; Nishimura, M.; Ariki, Y.
  • 46
    • EID: 77955683144
    • Reverberation model-based decoding in the logmelspec domain for robust distant-talking speech recognition
    • A. Sehr, R. Maas, and W. Kellermann, "Reverberation model-based decoding in the logmelspec domain for robust distant-talking speech recognition," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 7, pp. 1676-1691, 2010.
    • (2010) IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 7, pp. 1676-1691
    • Sehr, A.; Maas, R.; Kellermann, W.
  • 48
    • EID: 0032638856
    • Semi-tied covariance matrices for hidden Markov models
    • M. J. F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281, 1999.
    • (1999) IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281
    • Gales, M.J.F.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.