



Volume 24, Issue 1, 2010, Pages 77-93

A computational auditory scene analysis system for speech segregation and robust speech recognition

Author keywords

Binary time frequency mask; Computational Auditory Scene Analysis; Robust speech recognition; Speech segregation; Uncertainty decoding

Indexed keywords

BINARY TIME-FREQUENCY MASK; COMPUTATIONAL AUDITORY SCENE ANALYSIS; ROBUST SPEECH RECOGNITION; SPEECH SEGREGATION; UNCERTAINTY DECODING

EID: 69249159165     PISSN: 08852308     EISSN: 10958363     Source Type: Journal    
DOI: 10.1016/j.csl.2008.03.004     Document Type: Article
Times cited: 103

References (38)
  • 4
    • 0035342414
    • Robust automatic speech recognition with missing and unreliable acoustic data
    • Cooke M., Green P., Josifovski L., and Vizinho A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Commun. 34 (2001) 267-285
    • (2001) Speech Commun. , vol.34 , pp. 267-285
    • Cooke, M.; Green, P.; Josifovski, L.; Vizinho, A.
  • 5
    • 18744401086
    • Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
    • Deng L., Droppo J., and Acero A. Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion. IEEE Trans. Speech Audio Process. 13 (2005) 412-421
    • (2005) IEEE Trans. Speech Audio Process. , vol.13 , pp. 412-421
    • Deng, L.; Droppo, J.; Acero, A.
  • 6
    • 4544369701
    • A factorial HMM approach to simultaneous recognition of isolated digits spoken by multiple talkers on one audio channel
    • Deoras, A.N., Hasegawa-Johnson, M., 2004. A factorial HMM approach to simultaneous recognition of isolated digits spoken by multiple talkers on one audio channel. In: Proceedings of ICASSP'04, vol. 1. pp. 861-864.
    • (2004) Proceedings of ICASSP'04 , vol.1 , pp. 861-864
    • Deoras, A.N.; Hasegawa-Johnson, M.
  • 7
    • 0026843273
    • A Bayesian estimation approach for speech enhancement using hidden Markov models
    • Ephraim Y. A Bayesian estimation approach for speech enhancement using hidden Markov models. IEEE Trans. Signal Process. 40 4 (1992) 725-735
    • (1992) IEEE Trans. Signal Process. , vol.40 , Issue.4 , pp. 725-735
    • Ephraim, Y.
  • 9
    • 0030245128
    • Robust continuous speech recognition using parallel model combination
    • Gales M.J.F., and Young S.J. Robust continuous speech recognition using parallel model combination. IEEE Trans. Speech Audio Process. 4 (1996) 352-359
    • (1996) IEEE Trans. Speech Audio Process. , vol.4 , pp. 352-359
    • Gales, M.J.F.; Young, S.J.
  • 11
    • 4644265990
    • Monaural speech segregation based on pitch tracking and amplitude modulation
    • Hu G., and Wang D.L. Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Trans. Neural Networks 15 (2004) 1135-1150
    • (2004) IEEE Trans. Neural Networks , vol.15 , pp. 1135-1150
    • Hu, G.; Wang, D.L.
  • 12
    • 46049084696
    • An auditory scene analysis approach to monaural speech segregation
    • Hansler E., and Schmidt G. (Eds), Springer, Heidelberg
    • Hu G., and Wang D.L. An auditory scene analysis approach to monaural speech segregation. In: Hansler E., and Schmidt G. (Eds). Topics in Acoustic Echo and Noise Control (2006), Springer, Heidelberg 485-515
    • (2006) Topics in Acoustic Echo and Noise Control , pp. 485-515
    • Hu, G.; Wang, D.L.
  • 13
    • 38849102154
    • Auditory segmentation based on onset and offset analysis
    • Hu G., and Wang D.L. Auditory segmentation based on onset and offset analysis. IEEE Trans. Audio Speech Language Process. 15 (2007) 396-405
    • (2007) IEEE Trans. Audio Speech Language Process. , vol.15 , pp. 396-405
    • Hu, G.; Wang, D.L.
  • 15
    • 84899014722
    • A probabilistic approach to single channel blind signal separation
    • Becker S., Thrun S., and Obermayer K. (Eds), MIT Press, Cambridge, MA
    • Jang G., and Lee T. A probabilistic approach to single channel blind signal separation. In: Becker S., Thrun S., and Obermayer K. (Eds). Advances in Neural Information Processing Systems, vol. 15 (2003), MIT Press, Cambridge, MA 1173-1180
    • (2003) Advances in Neural Information Processing Systems, vol. 15 , pp. 1173-1180
    • Jang, G.; Lee, T.
  • 16
    • 4644257621
    • Single microphone source separation using high resolution signal reconstruction
    • Kristjansson, T., Attias, H., Hershey, J., 2004. Single microphone source separation using high resolution signal reconstruction. In: Proceedings of ICASSP'04, vol. 2. pp. 817-820.
    • (2004) Proceedings of ICASSP'04 , vol.2 , pp. 817-820
    • Kristjansson, T.; Attias, H.; Hershey, J.
  • 18
    • 0023944462
    • Simulation of auditory neural transduction: further studies
    • Meddis R. Simulation of auditory neural transduction: further studies. The Journal of the Acoustical Society of America 83 (1988) 1056-1063
    • (1988) The Journal of the Acoustical Society of America , vol.83 , pp. 1056-1063
    • Meddis, R.
  • 21
    • 0009804718
    • Auditory models as preprocessors for speech recognition
    • Schouten M.E.H. (Ed), Mouton de Gruyter, Berlin, Germany (Chapter 1)
    • Patterson R.D., Holdsworth J., and Allerhand M. Auditory models as preprocessors for speech recognition. In: Schouten M.E.H. (Ed). The Auditory Processing of Speech: From Sounds to Words (1992), Mouton de Gruyter, Berlin, Germany 67-83 (Chapter 1)
    • (1992) The Auditory Processing of Speech: From Sounds to Words , pp. 67-83
    • Patterson, R.D.; Holdsworth, J.; Allerhand, M.
  • 22
    • 4644336054
    • Reconstruction of missing features for robust speech recognition
    • Raj B., Seltzer M.L., and Stern R.M. Reconstruction of missing features for robust speech recognition. Speech Commun. 43 (2004) 275-296
    • (2004) Speech Commun. , vol.43 , pp. 275-296
    • Raj, B.; Seltzer, M.L.; Stern, R.M.
  • 24
    • 0142026377
    • Speech segregation based on sound localization
    • Roman N., Wang D.L., and Brown G.J. Speech segregation based on sound localization. J. Acoust. Soc. Am. 114 (2003) 2236-2252
    • (2003) J. Acoust. Soc. Am. , vol.114 , pp. 2236-2252
    • Roman, N.; Wang, D.L.; Brown, G.J.
  • 25
    • 84892289719
    • Automatic speech processing by inference in generative models
    • Divenyi P. (Ed), Kluwer Academic, Norwell, MA
    • Roweis S.T. Automatic speech processing by inference in generative models. In: Divenyi P. (Ed). Speech Separation by Humans and Machines (2005), Kluwer Academic, Norwell, MA 97-134
    • (2005) Speech Separation by Humans and Machines , pp. 97-134
    • Roweis, S.T.
  • 28
    • 34547499683
    • Incorporating auditory feature uncertainties in robust speaker identification
    • Shao, Y., Srinivasan, S., Wang, D.L., 2007. Incorporating auditory feature uncertainties in robust speaker identification. In: Proceedings of ICASSP'07, vol. IV, pp. 277-280.
    • (2007) Proceedings of ICASSP'07 , vol.4 , pp. 277-280
    • Shao, Y.; Srinivasan, S.; Wang, D.L.
  • 29
    • 56249136428
    • Transforming binary uncertainties for robust speech recognition
    • Srinivasan S., and Wang D.L. Transforming binary uncertainties for robust speech recognition. IEEE Trans. Audio, Speech Lang. Process. 15 7 (2007) 2130-2140
    • (2007) IEEE Trans. Audio, Speech Lang. Process. , vol.15 , Issue.7 , pp. 2130-2140
    • Srinivasan, S.; Wang, D.L.
  • 30
    • 33750311718
    • Binary and ratio time-frequency masks for robust speech recognition
    • Srinivasan S., Roman N., and Wang D.L. Binary and ratio time-frequency masks for robust speech recognition. Speech Commun. 48 (2006) 1486-1501
    • (2006) Speech Commun. , vol.48 , pp. 1486-1501
    • Srinivasan, S.; Roman, N.; Wang, D.L.
  • 31
    • 0025681008
    • Hidden Markov model decomposition of speech and noise
    • Varga, A.P., Moore, R.K., 1990. Hidden Markov model decomposition of speech and noise. In: Proceedings of ICASSP'90, pp. 845-848.
    • (1990) Proceedings of ICASSP'90 , pp. 845-848
    • Varga, A.P.; Moore, R.K.
  • 32
    • 0026172104
    • Watersheds in digital spaces: an efficient algorithm based on immersion simulations
    • Vincent L., and Soille P. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Trans. Pattern Anal. Mach. Intell. 13 6 (1991) 583-598
    • (1991) IEEE Trans. Pattern Anal. Mach. Intell. , vol.13 , Issue.6 , pp. 583-598
    • Vincent, L.; Soille, P.
  • 33
    • 84892233308
    • On ideal binary mask as the computational goal of auditory scene analysis
    • Divenyi P. (Ed), Norwell, MA
    • Wang D.L. On ideal binary mask as the computational goal of auditory scene analysis. In: Divenyi P. (Ed). Speech Separation by Humans and Machines (2005), Norwell, MA 181-197
    • (2005) Speech Separation by Humans and Machines , pp. 181-197
    • Wang, D.L.
  • 35
    • 0032682770
    • Separation of speech from interfering sounds based on oscillatory correlation
    • Wang D.L., and Brown G.J. Separation of speech from interfering sounds based on oscillatory correlation. IEEE Trans. Neural Networks 10 3 (1999) 684-697
    • (1999) IEEE Trans. Neural Networks , vol.10 , Issue.3 , pp. 684-697
    • Wang, D.L.; Brown, G.J.
  • 37
    • 84957681902
    • A review of nonlinear diffusion filtering
    • Weickert, J., 1997. A review of nonlinear diffusion filtering. In: Romeny, B.H., Florack, L.J.K.a.M.V. (Eds.), Scale-space Theory in Computer Vision. Springer, Berlin, pp. 3-28.
  • 38
    • 69249132867
    • The HTK Book (for HTK Version 3.0)
    • Young, S., Kershaw, D., Odell, J., Valtchev, V., Woodland, P., 2000. The HTK Book (for HTK Version 3.0). Microsoft Corporation.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.