Volume 24, Issue 1, 2010, Pages 30-44

Monaural speech separation based on MAXVQ and CASA for robust speech recognition

Author keywords

Automatic speech recognition (ASR); Computational auditory scene analysis (CASA); Factorial max vector quantization (MAXVQ); Monaural speech separation

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION (ASR); COMPUTATIONAL AUDITORY SCENE ANALYSIS; COMPUTATIONAL AUDITORY SCENE ANALYSIS (CASA); FACTORIAL-MAX VECTOR QUANTIZATION (MAXVQ); GAUSSIAN MIXTURE MODELS; MONAURAL SPEECH SEPARATION; ROBUST SPEECH RECOGNITION; SPEAKER IDENTIFICATION; SPEECH SEPARATION; TARGET SPEAKER; VECTOR QUANTIZERS;

EID: 69249203845     PISSN: 08852308     EISSN: 10958363     Source Type: Journal    
DOI: 10.1016/j.csl.2008.05.005     Document Type: Article
Times cited: 39

References (52)
  • 2
    • 69249220880
    • Recent advances in speech fragment decoding techniques
    • Barker, J., Coy, A., Ma, N., Cooke, M., 2006. Recent advances in speech fragment decoding techniques. In: ICSLP'2006.
    • (2006) ICSLP
    • Barker, J.1    Coy, A.2    Ma, N.3    Cooke, M.4
  • 3
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • Boll S.F. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech, and Signal Processing 27 2 (1979) 113-120
    • (1979) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.27 , Issue.2 , pp. 113-120
    • Boll, S.F.1
  • 6
    • 33644639591
    • Separation of speech by computational auditory scene analysis
    • Benesty J., Makino S., and Chen J. (Eds), Springer, New York
    • Brown G.J., and Wang D.L. Separation of speech by computational auditory scene analysis. In: Benesty J., Makino S., and Chen J. (Eds). Speech Enhancement (2005), Springer, New York 371-402
    • (2005) Speech Enhancement , pp. 371-402
    • Brown, G.J.1    Wang, D.L.2
  • 8
    • 0035342414
    • Robust automatic speech recognition with missing and unreliable acoustic data
    • Cooke M.P., Green P., Josifovski L., and Vizinho A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication 34 (2001) 267-285
    • (2001) Speech Communication , vol.34 , pp. 267-285
    • Cooke, M.P.1    Green, P.2    Josifovski, L.3    Vizinho, A.4
  • 9
    • 0035478859
    • The auditory organization of speech and other sources in listeners and computational models
    • Cooke M.P., and Ellis D.P.W. The auditory organization of speech and other sources in listeners and computational models. Speech Communication 31 (2001) 141-177
    • (2001) Speech Communication , vol.31 , pp. 141-177
    • Cooke, M.P.1    Ellis, D.P.W.2
  • 11
    • 37849011878
    • The foreign language cocktail party problem: energetic and informational masking effects in non-native speech perception
    • Cooke M.P., Garcia Lecumberri M.L., and Barker J.P. The foreign language cocktail party problem: energetic and informational masking effects in non-native speech perception. Journal of the Acoustical Society of America (2008)
    • (2008) Journal of the Acoustical Society of America
    • Cooke, M.P.1    Garcia Lecumberri, M.L.2    Barker, J.P.3
  • 12
    • 0001698589
    • Auditory grouping
    • The handbook of perception and cognition. Moore B.C.J. (Ed), Academic, London
    • Darwin C.J., and Carlyon R.P. Auditory grouping. In: Moore B.C.J. (Ed). The handbook of perception and cognition. Hearing (1995), Academic, London 387-424
    • (1995) Hearing , pp. 387-424
    • Darwin, C.J.1    Carlyon, R.P.2
  • 13
    • 0033964646
    • Effectiveness of spatial cues, prosody, and talker characteristics in selective attention
    • Darwin C.J., and Hukin R.W. Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. Journal of the Acoustical Society of America 107 2 (2000) 977-979
    • (2000) Journal of the Acoustical Society of America , vol.107 , Issue.2 , pp. 977-979
    • Darwin, C.J.1    Hukin, R.W.2
  • 14
    • 0027229711
    • Influence of background noise and microphone on the performance of the IBM Tangora speech recognition system
    • Das, S., Bakis, R., Nadas, A., Nahamoo, D., Picheny, M., 1993. Influence of background noise and microphone on the performance of the IBM Tangora speech recognition system. In: Proceedings of the ICASSP'93, pp. 95-98.
    • (1993) Proceedings of the ICASSP'93 , pp. 95-98
    • Das, S.1    Bakis, R.2    Nadas, A.3    Nahamoo, D.4    Picheny, M.5
  • 16
    • 0017804799
    • On cochlear encoding: potentialities and limitations of the reverse-correlation techniques
    • de Boer E., and de Jongh H.R. On cochlear encoding: potentialities and limitations of the reverse-correlation techniques. Journal of the Acoustical Society of America 63 (1978) 115-135
    • (1978) Journal of the Acoustical Society of America , vol.63 , pp. 115-135
    • de Boer, E.1    de Jongh, H.R.2
  • 17
    • 0032626792
    • Using knowledge to organize sound: the prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures
    • Ellis D.P.W. Using knowledge to organize sound: the prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures. Speech Communication 27 (1999) 281-298
    • (1999) Speech Communication , vol.27 , pp. 281-298
    • Ellis, D.P.W.1
  • 18
    • 69249201885
    • ETSI, 2002. ETSI draft standard doc speech processing, transmission and quality aspects; distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithm. ETSI ES 202 050 V0.1.0.
  • 23
    • 0030263447
    • Mean and variance adaptation within the MLLR framework
    • Gales M.J.F., and Woodland P.C. Mean and variance adaptation within the MLLR framework. Computer Speech and Language 10 (1996) 249-264
    • (1996) Computer Speech and Language , vol.10 , pp. 249-264
    • Gales, M.J.F.1    Woodland, P.C.2
  • 24
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • Gales M.J.F. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech and Language 12 (1998) 75-98
    • (1998) Computer Speech and Language , vol.12 , pp. 75-98
    • Gales, M.J.F.1
  • 25
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Gauvain J.L., and Lee C.H. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing 2 2 (1994) 291-298
    • (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.L.1    Lee, C.H.2
  • 26
    • 0032670621
    • A blackboard architecture for computational auditory scene analysis
    • Godsmark D., and Brown G.J. A blackboard architecture for computational auditory scene analysis. Speech Communication 27 3-4 (1999) 351-366
    • (1999) Speech Communication , vol.27 , Issue.3-4 , pp. 351-366
    • Godsmark, D.1    Brown, G.J.2
  • 27
    • 0029288202
    • Speech recognition in noisy environments: a survey
    • Gong Y. Speech recognition in noisy environments: a survey. Speech Communication 16 (1995) 191-261
    • (1995) Speech Communication , vol.16 , pp. 191-261
    • Gong, Y.1
  • 28
    • 78149458724
    • Handling missing and unreliable information in speech recognition
    • Green, P., Barker, J., Cooke, M.P., Josifovski, L., 2001. Handling missing and unreliable information in speech recognition. In: AISTATS'2001.
    • (2001) AISTATS
    • Green, P.1    Barker, J.2    Cooke, M.P.3    Josifovski, L.4
  • 29
    • 4644265990
    • Monaural speech segregation based on pitch tracking and amplitude modulation
    • Hu G.N., and Wang D.L. Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Transactions on Neural Networks 15 5 (2004) 1135-1150
    • (2004) IEEE Transactions on Neural Networks , vol.15 , Issue.5 , pp. 1135-1150
    • Hu, G.N.1    Wang, D.L.2
  • 30
    • 33646786460
    • Separation of fricatives and affricates
    • Hu, G.N., Wang, D.L., 2005. Separation of fricatives and affricates. In: ICASSP'2005.
    • (2005) ICASSP
    • Hu, G.N.1    Wang, D.L.2
  • 33
    • 51449085519
    • Super-human multi-talker speech recognition: The IBM 2006 speech separation challenge system
    • Kristjansson, T., Hershey, J., Olsen, P., Rennie, S., Gopinath, R., 2006. Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system. In: ICSLP'2006.
    • (2006) ICSLP
    • Kristjansson, T.1    Hershey, J.2    Olsen, P.3    Rennie, S.4    Gopinath, R.5
  • 34
    • 0032651723
    • Integrated bias removal techniques for robust speech recognition
    • Lawrence C., and Rahim M. Integrated bias removal techniques for robust speech recognition. Computer Speech and Language 13 (1999) 283-298
    • (1999) Computer Speech and Language , vol.13 , pp. 283-298
    • Lawrence, C.1    Rahim, M.2
  • 35
    • 0032140546
    • On stochastic feature and model compensation approaches to robust speech recognition
    • Lee C.H. On stochastic feature and model compensation approaches to robust speech recognition. Speech Communication 25 (1998) 29-47
    • (1998) Speech Communication , vol.25 , pp. 29-47
    • Lee, C.H.1
  • 36
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • Leggetter C.J., and Woodland P.C. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language 9 (1995) 171-185
    • (1995) Computer Speech and Language , vol.9 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 37
    • 40949108726
    • Monaural speech separation based on computational auditory scene analysis and objective quality assessment of speech
    • Li P., Guan Y., Xu B., and Liu W.J. Monaural speech separation based on computational auditory scene analysis and objective quality assessment of speech. IEEE Transactions on Audio, Speech, and Language Processing 14 6 (2006) 2014-2023
    • (2006) IEEE Transactions on Audio, Speech, and Language Processing , vol.14 , Issue.6 , pp. 2014-2023
    • Li, P.1    Guan, Y.2    Xu, B.3    Liu, W.J.4
  • 39
    • 69249204319
    • Combining missing-feature theory, speech enhancement and speaker-dependent/-independent modeling for speech separation
    • Ming, J., Hazen, T.J., Glass, J.R., 2006. Combining missing-feature theory, speech enhancement and speaker-dependent/-independent modeling for speech separation. In: ICSLP'2006.
    • (2006) ICSLP
    • Ming, J.1    Hazen, T.J.2    Glass, J.R.3
  • 42
    • 0142056390
    • An efficient auditory filterbank based on the gammatone function, Applied Psychology Unit, Cambridge University, Cambridge, UK
    • 2341
    • Patterson, R.D., Nimmo-Smith, I., Holdsworth, J., Rice, P., 1988. An efficient auditory filterbank based on the gammatone function, Applied Psychology Unit, Cambridge University, Cambridge, UK, APU Report 2341.
    • (1988) APU Report
    • Patterson, R.D.1    Nimmo-Smith, I.2    Holdsworth, J.3    Rice, P.4
  • 43
    • 0029769867
    • Signal bias removal by maximum likelihood estimation for robust telephone speech recognition
    • Rahim M., and Juang B.H. Signal bias removal by maximum likelihood estimation for robust telephone speech recognition. IEEE Transactions on Speech and Audio Processing 4 (1996) 19-30
    • (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , pp. 19-30
    • Rahim, M.1    Juang, B.H.2
  • 44
    • 0038705102
    • One microphone source separation
    • Roweis S. One microphone source separation. In: NIPS (2000)
    • (2000) NIPS
    • Roweis, S.1
  • 45
    • 85009230793
    • Roweis, S., 2003. Factorial models and refiltering for speech separation and denoising. In: Eurospeech'2003.
  • 47
    • 0030149866
    • A maximum likelihood approach to stochastic matching for robust speech recognition
    • Sankar A., and Lee C.H. A maximum likelihood approach to stochastic matching for robust speech recognition. IEEE Transactions on Speech and Audio Processing 4 (1996) 190-202
    • (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , pp. 190-202
    • Sankar, A.1    Lee, C.H.2
  • 48
    • 69249207337
    • A computational auditory scene analysis system for robust speech recognition
    • Srinivasan, S., Shao, Y., Jin, Z.Z., Wang, D.L., 2006. A computational auditory scene analysis system for robust speech recognition. In: ICSLP'2006.
    • (2006) ICSLP
    • Srinivasan, S.1    Shao, Y.2    Jin, Z.Z.3    Wang, D.L.4
  • 49
    • 84892233308
    • On ideal binary mask as the computational goal of auditory scene analysis
    • Divenyi P. (Ed), Kluwer Academic, Norwell MA
    • Wang D.L. On ideal binary mask as the computational goal of auditory scene analysis. In: Divenyi P. (Ed). Speech Separation by Humans and Machines (2005), Kluwer Academic, Norwell MA 181-197
    • (2005) Speech Separation by Humans and Machines , pp. 181-197
    • Wang, D.L.1
  • 50
    • 0032682770
    • Separation of speech from interfering sounds based on oscillatory correlation
    • Wang D.L., and Brown G.J. Separation of speech from interfering sounds based on oscillatory correlation. IEEE Transactions on Neural Networks 10 3 (1999) 684-697
    • (1999) IEEE Transactions on Neural Networks , vol.10 , Issue.3 , pp. 684-697
    • Wang, D.L.1    Brown, G.J.2
  • 52
    • 0001459635
    • Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises
    • Zhao Y. Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises. IEEE Transactions on Speech and Audio Processing 8 3 (2000) 255-266
    • (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.3 , pp. 255-266
    • Zhao, Y.1


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.