SCOPUS Information Retrieval Platform

Volume 24, Issue 1, 2010, Pages 30-44

Monaural speech separation based on MAXVQ and CASA for robust speech recognition

(5) Li, Peng; Guan, Yong; Wang, Shijin; Xu, Bo; Liu, Wenju

Author keywords

Automatic speech recognition (ASR); Computational auditory scene analysis (CASA); Factorial max vector quantization (MAXVQ); Monaural speech separation

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION (ASR); COMPUTATIONAL AUDITORY SCENE ANALYSIS; COMPUTATIONAL AUDITORY SCENE ANALYSIS (CASA); FACTORIAL-MAX VECTOR QUANTIZATION (MAXVQ); GAUSSIAN MIXTURE MODELS; MONAURAL SPEECH SEPARATION; ROBUST SPEECH RECOGNITION; SPEAKER IDENTIFICATION; SPEECH SEPARATION; TARGET SPEAKER; VECTOR QUANTIZERS;

BLIND SOURCE SEPARATION; PATIENT REHABILITATION; REMELTING; SEPARATION; SPEECH ANALYSIS; SPEECH PROCESSING; VECTOR QUANTIZATION; VECTORS;

SPEECH RECOGNITION;

EID: 69249203845 PISSN: 08852308 EISSN: 10958363 Source Type: Journal
DOI: 10.1016/j.csl.2008.05.005 Document Type: Article

Times cited: (39)

References (52)

1
- 0004319970
- Kluwer
- Acero A. Acoustical and Environmental Robustness in Automatic Speech Recognition (1992), Kluwer
- (1992) Acoustical and Environmental Robustness in Automatic Speech Recognition
- Acero, A.¹

2
- 69249220880
- Recent advances in speech fragment decoding techniques
- Barker, J., Coy, A., Ma, N., Cooke, M., 2006. Recent advances in speech fragment decoding techniques. In: ICSLP'2006.
- (2006) ICSLP
- Barker, J.¹ Coy, A.² Ma, N.³ Cooke, M.⁴

3
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- Boll S.F. Suppression of acoustic noise in speech using spectral subtraction. IEEE Transactions on Acoustics, Speech and Signal Processing 27 2 (1979) 113-120
- (1979) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.27 , Issue.2 , pp. 113-120
- Boll, S.F.¹

4
- 0003684441
- MIT Press, Cambridge
- Bregman A.S. Auditory Scene Analysis: the Perceptual Organization of Sound (1990), MIT Press, Cambridge
- (1990) Auditory Scene Analysis: the Perceptual Organization of Sound
- Bregman, A.S.¹

5
- 0028531926
- Computational auditory scene analysis
- Brown G.J., and Cooke M.P. Computational auditory scene analysis. Computer Speech and Language 8 (1994) 297-336
- (1994) Computer Speech and Language , vol.8 , pp. 297-336
- Brown, G.J.¹ Cooke, M.P.²

6
- 33644639591
- Separation of speech by computational auditory scene analysis
- Benesty J., Makino S., and Chen J. (Eds), Springer, New York
- Brown G.J., and Wang D.L. Separation of speech by computational auditory scene analysis. In: Benesty J., Makino S., and Chen J. (Eds). Speech Enhancement (2005), Springer, New York 371-402
- (2005) Speech Enhancement , pp. 371-402
- Brown, G.J.¹ Wang, D.L.²

7
- 0003479143
- Ph.D. Thesis, University of Sheffield
- Cooke, M.P., 1991. Modeling auditory processing and organization, Ph.D. Thesis, University of Sheffield.
- (1991) Modeling auditory processing and organization
- Cooke, M.P.¹

8
- 0035342414
- Robust automatic speech recognition with missing and unreliable acoustic data
- Cooke M.P., Green P., Josifovski L., and Vizinho A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Communication 34 (2001) 267-285
- (2001) Speech Communication , vol.34 , pp. 267-285
- Cooke, M.P.¹ Green, P.² Josifovski, L.³ Vizinho, A.⁴

9
- 0035478859
- The auditory organization of speech and other sources in listeners and computational models
- Cooke M.P., and Ellis D.P.W. The auditory organization of speech and other sources in listeners and computational models. Speech Communication 31 (2001) 141-177
- (2001) Speech Communication , vol.31 , pp. 141-177
- Cooke, M.P.¹ Ellis, D.P.W.²

10
- 69249222117
- The 2006 Speech separation challenge
- Cooke M.P., and Lee T.-W. The 2006 Speech separation challenge. Computer Speech and Language (2008)
- (2008) Computer Speech and Language
- Cooke, M.P.¹ Lee, T.-W.²

11
- 37849011878
- The foreign language cocktail party problem: energetic and informational masking effects in non-native speech perception
- Cooke M.P., Garcia Lecumberri M.L., and Barker J.P. The foreign language cocktail party problem: energetic and informational masking effects in non-native speech perception. Journal of the Acoustical Society of America (2008)
- (2008) Journal of the Acoustical Society of America
- Cooke, M.P.¹ Garcia Lecumberri, M.L.² Barker, J.P.³

12
- 0001698589
- Auditory grouping
- The handbook of perception and cognition. Moore B.C.J. (Ed), Academic, London
- Darwin C.J., and Carlyon R.P. Auditory grouping. In: Moore B.C.J. (Ed). The handbook of perception and cognition. Hearing (1995), Academic, London 387-424
- (1995) Hearing , pp. 387-424
- Darwin, C.J.¹ Carlyon, R.P.²

13
- 0033964646
- Effectiveness of spatial cues, prosody, and talker characteristics in selective attention
- Darwin C.J., and Hukin R.W. Effectiveness of spatial cues, prosody, and talker characteristics in selective attention. Journal of the Acoustical Society of America 107 2 (2000) 977-979
- (2000) Journal of the Acoustical Society of America , vol.107 , Issue.2 , pp. 977-979
- Darwin, C.J.¹ Hukin, R.W.²

14
- 0027229711
- Influence of background noise and microphone on the performance of the IBM Tangora speech recognition system
- Das, S., Bakis, R., Nadas, A., Nahamoo, D., Picheny, M., 1993. Influence of background noise and microphone on the performance of the IBM Tangora speech recognition system. In: Proceedings of the ICASSP'93, pp. 95-98.
- (1993) Proceedings of the ICASSP'93 , pp. 95-98
- Das, S.¹ Bakis, R.² Nadas, A.³ Nahamoo, D.⁴ Picheny, M.⁵

15
- 0020795461
- On the effects of varying filter bank parameters on isolated word recognition
- Dautrich B.A., Rabiner L.R., and Martin T.B. On the effects of varying filter bank parameters on isolated word recognition. IEEE Transactions on Acoustics, Speech and Signal Processing 31 4 (1983) 793-897
- (1983) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.31 , Issue.4 , pp. 793-897
- Dautrich, B.A.¹ Rabiner, L.R.² Martin, T.B.³

16
- 0017804799
- On cochlear encoding: potentialities and limitations of the reverse-correlation techniques
- de Boer E., and de Jongh H.R. On cochlear encoding: potentialities and limitations of the reverse-correlation techniques. Journal of the Acoustical Society of America 63 (1978) 115-135
- (1978) Journal of the Acoustical Society of America , vol.63 , pp. 115-135
- de Boer, E.¹ de Jongh, H.R.²

17
- 0032626792
- Using knowledge to organize sound: the prediction-driven approach to computational auditory scene analysis and its application to speech nonspeech mixtures
- Ellis D.P.W. Using knowledge to organize sound: the prediction-driven approach to computational auditory scene analysis and its application to speech nonspeech mixtures. Speech Communication 27 (1999) 281-298
- (1999) Speech Communication , vol.27 , pp. 281-298
- Ellis, D.P.W.¹

18
- 69249201885
- ETSI, 2002. ETSI draft standard doc speech processing, transmission and quality aspects; distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithm. ETSI ES 202 050 V0.1.0.

19
- 0012265819
- Robust speech recognition under adverse conditions
- Furui, S., 1992. Robust speech recognition under adverse conditions. In: Proceedings of the ESCA Workshop on Speech Processing in Adverse Conditions, pp. 31-42.
- (1992) Proceedings of the ESCA Workshop on Speech Processing in Adverse Conditions , pp. 31-42
- Furui, S.¹

20
- 0002960982
- Recent advances in robust speech recognition
- Furui, S., 1997. Recent advances in robust speech recognition. In: Proceedings of ESCA-NATO Workshop on Robust Speech Recognition for Unknown Communication Channels, pp. 11-20.
- (1997) Proceedings of ESCA-NATO Workshop on Robust Speech Recognition for Unknown Communication Channels , pp. 11-20
- Furui, S.¹

21
- 0003671941
- Ph.D. Thesis, Cambridge University, Cambridge, England
- Gales, M.J.F., 1995. Model-based techniques for noise robust speech recognition. Ph.D. Thesis, Cambridge University, Cambridge, England.
- (1995) Model-based techniques for noise robust speech recognition
- Gales, M.J.F.¹

22
- 0030245128
- Robust speech recognition using parallel model combination
- Gales M.J.F., and Young S.J. Robust speech recognition using parallel model combination. IEEE Transactions on Speech and Audio Processing 4 5 (1996) 352-359
- (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , Issue.5 , pp. 352-359
- Gales, M.J.F.¹ Young, S.J.²

23
- 0030263447
- Mean and variance adaptation within the MLLR framework
- Gales M.J.F., and Woodland P.C. Mean and variance adaptation within the MLLR framework. Computer Speech and Language 10 (1996) 249-264
- (1996) Computer Speech and Language , vol.10 , pp. 249-264
- Gales, M.J.F.¹ Woodland, P.C.²

24
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- Gales M.J.F. Maximum likelihood linear transformations for HMM-based speech recognition. Computer Speech and Language 12 (1998) 75-98
- (1998) Computer Speech and Language , vol.12 , pp. 75-98
- Gales, M.J.F.¹

25
- 0028419019
- Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains
- Gauvain J.L., and Lee C.H. Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains. IEEE Transactions on Speech and Audio Processing 2 2 (1994) 291-298
- (1994) IEEE Transactions on Speech and Audio Processing , vol.2 , Issue.2 , pp. 291-298
- Gauvain, J.L.¹ Lee, C.H.²

26
- 0032670621
- A blackboard architecture for computational auditory scene analysis
- Godsmark D., and Brown G.J. A blackboard architecture for computational auditory scene analysis. Speech Communication 27 3-4 (1999) 351-366
- (1999) Speech Communication , vol.27 , Issue.3-4 , pp. 351-366
- Godsmark, D.¹ Brown, G.J.²

27
- 0029288202
- Speech recognition in noisy environments: a survey
- Gong Y. Speech recognition in noisy environments: a survey. Speech Communication 16 (1995) 191-261
- (1995) Speech Communication , vol.16 , pp. 191-261
- Gong, Y.¹

28
- 78149458724
- Handling missing and unreliable information in speech recognition
- Green, P., Barker, J., Cooke, M.P., Josifovski, L., 2001. Handling missing and unreliable information in speech recognition. In: AISTATS'2001.
- (2001) AISTATS
- Green, P.¹ Barker, J.² Cooke, M.P.³ Josifovski, L.⁴

29
- 4644265990
- Monaural speech segregation based on pitch tracking and amplitude modulation
- Hu G.N., and Wang D.L. Monaural speech segregation based on pitch tracking and amplitude modulation. IEEE Transactions on Neural Networks 15 5 (2004) 1135-1150
- (2004) IEEE Transactions on Neural Networks , vol.15 , Issue.5 , pp. 1135-1150
- Hu, G.N.¹ Wang, D.L.²

30
- 33646786460
- Separation of fricatives and affricates
- Hu, G.N., Wang, D.L., 2005. Separation of fricatives and affricates. In: ICASSP'2005.
- (2005) ICASSP
- Hu, G.N.¹ Wang, D.L.²

31
- 38849102154
- Auditory segmentation based on onset and offset analysis
- Hu G.N., and Wang D.L. Auditory segmentation based on onset and offset analysis. IEEE Transactions on Audio, Speech, and Language Processing 15 2 (2007) 396-405
- (2007) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.2 , pp. 396-405
- Hu, G.N.¹ Wang, D.L.²

32
- 0003770709
- Kluwer
- Junqua J.C., and Haton J.P. Robustness in Automatic Speech Recognition (1996), Kluwer
- (1996) Robustness in Automatic Speech Recognition
- Junqua, J.C.¹ Haton, J.P.²

33
- 51449085519
- Super-human multi-talker speech recognition: The IBM 2006 speech separation challenge system
- Kristjansson, T., Hershey, J., Olsen, P., Rennie, S., Gopinath, R., 2006. Super-human multi-talker speech recognition: the IBM 2006 speech separation challenge system. In: ICSLP'2006.
- (2006) ICSLP
- Kristjansson, T.¹ Hershey, J.² Olsen, P.³ Rennie, S.⁴ Gopinath, R.⁵

34
- 0032651723
- Integrated bias removal techniques for robust speech recognition
- Lawrence C., and Rahim M. Integrated bias removal techniques for robust speech recognition. Computer Speech and Language 13 (1999) 283-298
- (1999) Computer Speech and Language , vol.13 , pp. 283-298
- Lawrence, C.¹ Rahim, M.²

35
- 0032140546
- On stochastic feature and model compensation approaches to robust speech recognition
- Lee C.H. On stochastic feature and model compensation approaches to robust speech recognition. Speech Communication 25 (1998) 29-47
- (1998) Speech Communication , vol.25 , pp. 29-47
- Lee, C.H.¹

36
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- Leggetter C.J., and Woodland P.C. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech and Language 9 (1995) 171-185
- (1995) Computer Speech and Language , vol.9 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

37
- 40949108726
- Monaural speech separation based on computational auditory scene analysis and objective quality assessment of speech
- Li P., Guan Y., Xu B., and Liu W.J. Monaural speech separation based on computational auditory scene analysis and objective quality assessment of speech. IEEE Transactions on Audio, Speech, and Language Processing 14 6 (2006) 2014-2023
- (2006) IEEE Transactions on Audio, Speech, and Language Processing , vol.14 , Issue.6 , pp. 2014-2023
- Li, P.¹ Guan, Y.² Xu, B.³ Liu, W.J.⁴

38
- 0019009880
- Speech enhancement using a soft-decision noise suppression filter
- McAulay R.J., and Malpass M.L. Speech enhancement using a soft-decision noise suppression filter. IEEE Transactions on Acoustics, Speech and Signal Processing 28 (1980) 137-145
- (1980) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.28 , pp. 137-145
- McAulay, R.J.¹ Malpass, M.L.²

39
- 69249204319
- Combining missing-feature theory, speech enhancement and speaker-dependent/-independent modeling for speech separation
- Ming, J., Hazen, T.J., Glass, J.R., 2006. Combining missing-feature theory, speech enhancement and speaker-dependent/-independent modeling for speech separation. In: ICSLP'2006.
- (2006) ICSLP
- Ming, J.¹ Hazen, T.J.² Glass, J.R.³

40
- 0003789815
- Academic, San Diego, CA
- Moore B.C.J. An Introduction to the Psychology of Hearing. fourth ed. (1997), Academic, San Diego, CA
- (1997) An Introduction to the Psychology of Hearing. fourth ed.
- Moore, B.C.J.¹

41
- 0024753593
- Speech recognition using noise-adaptive prototypes
- Nadas A., Nahamoo D., and Picheny M. Speech recognition using noise-adaptive prototypes. IEEE Transactions on Acoustics, Speech and Signal Processing 37 10 (1989) 1495-1503
- (1989) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.37 , Issue.10 , pp. 1495-1503
- Nadas, A.¹ Nahamoo, D.² Picheny, M.³

42
- 0142056390
- An efficient auditory filterbank based on the gammatone function, Applied Psychology Unit, Cambridge University, Cambridge, UK
- 2341
- Patterson, R.D., Nimmo-Smith, I., Holdsworth, J., Rice, P., 1988. An efficient auditory filterbank based on the gammatone function, Applied Psychology Unit, Cambridge University, Cambridge, UK, APU Report 2341.
- (1988) APU Report
- Patterson, R.D.¹ Nimmo-Smith, I.² Holdsworth, J.³ Rice, P.⁴

43
- 0029769867
- Signal bias removal by maximum likelihood estimation for robust telephone speech recognition
- Rahim M., and Juang B.H. Signal bias removal by maximum likelihood estimation for robust telephone speech recognition. IEEE Transactions on Speech and Audio Processing 4 (1996) 19-30
- (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , pp. 19-30
- Rahim, M.¹ Juang, B.H.²

44
- 0038705102
- One microphone source separation
- Roweis, S., 2000. One microphone source separation. In: NIPS'2000.
- (2000) NIPS
- Roweis, S.¹

45
- 85009230793
- Roweis, S., 2003. Factorial models and refiltering for speech separation and denoising. In: Eurospeech'2003.

46
- 0034274946
- Noise-compensated hidden Markov models
- Sanches I. Noise-compensated hidden Markov models. IEEE Transactions on Speech and Audio Processing 8 5 (2000) 533-540
- (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.5 , pp. 533-540
- Sanches, I.¹

47
- 0030149866
- A maximum likelihood approach to stochastic matching for robust speech recognition
- Sankar A., and Lee C.H. A maximum likelihood approach to stochastic matching for robust speech recognition. IEEE Transactions on Speech and Audio Processing 4 (1996) 190-202
- (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , pp. 190-202
- Sankar, A.¹ Lee, C.H.²

48
- 69249207337
- A computational auditory scene analysis system for robust speech recognition
- Srinivasan, S., Shao, Y., Jin, Z.Z., Wang, D.L., 2006. A computational auditory scene analysis system for robust speech recognition. In: ICSLP'2006.
- (2006) ICSLP
- Srinivasan, S.¹ Shao, Y.² Jin, Z.Z.³ Wang, D.L.⁴

49
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- Divenyi P. (Ed), Kluwer Academic, Norwell MA
- Wang D.L. On ideal binary mask as the computational goal of auditory scene analysis. In: Divenyi P. (Ed). Speech Separation by Humans and Machines (2005), Kluwer Academic, Norwell MA 181-197
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

50
- 0032682770
- Separation of speech from interfering sounds based on oscillatory correlation
- Wang D.L., and Brown G.J. Separation of speech from interfering sounds based on oscillatory correlation. IEEE Transactions on Neural Networks 10 3 (1999) 684-697
- (1999) IEEE Transactions on Neural Networks , vol.10 , Issue.3 , pp. 684-697
- Wang, D.L.¹ Brown, G.J.²

51
- 0003982501
- Ph.D. Dissertation, Department of Electrical Engineering, Stanford University, Stanford, CA
- Weintraub, M., 1985. A theory and computational model of auditory monaural sound separation. Ph.D. Dissertation, Department of Electrical Engineering, Stanford University, Stanford, CA.
- (1985) A theory and computational model of auditory monaural sound separation
- Weintraub, M.¹

52
- 0001459635
- Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises
- Zhao Y. Frequency-domain maximum likelihood estimation for automatic speech recognition in additive and convolutive noises. IEEE Transactions on Speech and Audio Processing 8 3 (2000) 255-266
- (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.3 , pp. 255-266
- Zhao, Y.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.