SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 1, 2013, Pages 168-177

Towards generalizing classification based speech separation

(2) Han, Kun a Wang, Deliang a

a The Ohio State University (United States)

Author keywords

Generalization; rethresholding; speech separation; support vector machine (SVM)

Indexed keywords

SEPARATION; SIGNAL TO NOISE RATIO; SOURCE SEPARATION; SPEECH ANALYSIS; SPEECH RECOGNITION; SUPERVISED LEARNING;

DISTRIBUTION FITTING; GENERALIZATION; IDEAL BINARY MASK (IBM); RETHRESHOLDING; SPEECH SEPARATION; SUPERVISED CLASSIFICATION; SYSTEMATIC EVALUATION; VOICE ACTIVITY DETECTION;

SUPPORT VECTOR MACHINES;

EID: 84869416544 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2215596 Document Type: Article

Times cited : (32)

References (43)

1
- 33748523481
- Determination of the potential benefit of time-frequency gain manipulation
- DOI 10.1097/01.aud.0000233891.86809.df, PII 0000344620061000000004
- M. C. Anzalone, L. Calandruccio, K. A. Doherty, and L. H. Carney, "Determination of the potential benefit of time-frequency gain manipulation," Ear Hear., vol. 27, no. 5, pp. 480-492, 2006. (Pubitemid 44371244)
- (2006) Ear and Hearing , vol.27 , Issue.5 , pp. 480-492
- Anzalone, M.C.¹ Calandruccio, L.² Doherty, K.A.³ Carney, L.H.⁴

2
- 0038120523
- P. Boersma and D. Weenink, 2007, PRAAT: Doing Phonetics by Computer (Version 4.5) [Online]. Available: http://www.fon.hum.uva.nl/praat
- (2007) PRAAT: Doing Phonetics by Computer (Version 4.5)
- Boersma, P.¹ Weenink, D.²

3
- 0004031293
- 3rd ed. New York: McGraw-Hill
- D. C. Boes, F. A. Graybill, and A. M. Mood, Introduction to the Theory of Statistics, 3rd ed. New York: McGraw-Hill, 1974.
- (1974) Introduction to the Theory of Statistics
- Boes, D.C.¹ Graybill, F.A.² Mood, A.M.³

4
- 18744396499
- Training text classifiers with SVM on very few positive examples
- Tech. Rep. MSR-TR-2003-34
- J. Brank, M. Grobelnik, N. Milic-Frayling, and D. Mladenic, "Training text classifiers with SVM on very few positive examples," Microsoft Corp., Tech. Rep. MSR-TR-2003-34, 2003.
- (2003) Microsoft Corp.
- Brank, J.¹ Grobelnik, M.² Milic-Frayling, N.³ Mladenic, D.⁴

5
- 0003684441
- Cambridge MA: MIT Press ch. 1
- A. S. Bregman, Auditory Scene Analysis. Cambridge, MA: MIT Press, 1990, ch. 1.
- (1990) Auditory Scene Analysis
- Bregman, A.S.¹

6
- 33845354768
- Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
- DOI 10.1121/1.2363929
- D. S. Brungart, P. S. Chang, B. D. Simpson, and D. L. Wang, "Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation," J. Acoust. Soc. Amer., vol. 120, no. 6, pp. 4007-4018, 2006. (Pubitemid 44888096)
- (2006) Journal of the Acoustical Society of America , vol.120 , Issue.6 , pp. 4007-4018
- Brungart, D.S.¹ Chang, P.S.² Simpson, B.D.³ Wang, D.⁴

7
- 0003710380
- C. C. Chang and C. J. Lin, 2001, LIBSVM: A Library for Support Vector Machines, [Online]. Available: http://www.csie.ntu.edu.tw/cjlin/libsvm
- (2001) LIBSVM: A Library for Support Vector Machines
- Chang, C.C.¹ Lin, C.J.²

8
- 0021645331
- Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
- Dec.
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
- (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
- Ephraim, Y.¹ Malah, D.²

9
- 51449104842
- Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors
- Aug.
- J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, "Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.6 , pp. 1741-1752
- Erkelens, J.S.¹ Hendriks, R.C.² Heusdens, R.³ Jensen, J.⁴

10
- 80051643459
- An SVM based classification approach to speech separation
- K. Han and D. L. Wang, "An SVM based classification approach to speech separation," in Proc. IEEE ICASSP, 2011, pp. 5212-5215.
- (2011) Proc. IEEE ICASSP , pp. 5212-5215
- Han, K.¹ Wang, D.L.²

11
- 78049364397
- MMSE based noise PSD tracking with low complexity
- R. C. Hendriks, R. Heusdens, and J. Jensen, "MMSE based noise PSD tracking with low complexity," in Proc. IEEE ICASSP, 2010, pp. 4266-4269.
- (2010) Proc. IEEE ICASSP , pp. 4266-4269
- Hendriks, R.C.¹ Heusdens, R.² Jensen, J.³

12
- 0028517164
- RASTA processing of speech
- Oct.
- H. Hermansky and N. Morgan, "RASTA processing of speech," IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 578-589, Oct. 1994.
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 578-589
- Hermansky, H.¹ Morgan, N.²

13
- 84867209387
- G. Hu, 100 Nonspeech Sounds, 2006 [Online]. Available: http://www.cse.ohio-state.edu/pnl/corpus/HuCorpus.html
- (2006) 100 Nonspeech Sounds
- Hu, G.¹

14
- 4644265990
- Monaural speech segregation based on pitch tracking and amplitude modulation
- Sep.
- G. Hu and D. L. Wang, "Monaural speech segregation based on pitch tracking and amplitude modulation," IEEE Trans. Neural Netw., vol. 15, no. 5, pp. 1135-1150, Sep. 2004.
- (2004) IEEE Trans. Neural Netw. , vol.15 , Issue.5 , pp. 1135-1150
- Hu, G.¹ Wang, D.L.²

15
- 38849102154
- Auditory segmentation based on onset and offset analysis
- Feb.
- G. Hu and D. L. Wang, "Auditory segmentation based on onset and offset analysis," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 2, pp. 396-405, Feb. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.2 , pp. 396-405
- Hu, G.¹ Wang, D.L.²

16
- 85008581724
- Spectral magnitude minimum mean-square error estimation using binary and continuous gain functions
- Jan.
- J. Jensen and R. C. Hendriks, "Spectral magnitude minimum mean-square error estimation using binary and continuous gain functions," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 92-102, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 92-102
- Jensen, J.¹ Hendriks, R.C.²

17
- 65249103478
- A supervised learning approach to monaural segregation of reverberant speech
- May
- Z. Jin and D. L. Wang, "A supervised learning approach to monaural segregation of reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625-638, May 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.4 , pp. 625-638
- Jin, Z.¹ Wang, D.L.²

18
- 85008056718
- HMM-based multipitch tracking for noisy and reverberant speech
- Jul.
- Z. Jin and D. L. Wang, "HMM-based multipitch tracking for noisy and reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1091-1102, Jul. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1091-1102
- Jin, Z.¹ Wang, D.L.²

19
- 77956547397
- Improving speech intelligibility in noise using environment-optimized algorithms
- Nov.
- G. Kim and P. C. Loizou, "Improving speech intelligibility in noise using environment-optimized algorithms," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2080-2090, Nov. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2080-2090
- Kim, G.¹ Loizou, P.C.²

20
- 70349093614
- An algorithm that improves speech intelligibility in noise for normal-hearing listeners
- G. Kim, Y. Lu, Y. Hu, and P. C. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., vol. 126, pp. 1486-1494, 2009.
- (2009) J. Acoust. Soc. Amer. , vol.126 , pp. 1486-1494
- Kim, G.¹ Lu, Y.² Hu, Y.³ Loizou, P.C.⁴

21
- 40749125179
- Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction
- DOI 10.1121/1.2832617
- N. Li and P. C. Loizou, "Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction," J. Acoust. Soc. Amer., vol. 123, pp. 1673-1682, 2008. (Pubitemid 351379593)
- (2008) Journal of the Acoustical Society of America , vol.123 , Issue.3 , pp. 1673-1682
- Li, N.¹ Loizou, P.C.²

22
- 80052714543
- A unifying view on dataset shift in classification
- J. G. Moreno-Torres, T. Raeder, R. Alaiz-Rodríguez, N. V. Chawla, and F. Herrera, "A unifying view on dataset shift in classification," Pattern Recognit., vol. 45, no. 1, pp. 521-530, 2012.
- (2012) Pattern Recognit. , vol.45 , Issue.1 , pp. 521-530
- Moreno-Torres, J.G.¹ Raeder, T.² Alaiz-Rodríguez, R.³ Chawla, N.V.⁴ Herrera, F.⁵

23
- 84869471437
- Dept. of Comput. Sci. and Eng., The Ohio State Univ., Tech. Rep. TR36
- A. Narayanan and D. L. Wang, "A CASA based system for SNR estimation," Dept. of Comput. Sci. and Eng., The Ohio State Univ., Tech. Rep. TR36, 2011.
- (2011) A CASA Based System for SNR Estimation
- Narayanan, A.¹ Wang, D.L.²

24
- 0003513556
- (2nd ed.). Upper Saddle River NJ: Prentice-Hall
- A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing (2nd ed.). Upper Saddle River, NJ: Prentice-Hall, 1999.
- (1999) Discrete-Time Signal Processing
- Oppenheim, A.V.¹ Schafer, R.W.² Buck, J.R.³

25
- 51449094735
- Adaptation of bayesian models for single-channel source separation and its application to voice/music separation in popular songs
- Jul.
- A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, "Adaptation of bayesian models for single-channel source separation and its application to voice/music separation in popular songs," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 5, pp. 1564-1578, Jul. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.5 , pp. 1564-1578
- Ozerov, A.¹ Philippe, P.² Bimbot, F.³ Gribonval, R.⁴

26
- 84897584695
- A general flexible framework for the handling of prior information in audio source separation
- May
- A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 4, pp. 1118-1133, May 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.4 , pp. 1118-1133
- Ozerov, A.¹ Vincent, E.² Bimbot, F.³

27
- 0142056390
- An efficient auditory filterbank based on the gammatone function
- Tech. Rep. 2341
- R. D. Patterson, I. Nimmo-Smith, J. Holdsworth, and P. Rice, "An efficient auditory filterbank based on the gammatone function," MRC Applied Psychology Unit, Tech. Rep. No. 2341, 1988.
- (1988) MRC Applied Psychology Unit
- Patterson, R.D.¹ Nimmo-Smith, I.² Holdsworth, J.³ Rice, P.⁴

28
- 0003243224
- Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
- Cambridge, MA: MIT Press
- J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," in Advances in Large Margin Classifiers. Cambridge, MA: MIT Press, 1999, pp. 61-74.
- (1999) Advances in Large Margin Classifiers , pp. 61-74
- Platt, J.C.¹

29
- 0142026377
- Speech segregation based on sound localization
- DOI 10.1121/1.1610463
- N. Roman, D. L. Wang, and G. J. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, no. 4, pp. 2236-2252, 2003. (Pubitemid 37266649)
- (2003) Journal of the Acoustical Society of America , vol.114 , Issue.4 , pp. 2236-2252
- Roman, N.¹ Wang, D.² Brown, G.J.³

30
- 0014568991
- IEEE recommended practice for speech quality measurements
- E. H. Rothauser, W. D. Chapman, N. Guttman, K. S. Nordby, H. R. Silbiger, G. E. Urbanek, and M. Weinstock, "IEEE recommended practice for speech quality measurements," IEEE Trans. Audio Electroacoust., vol. 19, pp. 227-246, 1969.
- (1969) IEEE Trans. Audio Electroacoust. , vol.19 , pp. 227-246
- Rothauser, E.H.¹ Chapman, W.D.² Guttman, N.³ Nordby, K.S.⁴ Silbiger, H.R.⁵ Urbanek, G.E.⁶ Weinstock, M.⁷

31
- 0032166087
- HMM-based strategies for enhancement of speech signals embedded in nonstationary noise
- Sep.
- H. Sameti, H. Sheikhzadeh, L. Deng, and R. L. Brennan, "HMM-based strategies for enhancement of speech signals embedded in nonstationary noise," IEEE Trans. Speech Audio Process., vol. 6, no. 5, pp. 445-455, Sep. 1998.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.5 , pp. 445-455
- Sameti, H.¹ Sheikhzadeh, H.² Deng, L.³ Brennan, R.L.⁴

32
- 4644317224
- A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
- M. L. Seltzer, B. Raj, and R. M. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Commun., vol. 43, no. 4, pp. 379-393, 2004.
- (2004) Speech Commun. , vol.43 , Issue.4 , pp. 379-393
- Seltzer, M.L.¹ Raj, B.² Stern, R.M.³

33
- 0032762471
- A statistical model-based voice activity detection
- Jan.
- J. Sohn, N. S. Kim, and W. Sung, "A statistical model-based voice activity detection," IEEE Signal Process. Lett., vol. 6, no. 1, pp. 1-3, Jan. 1999.
- (1999) IEEE Signal Process. Lett. , vol.6 , Issue.1 , pp. 1-3
- Sohn, J.¹ Kim, N.S.² Sung, W.³

34
- 70350565063
- On strategies for imbalanced text classification using SVM: A comparative study
- A. Sun, E. P. Lim, and Y. Liu, "On strategies for imbalanced text classification using SVM: A comparative study," Decision Support Syst., vol. 48, no. 1, pp. 191-201, 2009.
- (2009) Decision Support Syst. , vol.48 , Issue.1 , pp. 191-201
- Sun, A.¹ Lim, E.P.² Liu, Y.³

35
- 0038712550
- SNR estimation based on amplitude modulation analysis with applications to noise suppression
- Mar.
- J. Tchorz and B. Kollmeier, "SNR estimation based on amplitude modulation analysis with applications to noise suppression," IEEE Trans. Speech Audio Process., vol. 11, no. 3, pp. 184-192, Mar. 2003.
- (2003) IEEE Trans. Speech Audio Process. , vol.11 , Issue.3 , pp. 184-192
- Tchorz, J.¹ Kollmeier, B.²

36
- 0003450542
- New York: Springer
- V. N. Vapnik, The Nature of Statistical Learning Theory. New York: Springer, 2000.
- (2000) The Nature of Statistical Learning Theory
- Vapnik, V.N.¹

37
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- P. Divenyi, Ed. Norwell, MA, USA: Kluwer ch. 12
- D. L. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA, USA: Kluwer, 2005, ch. 12, pp. 181-197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

38
- 82255178542
- Hoboken NJ: Wiley
- D. L. Wang and G. J. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms and Applications. Hoboken, NJ: Wiley, 2006.
- (2006) Computational Auditory Scene Analysis: Principles Algorithms and Applications
- Wang, D.L.¹ Brown, G.J.²

39
- 64649103540
- Speech intelligibility in background noise with ideal binary time-frequency masking
- D. L. Wang, U. Kjems, M. S. Pedersen, J. B. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
- (2009) J. Acoust. Soc. Amer. , vol.125 , pp. 2336-2347
- Wang, D.L.¹ Kjems, U.² Pedersen, M.S.³ Boldt, J.B.⁴ Lunner, T.⁵

40
- 84869471439
- Dept. of Comput. Sci. and Eng., The Ohio State Univ., Tech. Rep. TR37
- Y. Wang, K. Han, and D. L. Wang, "Exploring monaural features for classification-based speech segregation," Dept. of Comput. Sci. and Eng., The Ohio State Univ., Tech. Rep. TR37, 2011.
- (2011) Exploring Monaural Features for Classification-based Speech Segregation
- Wang, Y.¹ Han, K.² Wang, D.L.³

41
- 80051659047
- Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio
- Y. Wang and Z. Ou, "Combining HMM-based melody extraction and NMF-based soft masking for separating voice and accompaniment from monaural audio," in Proc. IEEE ICASSP, 2011, pp. 1-4.
- (2011) Proc. IEEE ICASSP , pp. 1-4
- Wang, Y.¹ Ou, Z.²

42
- 48149090146
- Estimating single-channel source separation masks: Relevance vector machine classifiers vs. pitch-based masking
- R. J. Weiss and D. P. W. Ellis, "Estimating single-channel source separation masks: Relevance vector machine classifiers vs. pitch-based masking," in Proc. Workshop Statist. Percept. Audition, 2006, pp. 31-36.
- (2006) Proc. Workshop Statist. Percept. Audition , pp. 31-36
- Weiss, R.J.¹ Ellis, D.P.W.²

43
- 51449116166
- HMM-based gain modeling for enhancement of speech in noise
- Mar.
- D. Y. Zhao and W. B. Kleijn, "HMM-based gain modeling for enhancement of speech in noise," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 3, pp. 882-892, Mar. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.3 , pp. 882-892
- Zhao, D.Y.¹ Kleijn, W.B.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.