SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 7, 2013, Pages 1381-1390

Towards scaling up classification-based speech separation

Author keywords

Computational auditory scene analysis (CASA); deep belief networks; feature learning; monaural speech separation; support vector machines

Indexed keywords

BINARY CLASSIFICATION PROBLEMS; COMPUTATIONAL AUDITORY SCENE ANALYSIS; DEEP BELIEF NETWORKS; DISCRIMINATIVE FEATURES; FEATURE LEARNING; SEPARATION PERFORMANCE; SPEECH SEPARATION; SUPPORT VECTOR MACHINE (SVMS); SEPARATION; SOURCE SEPARATION; SPEECH ANALYSIS; SUPPORT VECTOR MACHINES

EID: 84875678689 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2013.2250961 Document Type: Article

Times cited: (498)

References (47)

1
- 69349090197
- Learning deep architectures for AI
- Y. Bengio, "Learning deep architectures for AI," Foundat. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.
- (2009) Foundat. Trends Mach. Learn. , vol.2 , Issue.1 , pp. 1-127
- Bengio, Y.¹

2
- 0038120523
- P. Boersma and D. Weenink, Praat: Doing Phonetics by Computer (Version 4.3.14) 2005 [Online]. Available: http://www.fon.hum.uva.nl/praat
- (2005) Praat: Doing Phonetics by Computer (Version 4.3.14)
- Boersma, P.¹ Weenink, D.²

3
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979. (Pubitemid 9467471)
- (1979) IEEE Trans Acoust Speech Signal Process , vol.ASSP-27 , Issue.2 , pp. 113-120
- Boll, S.F.¹

4
- 25444522689
- Fast kernel classifiers with online and active learning
- A. Bordes, S. Ertekin, J. Weston, and L. Bottou, "Fast kernel classifiers with online and active learning," J. Mach. Learn. Res., vol. 6, pp. 1579-1619, 2005.
- (2005) J. Mach. Learn. Res. , vol.6 , pp. 1579-1619
- Bordes, A.¹ Ertekin, S.² Weston, J.³ Bottou, L.⁴

5
- 85162035281
- The tradeoffs of large scale learning
- L. Bottou and O. Bousquet, "The tradeoffs of large scale learning," in Adv. Neural Inf. Process. Syst. 20, 2008, pp. 161-168.
- (2008) Adv. Neural Inf. Process. Syst. , vol.20 , pp. 161-168
- Bottou, L.¹ Bousquet, O.²

6
- 33845354768
- Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
- DOI 10.1121/1.2363929
- D. Brungart, P. Chang, B. Simpson, and D. Wang, "Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation," J. Acoust. Soc. Amer., vol. 120, pp. 4007-4018, 2006. (Pubitemid 44888096)
- (2006) Journal of the Acoustical Society of America , vol.120 , Issue.6 , pp. 4007-4018
- Brungart, D.S.¹ Chang, P.S.² Simpson, B.D.³ Wang, D.⁴

7
- 79955702502
- LIBSVM: A library for support vector machines
- C. Chang and C. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27-27, 2011.
- (2011) ACM Trans. Intell. Syst. Technol. , vol.2 , Issue.3 , pp. 27-27
- Chang, C.¹ Lin, C.²

8
- 79959845286
- The CHiME corpus: A resource and a challenge for computational hearing in multisource environments
- H. Christensen, J. Barker, N. Ma, and P. Green, "The CHiME corpus: A resource and a challenge for computational hearing in multisource environments," in Proc. Interspeech, 2010.
- (2010) Proc. Interspeech
- Christensen, H.¹ Barker, J.² Ma, N.³ Green, P.⁴

9
- 84864250650
- An analysis of single-layer networks in unsupervised feature learning
- A. Coates, H. Lee, and A. Ng, "An analysis of single-layer networks in unsupervised feature learning," in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011.
- (2011) Proc. 14th Int. Conf. Artif. Intell. Statist.
- Coates, A.¹ Lee, H.² Ng, A.³

10
- 80053442434
- The importance of encoding versus training with sparse coding and vector quantization
- A. Coates and A. Ng, "The importance of encoding versus training with sparse coding and vector quantization," in Proc. 28th Int. Conf. Mach. Learn., 2011.
- (2011) Proc. 28th Int. Conf. Mach. Learn.
- Coates, A.¹ Ng, A.²

11
- 84055222005
- Context-dependent pre-trained deep neural networks for large vocabulary speech recognition
- G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.¹ Yu, D.² Deng, L.³ Acero, A.⁴

12
- 0021645331
- Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
- (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
- Ephraim, Y.¹ Malah, D.²

13
- 77949522811
- Why does unsupervised pre-training help deep learning?
- D. Erhan, Y. Bengio, A. Courville, P. Manzagol, P. Vincent, and S. Bengio, "Why does unsupervised pre-training help deep learning?," J. Mach. Learn. Res., vol. 11, pp. 625-660, 2010.
- (2010) J. Mach. Learn. Res. , vol.11 , pp. 625-660
- Erhan, D.¹ Bengio, Y.² Courville, A.³ Manzagol, P.⁴ Vincent, P.⁵ Bengio, S.⁶

14
- 50949133669
- LIBLINEAR: A library for large linear classification
- R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, "LIBLINEAR: A library for large linear classification," J. Mach. Learn. Res., vol. 9, pp. 1871-1874, 2008.
- (2008) J. Mach. Learn. Res. , vol.9 , pp. 1871-1874
- Fan, R.¹ Chang, K.² Hsieh, C.³ Wang, X.⁴ Lin, C.⁵

15
- 63249085556
- Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis
- C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Comput., vol. 21, no. 3, pp. 793-830, 2009.
- (2009) Neural Comput. , vol.21 , Issue.3 , pp. 793-830
- Févotte, C.¹ Bertin, N.² Durrieu, J.-L.³

16
- 0003548585
- Gaithersburg, MD, USA: National Inst. of Standards and Technology
- J. Garofolo, DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus. Gaithersburg, MD, USA: National Inst. of Standards and Technology, 1993.
- (1993) DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus
- Garofolo, J.¹

17
- 84869105129
- A classification approach to speech segregation
- K. Han and D. Wang, "A classification approach to speech segregation," J. Acoust. Soc. Amer., vol. 132, pp. 3475-3483, 2012.
- (2012) J. Acoust. Soc. Amer. , vol.132 , pp. 3475-3483
- Han, K.¹ Wang, D.²

18
- 80051633766
- Investigations into the incorporation of the ideal binary mask in ASR
- W. Hartmann and E. Fosler-Lussier, "Investigations into the incorporation of the ideal binary mask in ASR," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2011, pp. 4804-4807.
- (2011) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 4804-4807
- Hartmann, W.¹ Fosler-Lussier, E.²

19
- 78049364397
- MMSE based noise PSD tracking with low complexity
- R. Hendriks, R. Heusdens, and J. Jensen, "MMSE based noise PSD tracking with low complexity," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process., 2010, pp. 4266-4269.
- (2010) Proc. IEEE Int. Conf. Acoustics, Speech, Signal Process. , pp. 4266-4269
- Hendriks, R.¹ Heusdens, R.² Jensen, J.³

20
- 0013344078
- Training products of experts by minimizing contrastive divergence
- G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, no. 8, pp. 1771-1800, 2002.
- (2002) Neural Comput. , vol.14 , Issue.8 , pp. 1771-1800
- Hinton, G.¹

21
- 33745805403
- A fast learning algorithm for deep belief nets
- DOI 10.1162/neco.2006.18.7.1527
- G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006. (Pubitemid 44024729)
- (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.-W.³

22
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.¹ Salakhutdinov, R.²

23
- 49249107353
- Segregation of unvoiced speech from nonspeech interference
- G. Hu and D. Wang, "Segregation of unvoiced speech from nonspeech interference," J. Acoust. Soc. Amer., vol. 124, pp. 1306-1319, 2008.
- (2008) J. Acoust. Soc. Amer. , vol.124 , pp. 1306-1319
- Hu, G.¹ Wang, D.²

24
- 77955695149
- A tandem algorithm for pitch estimation and voiced speech segregation
- G. Hu and D. Wang, "A tandem algorithm for pitch estimation and voiced speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2067-2079, Nov. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2067-2079
- Hu, G.¹ Wang, D.²

25
- 84890467815
- G. Hu, 100 nonspeech environmental sounds 2004 [Online]. Available: http://www.cse.ohio-state.edu/pnl/corpus/HuCorpus.html
- 100 Nonspeech Environmental Sounds 2004
- Hu, G.¹

26
- 0014568991
- IEEE recommended practice for speech quality measurements
- "IEEE recommended practice for speech quality measurements," IEEE Trans. Audio Electroacoust., vol. 17, no. 3, pp. 225-246, Sep. 1969.
- (1969) IEEE Trans. Audio Electroacoust. , vol.17 , Issue.3 , pp. 225-246

27
- 65249103478
- A supervised learning approach to monaural segregation of reverberant speech
- Z. Jin and D. Wang, "A supervised learning approach to monaural segregation of reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625-638, May 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.4 , pp. 625-638
- Jin, Z.¹ Wang, D.²

28
- 85008056718
- HMM-based multipitch tracking for noisy and reverberant speech
- Z. Jin and D. Wang, "HMM-based multipitch tracking for noisy and reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1091-1102, Jul. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1091-1102
- Jin, Z.¹ Wang, D.²

29
- 77956547397
- Improving speech intelligibility in noise using environment-optimized algorithms
- G. Kim and P. Loizou, "Improving speech intelligibility in noise using environment-optimized algorithms," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2080-2090, Nov. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2080-2090
- Kim, G.¹ Loizou, P.²

30
- 70349093614
- An algorithm that improves speech intelligibility in noise for normal-hearing listeners
- G. Kim, Y. Lu, Y. Hu, and P. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., vol. 126, pp. 1486-1494, 2009.
- (2009) J. Acoust. Soc. Amer. , vol.126 , pp. 1486-1494
- Kim, G.¹ Lu, Y.² Hu, Y.³ Loizou, P.⁴

31
- 78649325568
- Mask classification for missing-feature reconstruction for robust speech recognition with unknown background noise
- W. Kim and R. Stern, "Mask classification for missing-feature reconstruction for robust speech recognition with unknown background noise," Speech Commun., vol. 53, no. 1, pp. 1-11, 2011.
- (2011) Speech Commun. , vol.53 , Issue.1 , pp. 1-11
- Kim, W.¹ Stern, R.²

32
- 59449087310
- Exploring strategies for training deep neural networks
- H. Larochelle, Y. Bengio, J. Louradour, and P. Lamblin, "Exploring strategies for training deep neural networks," J. Mach. Learn. Res., vol. 10, pp. 1-40, 2009.
- (2009) J. Mach. Learn. Res. , vol.10 , pp. 1-40
- Larochelle, H.¹ Bengio, Y.² Louradour, J.³ Lamblin, P.⁴

33
- 71149119164
- Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
- H. Lee, R. Grosse, R. Ranganath, and A. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proc. 26th Int. Conf. Mach. Learn., 2009, pp. 609-616.
- (2009) Proc. 26th Int. Conf. Mach. Learn. , pp. 609-616
- Lee, H.¹ Grosse, R.² Ranganath, R.³ Ng, A.⁴

34
- 40749125179
- Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction
- DOI 10.1121/1.2832617
- N. Li and P. Loizou, "Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction," J. Acoust. Soc. Amer., vol. 123, no. 3, pp. 1673-1682, 2008. (Pubitemid 351379593)
- (2008) Journal of the Acoustical Society of America , vol.123 , Issue.3 , pp. 1673-1682
- Li, N.¹ Loizou, P.C.²

35
- 84055211743
- Acoustic modeling using deep belief networks
- A. Mohamed, G. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-21, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 14-21
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

36
- 84897584695
- A general flexible framework for the handling of prior information in audio source separation
- A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 4, pp. 1118-1133, May 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.4 , pp. 1118-1133
- Ozerov, A.¹ Vincent, E.² Bimbot, F.³

37
- 0003243224
- Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
- J. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," Adv. Large Margin Classifiers, pp. 61-74, 1999.
- (1999) Adv. Large Margin Classifiers , pp. 61-74
- Platt, J.¹

38
- 0142026377
- Speech segregation based on sound localization
- DOI 10.1121/1.1610463
- N. Roman, D. Wang, and G. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, pp. 2236-2252, 2003. (Pubitemid 37266649)
- (2003) Journal of the Acoustical Society of America , vol.114 , Issue.4 , pp. 2236-2252
- Roman, N.¹ Wang, D.² Brown, G.J.³

39
- 4644317224
- A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
- M. Seltzer, B. Raj, and R. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Commun., vol. 43, no. 4, pp. 379-393, 2004.
- (2004) Speech Commun. , vol.43 , Issue.4 , pp. 379-393
- Seltzer, M.¹ Raj, B.² Stern, R.³

40
- 34547964973
- Pegasos: Primal estimated sub-gradient solver for SVM
- S. Shalev-Shwartz, Y. Singer, and N. Srebro, "Pegasos: Primal estimated sub-gradient solver for SVM," in Proc. 24th Int. Conf. Mach. Learn., 2007, pp. 807-814.
- (2007) Proc. 24th Int. Conf. Mach. Learn. , pp. 807-814
- Shalev-Shwartz, S.¹ Singer, Y.² Srebro, N.³

41
- 0347379706
- Multiresolution estimates of classification complexity
- S. Singh, "Multiresolution estimates of classification complexity," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1534-1539, Dec. 2003.
- (2003) IEEE Trans. Pattern Anal. Mach. Intell. , vol.25 , Issue.12 , pp. 1534-1539
- Singh, S.¹

42
- 33846500112
- Distances between data sets based on summary statistics
- N. Tatti, "Distances between data sets based on summary statistics," J. Mach. Learn. Res., vol. 8, pp. 131-154, 2007. (Pubitemid 46168465)
- (2007) Journal of Machine Learning Research , vol.8 , pp. 131-154
- Tatti, N.¹

43
- 0027623210
- Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, pp. 247-251, 1993.
- (1993) Speech Commun. , vol.12 , pp. 247-251
- Varga, A.¹ Steeneken, H.²

44
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- P. Divenyi, Ed. Norwell, MA, USA: Kluwer
- D. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA, USA: Kluwer, 2005, pp. 181-197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.¹

45
- 82255178542
- Hoboken, NJ, USA: Wiley-IEEE Press
- D. Wang and G. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms and Applications. Hoboken, NJ, USA: Wiley-IEEE Press, 2006.
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms and Applications
- Wang, D.¹ Brown, G.²

46
- 84870477511
- Exploring monaural features for classification-based speech segregation
- Y. Wang, K. Han, and D. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 2, pp. 270-279, Feb. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.2 , pp. 270-279
- Wang, Y.¹ Han, K.² Wang, D.³

47
- 84875681333
- Cocktail party processing via structured prediction
- Y. Wang and D. Wang, "Cocktail party processing via structured prediction," in Adv. Neural Inf. Process. Syst. 25, 2012, pp. 224-232.
- (2012) Adv. Neural Inf. Process. Syst. , vol.25 , pp. 224-232
- Wang, Y.¹ Wang, D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.