Volume 21, Issue 7, 2013, Pages 1381-1390

Towards scaling up classification-based speech separation

Author keywords

Computational auditory scene analysis (CASA); deep belief networks; feature learning; monaural speech separation; support vector machines

Indexed keywords

BINARY CLASSIFICATION PROBLEMS; COMPUTATIONAL AUDITORY SCENE ANALYSIS; DEEP BELIEF NETWORKS; DISCRIMINATIVE FEATURES; FEATURE LEARNING; SEPARATION PERFORMANCE; SPEECH SEPARATION; SUPPORT VECTOR MACHINE (SVMS)

EID: 84875678689     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2013.2250961     Document Type: Article
Times cited: 498

References (47)
  • 1
    • 69349090197
    • Learning deep architectures for AI
    • Y. Bengio, "Learning deep architectures for AI," Foundat. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.
    • (2009) Foundat. Trends Mach. Learn. , vol.2 , Issue.1 , pp. 1-127
    • Bengio, Y.1
  • 3
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979. (Pubitemid 9467471)
    • (1979) IEEE Trans Acoust Speech Signal Process , vol.ASSP-27 , Issue.2 , pp. 113-120
    • Boll Steven, F.1
  • 4
    • 25444522689
    • Fast kernel classifiers with online and active learning
    • A. Bordes, S. Ertekin, J. Weston, and L. Bottou, "Fast kernel classifiers with online and active learning," J. Mach. Learn. Res., vol. 6, pp. 1579-1619, 2005.
    • (2005) J. Mach. Learn. Res. , vol.6 , pp. 1579-1619
    • Bordes, A.1    Ertekin, S.2    Weston, J.3    Bottou, L.4
  • 6
    • 33845354768
    • Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
    • DOI 10.1121/1.2363929
    • D. Brungart, P. Chang, B. Simpson, and D. Wang, "Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation," J. Acoust. Soc. Amer., vol. 120, pp. 4007-4018, 2006. (Pubitemid 44888096)
    • (2006) Journal of the Acoustical Society of America , vol.120 , Issue.6 , pp. 4007-4018
    • Brungart, D.S.1    Chang, P.S.2    Simpson, B.D.3    Wang, D.4
  • 7
    • 79955702502
    • LIBSVM: A library for support vector machines
    • C. Chang and C. Lin, "LIBSVM: A library for support vector machines," ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, pp. 27-27, 2011.
    • (2011) ACM Trans. Intell. Syst. Technol. , vol.2 , Issue.3 , pp. 27-27
    • Chang, C.1    Lin, C.2
  • 8
    • 79959845286
    • The CHiME corpus: A resource and a challenge for computational hearing in multisource environments
    • H. Christensen, J. Barker, N. Ma, and P. Green, "The CHiME corpus: A resource and a challenge for computational hearing in multisource environments," in Proc. Interspeech, 2010.
    • (2010) Proc. Interspeech
    • Christensen, H.1    Barker, J.2    Ma, N.3    Green, P.4
  • 10
    • 80053442434
    • The importance of encoding versus training with sparse coding and vector quantization
    • A. Coates and A. Ng, "The importance of encoding versus training with sparse coding and vector quantization," in Proc. 28th Int. Conf. Mach. Learn., 2011.
    • (2011) Proc. 28th Int. Conf. Mach. Learn.
    • Coates, A.1    Ng, A.2
  • 11
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large vocabulary speech recognition
    • Jan
    • G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 12
    • 0021645331
    • Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
    • Dec.
    • Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
    • (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
    • Ephraim, Y.1    Malah, D.2
  • 14
    • 50949133669
    • LIBLINEAR: A library for large linear classification
    • R. Fan, K. Chang, C. Hsieh, X. Wang, and C. Lin, "LIBLINEAR: A library for large linear classification," J. Mach. Learn. Res., vol. 9, pp. 1871-1874, 2008.
    • (2008) J. Mach. Learn. Res. , vol.9 , pp. 1871-1874
    • Fan, R.1    Chang, K.2    Hsieh, C.3    Wang, X.4    Lin, C.5
  • 15
    • 63249085556
    • Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis
    • C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Comput., vol. 21, no. 3, pp. 793-830, 2009.
    • (2009) Neural Comput. , vol.21 , Issue.3 , pp. 793-830
    • Févotte, C.1    Bertin, N.2    Durrieu, J.-L.3
  • 17
    • 84869105129
    • A classification approach to speech segregation
    • K. Han and D. Wang, "A classification approach to speech segregation," J. Acoust. Soc. Amer., vol. 132, pp. 3475-3483, 2012.
    • (2012) J. Acoust. Soc. Amer. , vol.132 , pp. 3475-3483
    • Han, K.1    Wang, D.2
  • 20
    • 0013344078
    • Training products of experts by minimizing contrastive divergence
    • G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, no. 8, pp. 1771-1800, 2002.
    • (2002) Neural Comput. , vol.14 , Issue.8 , pp. 1771-1800
    • Hinton, G.1
  • 21
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • DOI 10.1162/neco.2006.18.7.1527
    • G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006. (Pubitemid 44024729)
    • (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.-W.3
  • 22
    • 33746600649
    • Reducing the dimensionality of data with neural networks
    • G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
    • (2006) Science , vol.313 , Issue.5786 , pp. 504-507
    • Hinton, G.1    Salakhutdinov, R.2
  • 23
    • 49249107353
    • Segregation of unvoiced speech from nonspeech interference
    • G. Hu and D. Wang, "Segregation of unvoiced speech from nonspeech interference," J. Acoust. Soc. Amer., vol. 124, pp. 1306-1319, 2008.
    • (2008) J. Acoust. Soc. Amer. , vol.124 , pp. 1306-1319
    • Hu, G.1    Wang, D.2
  • 24
    • 77955695149
    • A tandem algorithm for pitch estimation and voiced speech segregation
    • Nov
    • G. Hu and D. Wang, "A tandem algorithm for pitch estimation and voiced speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2067-2079, Nov. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2067-2079
    • Hu, G.1    Wang, D.2
  • 26
    • 0014568991
    • IEEE recommended practice for speech quality measurements
    • Sep.
    • "IEEE recommended practice for speech quality measurements," IEEE Trans. Audio Electroacoust., vol. 17, no. 3, pp. 225-246, Sep. 1969.
    • (1969) IEEE Trans. Audio Electroacoust. , vol.17 , Issue.3 , pp. 225-246
  • 27
    • 65249103478
    • A supervised learning approach to monaural segregation of reverberant speech
    • May
    • Z. Jin and D. Wang, "A supervised learning approach to monaural segregation of reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625-638, May 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.4 , pp. 625-638
    • Jin, Z.1    Wang, D.2
  • 28
    • 85008056718
    • HMM-based multipitch tracking for noisy and reverberant speech
    • Jul
    • Z. Jin and D. Wang, "HMM-based multipitch tracking for noisy and reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1091-1102, Jul. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1091-1102
    • Jin, Z.1    Wang, D.2
  • 29
    • 77956547397
    • Improving speech intelligibility in noise using environment-optimized algorithms
    • Nov
    • G. Kim and P. Loizou, "Improving speech intelligibility in noise using environment-optimized algorithms," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2080-2090, Nov. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2080-2090
    • Kim, G.1    Loizou, P.2
  • 30
    • 70349093614
    • An algorithm that improves speech intelligibility in noise for normal-hearing listeners
    • G. Kim, Y. Lu, Y. Hu, and P. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., vol. 126, pp. 1486-1494, 2009.
    • (2009) J. Acoust. Soc. Amer. , vol.126 , pp. 1486-1494
    • Kim, G.1    Lu, Y.2    Hu, Y.3    Loizou, P.4
  • 31
    • 78649325568
    • Mask classification for missing-feature reconstruction for robust speech recognition with unknown background noise
    • W. Kim and R. Stern, "Mask classification for missing-feature reconstruction for robust speech recognition with unknown background noise," Speech Commun., vol. 53, no. 1, pp. 1-11, 2011.
    • (2011) Speech Commun. , vol.53 , Issue.1 , pp. 1-11
    • Kim, W.1    Stern, R.2
  • 33
    • 71149119164
    • Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations
    • H. Lee, R. Grosse, R. Ranganath, and A. Ng, "Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations," in Proc. 26th Int. Conf. Mach. Learn., 2009, pp. 609-616.
    • (2009) Proc. 26th Int. Conf. Mach. Learn. , pp. 609-616
    • Lee, H.1    Grosse, R.2    Ranganath, R.3    Ng, A.4
  • 34
    • 40749125179
    • Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction
    • DOI 10.1121/1.2832617
    • N. Li and P. Loizou, "Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction," J. Acoust. Soc. Amer., vol. 123, no. 3, pp. 1673-1682, 2008. (Pubitemid 351379593)
    • (2008) Journal of the Acoustical Society of America , vol.123 , Issue.3 , pp. 1673-1682
    • Li, N.1    Loizou, P.C.2
  • 36
    • 84897584695
    • A general flexible framework for the handling of prior information in audio source separation
    • May
    • A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 4, pp. 1118-1133, May 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.4 , pp. 1118-1133
    • Ozerov, A.1    Vincent, E.2    Bimbot, F.3
  • 37
    • 0003243224
    • Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
    • J. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," Adv. Large Margin Classifiers, pp. 61-74, 1999.
    • (1999) Adv. Large Margin Classifiers , pp. 61-74
    • Platt, J.1
  • 39
    • 4644317224
    • A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
    • M. Seltzer, B. Raj, and R. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Commun., vol. 43, no. 4, pp. 379-393, 2004.
    • (2004) Speech Commun. , vol.43 , Issue.4 , pp. 379-393
    • Seltzer, M.1    Raj, B.2    Stern, R.3
  • 41
    • 0347379706
    • Multiresolution estimates of classification complexity
    • Dec
    • S. Singh, "Multiresolution estimates of classification complexity," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 12, pp. 1534-1539, Dec. 2003.
    • (2003) IEEE Trans. Pattern Anal. Mach. Intell. , vol.25 , Issue.12 , pp. 1534-1539
    • Singh, S.1
  • 42
    • 33846500112
    • Distances between data sets based on summary statistics
    • N. Tatti, "Distances between data sets based on summary statistics," J. Mach. Learn. Res., vol. 8, pp. 131-154, 2007. (Pubitemid 46168465)
    • (2007) Journal of Machine Learning Research , vol.8 , pp. 131-154
    • Tatti, N.1
  • 43
    • 0027623210
    • Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
    • A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, pp. 247-251, 1993.
    • (1993) Speech Commun. , vol.12 , pp. 247-251
    • Varga, A.1    Steeneken, H.2
  • 44
    • 84892233308
    • On ideal binary mask as the computational goal of auditory scene analysis
    • P. Divenyi, Ed. Norwell, MA, USA: Kluwer
    • D. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA, USA: Kluwer, 2005, pp. 181-197.
    • (2005) Speech Separation by Humans and Machines , pp. 181-197
    • Wang, D.1
  • 46
    • 84870477511
    • Exploring monaural features for classification-based speech segregation
    • Feb
    • Y. Wang, K. Han, and D. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 2, pp. 270-279, Feb. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.2 , pp. 270-279
    • Wang, Y.1    Han, K.2    Wang, D.3
  • 47
    • 84875681333
    • Cocktail party processing via structured prediction
    • Y. Wang and D. Wang, "Cocktail party processing via structured prediction," in Adv. Neural Inf. Process. Syst. 25, 2012, pp. 224-232.
    • (2012) Adv. Neural Inf. Process. Syst. , vol.25 , pp. 224-232
    • Wang, Y.1    Wang, D.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.