SCOPUS Information Retrieval Platform

IEEE/ACM Transactions on Audio Speech and Language Processing

Volume 22, Issue 12, 2014, Pages 1849-1858

On training targets for supervised speech separation

(3) Wang, Yuxuan a; Narayanan, Arun a; Wang, De Liang a

a Ohio State University (United States)

Author keywords

Deep neural networks; Speech separation; Supervised learning; Training targets

Indexed keywords

FACTORIZATION; FAST FOURIER TRANSFORMS; SEPARATION; SPEECH; SPEECH ANALYSIS; SPEECH ENHANCEMENT; SUPERVISED LEARNING;

DEEP NEURAL NETWORKS; IDEAL BINARY MASK (IBM); NONNEGATIVE MATRIX FACTORIZATION; SHORT TIME FOURIER TRANSFORMS; SPECTRAL MAGNITUDES; SPEECH SEPARATION; SUPERVISED LEARNING PROBLEMS; TIME-FREQUENCY REPRESENTATIONS;

SPEECH INTELLIGIBILITY;

EID: 84921740463 PISSN: 23299290 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2014.2352935 Document Type: Article

Times cited : (1216)

References (39)

1
- 33748523481
- Determination of the potential benefit of time-frequency gain manipulation
- M. Anzalone, L. Calandruccio, K. Doherty, and L. Carney, "Determination of the potential benefit of time-frequency gain manipulation," Ear Hear., vol. 27, no. 5, pp. 480-492, 2006.
- (2006) Ear Hear , vol.27 , Issue.5 , pp. 480-492
- Anzalone, M.¹ Calandruccio, L.² Doherty, K.³ Carney, L.⁴

2
- 33845354768
- Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
- D. Brungart, P. Chang, B. Simpson, and D. Wang, "Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation," J. Acoust. Soc. Amer., vol. 120, pp. 4007-4018, 2006.
- (2006) J. Acoust. Soc. Amer , vol.120 , pp. 4007-4018
- Brungart, D.¹ Chang, P.² Simpson, B.³ Wang, D.⁴

3
- 42549139762
- MVA processing of speech features
- Jan
- C. Chen and J. Bilmes, "MVA processing of speech features," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 257-270, Jan. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.1 , pp. 257-270
- Chen, C.¹ Bilmes, J.²

4
- 84905233552
- A feature study for classification-based speech separation at very low signal-to-noise ratio
- J. Chen, Y. Wang, and D. Wang, "A feature study for classification-based speech separation at very low signal-to-noise ratio," in Proc. ICASSP, 2014, pp. 7059-7063.
- (2014) Proc. ICASSP , pp. 7059-7063
- Chen, J.¹ Wang, Y.² Wang, D.³

5
- 80052250414
- Adaptive subgradient methods for online learning and stochastic optimization
- J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., pp. 2121-2159, 2011.
- (2011) J. Mach. Learn. Res , pp. 2121-2159
- Duchi, J.¹ Hazan, E.² Singer, Y.³

6
- 51449104842
- Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors
- Aug
- J. Erkelens, R. Hendriks, R. Heusdens, and J. Jensen, "Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.6 , pp. 1741-1752
- Erkelens, J.¹ Hendriks, R.² Heusdens, R.³ Jensen, J.⁴

7
- 0003548585
- Gaithersburg, MD, USA: Nat. Inst. of Standards Technol
- J. Garofolo, DARPA TIMIT acoustic-phonetic continuous speech corpus. Gaithersburg, MD, USA: Nat. Inst. of Standards Technol., 1993.
- (1993) DARPA TIMIT Acoustic-phonetic Continuous Speech Corpus
- Garofolo, J.¹

8
- 84862294866
- Deep sparse rectifier networks
- X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier networks," in Proc. 14th Int. Conf. Artif. Intell. Statist. JMLR W&CP Volume, 2011, vol. 15, pp. 315-323.
- (2011) Proc. 14th Int. Conf. Artif. Intell. Statist. JMLR W&CP Volume , vol.15 , pp. 315-323
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

9
- 85083951034
- Knowledge matters: Importance of prior information for optimization
- C. Gulcehre and Y. Bengio, "Knowledge matters: Importance of prior information for optimization," in Proc. Int. Conf. Learn. Representat. (ICLR), 2013.
- (2013) Proc. Int. Conf. Learn. Representat. (ICLR)
- Gulcehre, C.¹ Bengio, Y.²

10
- 84869105129
- A classification based approach to speech segregation
- K. Han and D. Wang, "A classification based approach to speech segregation," J. Acoust. Soc. Amer., vol. 132, pp. 3475-3483, 2012.
- (2012) J. Acoust. Soc. Amer , vol.132 , pp. 3475-3483
- Han, K.¹ Wang, D.²

11
- 84905268759
- Learning spectral mapping for speech dereverberation
- K. Han, Y. Wang, and D. Wang, "Learning spectral mapping for speech dereverberation," in Proc. ICASSP, 2014, pp. 4648-4652.
- (2014) Proc. ICASSP , pp. 4648-4652
- Han, K.¹ Wang, Y.² Wang, D.³

12
- 84885412715
- An algorithm to improve speech recognition in noise for hearing-impaired listeners
- E. Healy, S. Yoho, Y. Wang, and D. Wang, "An algorithm to improve speech recognition in noise for hearing-impaired listeners," J. Acoust. Soc. Amer., pp. 3029-3038, 2013.
- (2013) J. Acoust. Soc. Amer , pp. 3029-3038
- Healy, E.¹ Yoho, S.² Wang, Y.³ Wang, D.⁴

13
- 78049364397
- MMSE based noise PSD tracking with low complexity
- R. Hendriks, R. Heusdens, and J. Jensen, "MMSE based noise PSD tracking with low complexity," in Proc. ICASSP, 2010, pp. 4266-4269.
- (2010) Proc. ICASSP , pp. 4266-4269
- Hendriks, R.¹ Heusdens, R.² Jensen, J.³

14
- 69249222720
- Super-human multi-talker speech recognition: A graphical modeling approach
- J. R. Hershey, S. J. Rennie, P. A. Olsen, and T. T. Kristjansson, "Super-human multi-talker speech recognition: A graphical modeling approach," Comput. Speech Lang., pp. 45-66, 2010.
- (2010) Comput. Speech Lang , pp. 45-66
- Hershey, J.R.¹ Rennie, S.J.² Olsen, P.A.³ Kristjansson, T.T.⁴

15
- 84890466217
- Improving neural networks by preventing co-adaptation of feature detectors
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.
- (2012) ArXiv Preprint arXiv:1207.0580
- Hinton, G.E.¹ Srivastava, N.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.R.⁵

16
- 65249103478
- A supervised learning approach to monaural segregation of reverberant speech
- May
- Z. Jin and D. Wang, "A supervised learning approach to monaural segregation of reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625-638, May 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process , vol.17 , Issue.4 , pp. 625-638
- Jin, Z.¹ Wang, D.²

17
- 70349093614
- An algorithm that improves speech intelligibility in noise for normal-hearing listeners
- G. Kim, Y. Lu, Y. Hu, and P. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., pp. 1486-1494, 2009.
- (2009) J. Acoust. Soc. Amer , pp. 1486-1494
- Kim, G.¹ Lu, Y.² Hu, Y.³ Loizou, P.⁴

18
- 70349161218
- Role of mask pattern in intelligibility of ideal binary-masked noisy speech
- U. Kjems, J. Boldt, M. Pedersen, T. Lunner, and D. Wang, "Role of mask pattern in intelligibility of ideal binary-masked noisy speech," J. Acoust. Soc. Amer., vol. 126, pp. 1415-1426, 2009.
- (2009) J. Acoust. Soc. Amer , vol.126 , pp. 1415-1426
- Kjems, U.¹ Boldt, J.² Pedersen, M.³ Lunner, T.⁴ Wang, D.⁵

19
- 40749125179
- Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction
- N. Li and P. Loizou, "Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction," J. Acoust. Soc. Amer., vol. 123, no. 3, pp. 1673-1682, 2008.
- (2008) J. Acoust. Soc. Amer , vol.123 , Issue.3 , pp. 1673-1682
- Li, N.¹ Loizou, P.²

20
- 58149196390
- On the optimality of ideal binary time-frequency masks
- Y. Li and D. Wang, "On the optimality of ideal binary time-frequency masks," Speech Commun., pp. 230-239, 2009.
- (2009) Speech Commun , pp. 230-239
- Li, Y.¹ Wang, D.²

21
- 34447100796
- Boca Raton, FL, USA: CRC
- P. C. Loizou, Speech Enhancement: Theory and Practice. Boca Raton, FL, USA: CRC, 2007.
- (2007) Speech Enhancement: Theory and Practice
- Loizou, P.C.¹

22
- 84881053943
- Supervised and unsupervised speech enhancement approaches using nonnegative matrix factorization
- Oct
- N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement approaches using nonnegative matrix factorization," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2140-2151, Oct. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.10 , pp. 2140-2151
- Mohammadiha, N.¹ Smaragdis, P.² Leijon, A.³

23
- 84890493989
- Ideal ratio mask estimation using deep neural networks for robust speech recognition
- A. Narayanan and D. Wang, "Ideal ratio mask estimation using deep neural networks for robust speech recognition," in Proc. ICASSP, 2013, pp. 7092-7096.
- (2013) Proc. ICASSP , pp. 7092-7096
- Narayanan, A.¹ Wang, D.²

24
- 84877621926
- The role of binary mask patterns in automatic speech recognition in background noise
- A. Narayanan and D. Wang, "The role of binary mask patterns in automatic speech recognition in background noise," J. Acoust. Soc. Amer., pp. 3083-3093, 2013.
- (2013) J. Acoust. Soc. Amer , pp. 3083-3093
- Narayanan, A.¹ Wang, D.²

25
- 0344578884
- Mahwah, NJ, USA: Lawrence Erlbaum Associates
- R. Plomp, The Intelligent Ear: On the Nature of Sound Perception. Mahwah, NJ, USA: Lawrence Erlbaum Associates, 2002.
- (2002) The Intelligent Ear: On the Nature of Sound Perception
- Plomp, R.¹

26
- 56249144712
- Soft mask methods for single-channel speaker separation
- Aug
- A. M. Reddy and B. Raj, "Soft mask methods for single-channel speaker separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1766-1776, Aug. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.6 , pp. 1766-1776
- Reddy, A.M.¹ Raj, B.²

27
- 0034847662
- Perceptual evaluation of speech quality (PESQ) - A new method for speech quality assessment of telephone networks and codecs
- A. Rix, J. Beerends, M. Hollier, and A. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," in Proc. ICASSP, 2001, pp. 749-752.
- (2001) Proc. ICASSP , pp. 749-752
- Rix, A.¹ Beerends, J.² Hollier, M.³ Hekstra, A.⁴

28
- 33750311718
- Binary and ratio time-frequency masks for robust speech recognition
- S. Srinivasan, N. Roman, and D. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, no. 11, pp. 1486-1501, 2006.
- (2006) Speech Commun , vol.48 , Issue.11 , pp. 1486-1501
- Srinivasan, S.¹ Roman, N.² Wang, D.³

29
- 79960916745
- An algorithm for intelligibility prediction of time-frequency weighted noisy speech
- Sep
- C. Taal, R. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2125-2136, Sep. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.7 , pp. 2125-2136
- Taal, C.¹ Hendriks, R.² Heusdens, R.³ Jensen, J.⁴

30
- 0027623210
- Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, pp. 247-251, 1993.
- (1993) Speech Commun , vol.12 , pp. 247-251
- Varga, A.¹ Steeneken, H.²

31
- 84886818613
- Active-set Newton algorithm for overcomplete non-negative representations of audio
- Nov
- T. Virtanen, J. Gemmeke, and B. Raj, "Active-set Newton algorithm for overcomplete non-negative representations of audio," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 11, pp. 2277-2289, Nov. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.11 , pp. 2277-2289
- Virtanen, T.¹ Gemmeke, J.² Raj, B.³

32
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- P. Divenyi, Ed. Norwell, MA, USA: Kluwer
- D. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA, USA: Kluwer, 2005, pp. 181-197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.¹

33
- 64649103540
- Speech intelligibility in background noise with ideal binary time-frequency masking
- D. Wang, U. Kjems, M. Pedersen, J. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
- (2009) J. Acoust. Soc. Amer , vol.125 , pp. 2336-2347
- Wang, D.¹ Kjems, U.² Pedersen, M.³ Boldt, J.⁴ Lunner, T.⁵

34
- 84870477511
- Exploring monaural features for classification-based speech segregation
- Feb
- Y. Wang, K. Han, and D. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 2, pp. 270-279, Feb. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.2 , pp. 270-279
- Wang, Y.¹ Han, K.² Wang, D.³

35
- 84875681333
- Cocktail party processing via structured prediction
- Y. Wang and D. Wang, "Cocktail party processing via structured prediction," in Proc. NIPS, 2012, pp. 224-232.
- (2012) Proc. NIPS , pp. 224-232
- Wang, Y.¹ Wang, D.²

36
- 84875678689
- Towards scaling up classification-based speech separation
- Jul
- Y. Wang and D. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.7 , pp. 1381-1390
- Wang, Y.¹ Wang, D.²

37
- 84905262918
- A structure-preserving training target for supervised speech separation
- Y. Wang and D. Wang, "A structure-preserving training target for supervised speech separation," in Proc. ICASSP, 2014, pp. 6127-6131.
- (2014) Proc. ICASSP , pp. 6127-6131
- Wang, Y.¹ Wang, D.²

38
- 84877628172
- Ph.D. dissertation, The Ohio State Univ., Columbus, OH, USA
- J. Woodruff, "Integrating monaural and binaural cues for sound localization and segregation in reverberant environments," Ph.D. dissertation, The Ohio State Univ., Columbus, OH, USA, 2012.
- (2012) Integrating Monaural and Binaural Cues for Sound Localization and Segregation in Reverberant Environments
- Woodruff, J.¹

39
- 84889257121
- An experimental study on speech enhancement based on deep neural networks
- Jan
- Y. Xu, J. Du, L. Dai, and C. Lee, "An experimental study on speech enhancement based on deep neural networks," IEEE Signal Processing Lett., vol. 21, no. 1, pp. 66-68, Jan. 2014.
- (2014) IEEE Signal Processing Lett , vol.21 , Issue.1 , pp. 66-68
- Xu, Y.¹ Du, J.² Dai, L.³ Lee, C.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.