



Volume 22, Issue 12, 2014, Pages 1849-1858

On training targets for supervised speech separation

Author keywords

Deep neural networks; Speech separation; Supervised learning; Training targets

Indexed keywords

FACTORIZATION; FAST FOURIER TRANSFORMS; SEPARATION; SPEECH; SPEECH ANALYSIS; SPEECH ENHANCEMENT; SUPERVISED LEARNING;

EID: 84921740463     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2014.2352935     Document Type: Article
Times cited : (1216)

References (39)
  • 1
    • 33748523481
    • Determination of the potential benefit of time-frequency gain manipulation
    • M. Anzalone, L. Calandruccio, K. Doherty, and L. Carney, "Determination of the potential benefit of time-frequency gain manipulation," Ear Hear., vol. 27, no. 5, pp. 480-492, 2006.
    • (2006) Ear Hear , vol.27 , Issue.5 , pp. 480-492
    • Anzalone, M.1    Calandruccio, L.2    Doherty, K.3    Carney, L.4
  • 2
    • 33845354768
    • Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
    • D. Brungart, P. Chang, B. Simpson, and D. Wang, "Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation," J. Acoust. Soc. Amer., vol. 120, pp. 4007-4018, 2006.
    • (2006) J. Acoust. Soc. Amer , vol.120 , pp. 4007-4018
    • Brungart, D.1    Chang, P.2    Simpson, B.3    Wang, D.4
  • 4
    • 84905233552
    • A feature study for classification-based speech separation at very low signal-to-noise ratio
    • J. Chen, Y. Wang, and D. Wang, "A feature study for classification-based speech separation at very low signal-to-noise ratio," in Proc. ICASSP, 2014, pp. 7059-7063.
    • (2014) Proc. ICASSP , pp. 7059-7063
    • Chen, J.1    Wang, Y.2    Wang, D.3
  • 5
    • 80052250414
    • Adaptive subgradient methods for online learning and stochastic optimization
    • J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," J. Mach. Learn. Res., pp. 2121-2159, 2011.
    • (2011) J. Mach. Learn. Res , pp. 2121-2159
    • Duchi, J.1    Hazan, E.2    Singer, Y.3
  • 6
    • 51449104842
    • Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors
    • J. Erkelens, R. Hendriks, R. Heusdens, and J. Jensen, "Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.6 , pp. 1741-1752
    • Erkelens, J.1    Hendriks, R.2    Heusdens, R.3    Jensen, J.4
  • 10
    • 84869105129
    • A classification based approach to speech segregation
    • K. Han and D. Wang, "A classification based approach to speech segregation," J. Acoust. Soc. Amer., vol. 132, pp. 3475-3483, 2012.
    • (2012) J. Acoust. Soc. Amer , vol.132 , pp. 3475-3483
    • Han, K.1    Wang, D.2
  • 11
    • 84905268759
    • Learning spectral mapping for speech dereverberation
    • K. Han, Y. Wang, and D. Wang, "Learning spectral mapping for speech dereverberation," in Proc. ICASSP, 2014, pp. 4648-4652.
    • (2014) Proc. ICASSP , pp. 4648-4652
    • Han, K.1    Wang, Y.2    Wang, D.3
  • 12
    • 84885412715
    • An algorithm to improve speech recognition in noise for hearing-impaired listeners
    • E. Healy, S. Yoho, Y. Wang, and D. Wang, "An algorithm to improve speech recognition in noise for hearing-impaired listeners," J. Acoust. Soc. Amer., pp. 3029-3038, 2013.
    • (2013) J. Acoust. Soc. Amer , pp. 3029-3038
    • Healy, E.1    Yoho, S.2    Wang, Y.3    Wang, D.4
  • 13
    • 78049364397
    • MMSE based noise PSD tracking with low complexity
    • R. Hendriks, R. Heusdens, and J. Jensen, "MMSE based noise PSD tracking with low complexity," in Proc. ICASSP, 2010, pp. 4266-4269.
    • (2010) Proc. ICASSP , pp. 4266-4269
    • Hendriks, R.1    Heusdens, R.2    Jensen, J.3
  • 14
  • 16
    • 65249103478
    • A supervised learning approach to monaural segregation of reverberant speech
    • Z. Jin and D. Wang, "A supervised learning approach to monaural segregation of reverberant speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 4, pp. 625-638, May 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process , vol.17 , Issue.4 , pp. 625-638
    • Jin, Z.1    Wang, D.2
  • 17
    • 70349093614
    • An algorithm that improves speech intelligibility in noise for normal-hearing listeners
    • G. Kim, Y. Lu, Y. Hu, and P. Loizou, "An algorithm that improves speech intelligibility in noise for normal-hearing listeners," J. Acoust. Soc. Amer., pp. 1486-1494, 2009.
    • (2009) J. Acoust. Soc. Amer , pp. 1486-1494
    • Kim, G.1    Lu, Y.2    Hu, Y.3    Loizou, P.4
  • 18
    • 70349161218
    • Role of mask pattern in intelligibility of ideal binary-masked noisy speech
    • U. Kjems, J. Boldt, M. Pedersen, T. Lunner, and D. Wang, "Role of mask pattern in intelligibility of ideal binary-masked noisy speech," J. Acoust. Soc. Amer., vol. 126, pp. 1415-1426, 2009.
    • (2009) J. Acoust. Soc. Amer , vol.126 , pp. 1415-1426
    • Kjems, U.1    Boldt, J.2    Pedersen, M.3    Lunner, T.4    Wang, D.5
  • 19
    • 40749125179
    • Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction
    • N. Li and P. Loizou, "Factors influencing intelligibility of ideal binary-masked speech: Implications for noise reduction," J. Acoust. Soc. Amer., vol. 123, no. 3, pp. 1673-1682, 2008.
    • (2008) J. Acoust. Soc. Amer , vol.123 , Issue.3 , pp. 1673-1682
    • Li, N.1    Loizou, P.2
  • 20
    • 58149196390
    • On the optimality of ideal binary time-frequency masks
    • Y. Li and D. Wang, "On the optimality of ideal binary time-frequency masks," Speech Commun., pp. 230-239, 2009.
    • (2009) Speech Commun , pp. 230-239
    • Li, Y.1    Wang, D.2
  • 22
    • 84881053943
    • Supervised and unsupervised speech enhancement approaches using nonnegative matrix factorization
    • N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement approaches using nonnegative matrix factorization," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2140-2151, Oct. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.10 , pp. 2140-2151
    • Mohammadiha, N.1    Smaragdis, P.2    Leijon, A.3
  • 23
    • 84890493989
    • Ideal ratio mask estimation using deep neural networks for robust speech recognition
    • A. Narayanan and D. Wang, "Ideal ratio mask estimation using deep neural networks for robust speech recognition," in Proc. ICASSP, 2013, pp. 7092-7096.
    • (2013) Proc. ICASSP , pp. 7092-7096
    • Narayanan, A.1    Wang, D.2
  • 24
    • 84877621926
    • The role of binary mask patterns in automatic speech recognition in background noise
    • A. Narayanan and D. Wang, "The role of binary mask patterns in automatic speech recognition in background noise," J. Acoust. Soc. Amer., pp. 3083-3093, 2013.
    • (2013) J. Acoust. Soc. Amer , pp. 3083-3093
    • Narayanan, A.1    Wang, D.2
  • 26
    • 56249144712
    • Soft mask methods for single-channel speaker separation
    • A. M. Reddy and B. Raj, "Soft mask methods for single-channel speaker separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1766-1776, Aug. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.6 , pp. 1766-1776
    • Reddy, A.M.1    Raj, B.2
  • 27
    • 0034847662
    • Perceptual evaluation of speech quality (PESQ) - A new method for speech quality assessment of telephone networks and codecs
    • A. Rix, J. Beerends, M. Hollier, and A. Hekstra, "Perceptual evaluation of speech quality (PESQ)-a new method for speech quality assessment of telephone networks and codecs," in Proc. ICASSP, 2001, pp. 749-752.
    • (2001) Proc. ICASSP , pp. 749-752
    • Rix, A.1    Beerends, J.2    Hollier, M.3    Hekstra, A.4
  • 28
    • 33750311718
    • Binary and ratio time-frequency masks for robust speech recognition
    • S. Srinivasan, N. Roman, and D. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, no. 11, pp. 1486-1501, 2006.
    • (2006) Speech Commun , vol.48 , Issue.11 , pp. 1486-1501
    • Srinivasan, S.1    Roman, N.2    Wang, D.3
  • 29
    • 79960916745
    • An algorithm for intelligibility prediction of time-frequency weighted noisy speech
    • C. Taal, R. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2125-2136, Sep. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.7 , pp. 2125-2136
    • Taal, C.1    Hendriks, R.2    Heusdens, R.3    Jensen, J.4
  • 30
    • 0027623210
    • Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems
    • A. Varga and H. Steeneken, "Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, pp. 247-251, 1993.
    • (1993) Speech Commun , vol.12 , pp. 247-251
    • Varga, A.1    Steeneken, H.2
  • 31
    • 84886818613
    • Active-set Newton algorithm for overcomplete non-negative representations of audio
    • T. Virtanen, J. Gemmeke, and B. Raj, "Active-set Newton algorithm for overcomplete non-negative representations of audio," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 11, pp. 2277-2289, Nov. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.11 , pp. 2277-2289
    • Virtanen, T.1    Gemmeke, J.2    Raj, B.3
  • 32
    • 84892233308
    • On ideal binary mask as the computational goal of auditory scene analysis
    • P. Divenyi, Ed. Norwell, MA, USA: Kluwer
    • D. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA, USA: Kluwer, 2005, pp. 181-197.
    • (2005) Speech Separation by Humans and Machines , pp. 181-197
    • Wang, D.1
  • 33
    • 64649103540
    • Speech intelligibility in background noise with ideal binary time-frequency masking
    • D. Wang, U. Kjems, M. Pedersen, J. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
    • (2009) J. Acoust. Soc. Amer , vol.125 , pp. 2336-2347
    • Wang, D.1    Kjems, U.2    Pedersen, M.3    Boldt, J.4    Lunner, T.5
  • 34
    • 84870477511
    • Exploring monaural features for classification-based speech segregation
    • Y. Wang, K. Han, and D. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 2, pp. 270-279, Feb. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.2 , pp. 270-279
    • Wang, Y.1    Han, K.2    Wang, D.3
  • 35
    • 84875681333
    • Cocktail party processing via structured prediction
    • Y. Wang and D. Wang, "Cocktail party processing via structured prediction," in Proc. NIPS, 2012, pp. 224-232.
    • (2012) Proc. NIPS , pp. 224-232
    • Wang, Y.1    Wang, D.2
  • 36
    • 84875678689
    • Towards scaling up classification-based speech separation
    • Y. Wang and D. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.7 , pp. 1381-1390
    • Wang, Y.1    Wang, D.2
  • 37
    • 84905262918
    • A structure-preserving training target for supervised speech separation
    • Y. Wang and D. Wang, "A structure-preserving training target for supervised speech separation," in Proc. ICASSP, 2014, pp. 6127-6131.
    • (2014) Proc. ICASSP , pp. 6127-6131
    • Wang, Y.1    Wang, D.2
  • 39
    • 84889257121
    • An experimental study on speech enhancement based on deep neural networks
    • Y. Xu, J. Du, L. Dai, and C. Lee, "An experimental study on speech enhancement based on deep neural networks," IEEE Signal Processing Lett., vol. 21, no. 1, pp. 66-68, Jan. 2014.
    • (2014) IEEE Signal Processing Lett , vol.21 , Issue.1 , pp. 66-68
    • Xu, Y.1    Du, J.2    Dai, L.3    Lee, C.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.