SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 1508-1512

Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement

(5) Xu, Yong a; Du, Jun a; Huang, Zhen b; Dai, Li Rong a; Lee, Chin Hui b

a UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

b GEORGIA INSTITUTE OF TECHNOLOGY (United States)

Author keywords

Binary mask; Deep neural network; Minimum mean square error; Multi objective learning; Speech enhancement

Indexed keywords

BINS; MEAN SQUARE ERROR; SPEECH; SPEECH ENHANCEMENT; SPEECH RECOGNITION;

BINARY MASKS; DEEP NEURAL NETWORKS; IDEAL BINARY MASK (IBM); JOINT OPTIMIZATION; LISTENING QUALITIES; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; MINIMUM MEAN SQUARE ERRORS; MULTI-OBJECTIVE LEARNING;

SPEECH COMMUNICATION;

EID: 84959100788 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (69)

References (40)

1
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 27, no. 2, pp. 113-120, 1979.
- (1979) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.27 , Issue.2 , pp. 113-120
- Boll, S.¹

2
- 0021645331
- Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 32, no. 6, pp. 1109-1121, 1984.
- (1984) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.32 , Issue.6 , pp. 1109-1121
- Ephraim, Y.¹ Malah, D.²

3
- 0021892216
- Speech enhancement using a minimum mean-square error log-spectral amplitude estimator
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 33, no. 2, pp. 443-445, 1985.
- (1985) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.33 , Issue.2 , pp. 443-445
- Ephraim, Y.¹ Malah, D.²

4
- 0035500783
- Speech enhancement for non-stationary noise environments
- I. Cohen and B. Berdugo, "Speech enhancement for non-stationary noise environments," Signal Processing, vol. 81, no. 11, pp. 2403-2418, 2001.
- (2001) Signal Processing , vol.81 , Issue.11 , pp. 2403-2418
- Cohen, I.¹ Berdugo, B.²

5
- 0041360463
- Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging
- I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 11, no. 5, pp. 466-475, 2003.
- (2003) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.11 , Issue.5 , pp. 466-475
- Cohen, I.¹

6
- 0027623210
- Assessment for automatic speech recognition: II. Noisex-92: A database and an experiment to study the effect of additive noise on speech recognition systems
- A. Varga and H. J. Steeneken, "Assessment for automatic speech recognition: II. noisex-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Communication, vol. 12, no. 3, pp. 247-251, 1993.
- (1993) Speech Communication , vol.12 , Issue.3 , pp. 247-251
- Varga, A.¹ Steeneken, H.J.²

7
- 84881053943
- Supervised and unsupervised speech enhancement using nonnegative matrix factorization
- N. Mohammadiha, P. Smaragdis, and A. Leijon, "Supervised and unsupervised speech enhancement using nonnegative matrix factorization," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 21, no. 10, pp. 2140-2151, 2013.
- (2013) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.21 , Issue.10 , pp. 2140-2151
- Mohammadiha, N.¹ Smaragdis, P.² Leijon, A.³

8
- 84867198451
- Regularized non-negative matrix factorization with temporal dependencies for speech denoising
- K. W. Wilson, B. Raj, and P. Smaragdis, "Regularized non-negative matrix factorization with temporal dependencies for speech denoising," in INTERSPEECH, 2008, pp. 411-414.
- (2008) INTERSPEECH , pp. 411-414
- Wilson, K.W.¹ Raj, B.² Smaragdis, P.³

9
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. E. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. N. Sainath et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰

10
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

11
- 84889263385
- Denoising deep neural networks based voice activity detection
- X.-L. Zhang and J. Wu, "Denoising deep neural networks based voice activity detection," in ICASSP, 2013, pp. 853-857.
- (2013) ICASSP , pp. 853-857
- Zhang, X.-L.¹ Wu, J.²

12
- 84923289508
- A regression approach to speech enhancement based on deep neural networks
- Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "A regression approach to speech enhancement based on deep neural networks," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 23, no. 1, pp. 7-19, 2015.
- (2015) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.23 , Issue.1 , pp. 7-19
- Xu, Y.¹ Du, J.² Dai, L.-R.³ Lee, C.-H.⁴

13
- 84889257121
- An experimental study on speech enhancement based on deep neural networks
- Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "An experimental study on speech enhancement based on deep neural networks," IEEE Signal Processing Letters, vol. 21, no. 1, pp. 65-68, 2014.
- (2014) IEEE Signal Processing Letters , vol.21 , Issue.1 , pp. 65-68
- Xu, Y.¹ Du, J.² Dai, L.-R.³ Lee, C.-H.⁴

14
- 84910038203
- Dynamic noise aware training for speech enhancement based on deep neural networks
- Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "Dynamic noise aware training for speech enhancement based on deep neural networks," in INTERSPEECH, 2014, pp. 2670-2674.
- (2014) INTERSPEECH , pp. 2670-2674
- Xu, Y.¹ Du, J.² Dai, L.-R.³ Lee, C.-H.⁴

15
- 84867202951
- A speech enhancement approach using piece-wise linear approximation of an explicit model of environmental distortions
- J. Du and Q. Huo, "A speech enhancement approach using piece-wise linear approximation of an explicit model of environmental distortions," in INTERSPEECH, 2008, pp. 569-572.
- (2008) INTERSPEECH , pp. 569-572
- Du, J.¹ Huo, Q.²

16
- 84906262433
- Speech enhancement based on deep denoising autoencoder
- X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," in INTERSPEECH, 2013, pp. 436-440.
- (2013) INTERSPEECH , pp. 436-440
- Lu, X.¹ Tsao, Y.² Matsuda, S.³ Hori, C.⁴

17
- 84896537574
- Wiener filtering based speech enhancement with weighted denoising auto-encoder and noise classification
- B. Xia and C. Bao, "Wiener filtering based speech enhancement with weighted denoising auto-encoder and noise classification," Speech Communication, vol. 60, pp. 13-29, 2014.
- (2014) Speech Communication , vol.60 , pp. 13-29
- Xia, B.¹ Bao, C.²

18
- 84905240926
- Deep learning for monaural speech separation
- P. S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Deep learning for monaural speech separation," in ICASSP, 2014, pp. 1562-1566.
- (2014) ICASSP , pp. 1562-1566
- Huang, P.S.¹ Kim, M.² Hasegawa-Johnson, M.³ Smaragdis, P.⁴

19
- 84910049527
- Experiments on deep learning for speech denoising
- D. Liu, P. Smaragdis, and M. Kim, "Experiments on deep learning for speech denoising," in INTERSPEECH, 2014, pp. 2685-2689.
- (2014) INTERSPEECH , pp. 2685-2689
- Liu, D.¹ Smaragdis, P.² Kim, M.³

20
- 0004257992
- Courier Corporation
- S. Kullback, Information theory and statistics. Courier Corporation, 1997.
- (1997) Information Theory and Statistics
- Kullback, S.¹

21
- 0014698310
- Analysis synthesis telephony based on the maximum likelihood method
- F. Itakura and S. Saito, "Analysis synthesis telephony based on the maximum likelihood method," in Proceedings of the 6th International Congress on Acoustics, 1968, pp. 17-20.
- (1968) Proceedings of the 6th International Congress on Acoustics , pp. 17-20
- Itakura, F.¹ Saito, S.²

22
- 84921740463
- On training targets for supervised speech separation
- Y. X. Wang, A. Narayanan, and D. L. Wang, "On training targets for supervised speech separation," IEEE/ACM Transactions on Acoustics, Speech and Signal Processing, vol. 22, no. 12, pp. 1849-1858, 2014.
- (2014) IEEE/ACM Transactions on Acoustics, Speech and Signal Processing , vol.22 , Issue.12 , pp. 1849-1858
- Wang, Y.X.¹ Narayanan, A.² Wang, D.L.³

23
- 82255178542
- Wiley-IEEE Press
- D. L. Wang and G. J. Brown, Computational auditory scene analysis: Principles, algorithms, and applications. Wiley-IEEE Press, 2006.
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications
- Wang, D.L.¹ Brown, G.J.²

24
- 84875678689
- Towards scaling up classification-based speech separation
- Y. X. Wang and D. L. Wang, "Towards scaling up classification-based speech separation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 21, no. 7, pp. 1381-1390, 2013.
- (2013) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.21 , Issue.7 , pp. 1381-1390
- Wang, Y.X.¹ Wang, D.L.²

25
- 84890493989
- Ideal ratio mask estimation using deep neural networks for robust speech recognition
- A. Narayanan and D. L. Wang, "Ideal ratio mask estimation using deep neural networks for robust speech recognition," in ICASSP, 2013, pp. 7092-7096.
- (2013) ICASSP , pp. 7092-7096
- Narayanan, A.¹ Wang, D.L.²

26
- 0031189914
- Multitask learning: A knowledge-based source of inductive bias
- R. Caruana, "Multitask learning: A knowledge-based source of inductive bias," in ICML, 1993, pp. 41-48.
- (1993) ICML , pp. 41-48
- Caruana, R.¹

27
- 84890545600
- Multi-task learning in deep neural networks for improved phoneme recognition
- M. L. Seltzer and J. Droppo, "Multi-task learning in deep neural networks for improved phoneme recognition," in ICASSP, 2013, pp. 6965-6969.
- (2013) ICASSP , pp. 6965-6969
- Seltzer, M.L.¹ Droppo, J.²

28
- 84959169347
- submitted to INTERSPEECH
- Z. Huang, J. Li, S. M. Siniscalchi, I.-F. Chen, J. Wu, and C.-H. Lee, "Rapid adaptation for deep neural networks through multi-task learning," 2015, submitted to INTERSPEECH.
- (2015) Rapid Adaptation for Deep Neural Networks Through Multi-task Learning
- Huang, Z.¹ Li, J.² Siniscalchi, S.M.³ Chen, I.-F.⁴ Wu, J.⁵ Lee, C.-H.⁶

29
- 0032595188
- Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition
- R. Vergin, D. O'Shaughnessy, and A. Farhat, "Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 5, pp. 525-532, 1999.
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , Issue.5 , pp. 525-532
- Vergin, R.¹ O'Shaughnessy, D.² Farhat, A.³

30
- 30444446629
- Combining evidence from residual phase and mfcc features for speaker recognition
- K. S. R. Murty and B. Yegnanarayana, "Combining evidence from residual phase and mfcc features for speaker recognition," IEEE Signal Processing Letters, vol. 13, no. 1, pp. 52-55, 2006.
- (2006) IEEE Signal Processing Letters , vol.13 , Issue.1 , pp. 52-55
- Murty, K.S.R.¹ Yegnanarayana, B.²

31
- 0009985115
- Mel frequency cepstral coefficients for music modeling
- B. Logan et al., "Mel frequency cepstral coefficients for music modeling," in ISMIR, 2000.
- (2000) ISMIR
- Logan, B.¹

32
- 84902748454
- Discrete cosine transform
- N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Transactions on Computers, vol. 100, no. 1, pp. 90-93, 1974.
- (1974) IEEE Transactions on Computers , vol.100 , Issue.1 , pp. 90-93
- Ahmed, N.¹ Natarajan, T.² Rao, K.R.³

33
- 84870477511
- Exploring monaural features for classification-based speech segregation
- Y. X. Wang, K. Han, and D. L. Wang, "Exploring monaural features for classification-based speech segregation," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 21, no. 2, pp. 270-279, 2013.
- (2013) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.21 , Issue.2 , pp. 270-279
- Wang, Y.X.¹ Han, K.² Wang, D.L.³

34
- 84890467815
- G. Hu, "100 nonspeech environmental sounds, 2004 [on-line]," http://web.cse.ohio-state.edu/pnl/corpus/HuNonspeech/HuCorpus.html, 2004.
- (2004) 100 Nonspeech Environmental Sounds, 2004 [On-line]
- Hu, G.¹

35
- 0008861179
- Getting started with the DARPA timit cd-rom: An acoustic phonetic continuous speech database
- Gaithersburg, MD
- J. S. Garofolo et al., "Getting started with the darpa timit cd-rom: An acoustic phonetic continuous speech database," National Institute of Standards and Technology (NIST), Gaithersburg, MD, vol. 107, 1988.
- (1988) National Institute of Standards and Technology (NIST) , vol.107
- Garofolo, J.S.¹

36
- 0034847662
- Perceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs
- A. W. Rix, J. G. Beerends, M. P. Hollier, and A. P. Hekstra, "Perceptual evaluation of speech quality (pesq)-a new method for speech quality assessment of telephone networks and codecs," in ICASSP, 2001, pp. 749-752.
- (2001) ICASSP , pp. 749-752
- Rix, A.W.¹ Beerends, J.G.² Hollier, M.P.³ Hekstra, A.P.⁴

37
- 79960916745
- An algorithm for intelligibility prediction of time-frequency weighted noisy speech
- C. H. Taal, R. C. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 19, no. 7, pp. 2125-2136, 2011.
- (2011) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.19 , Issue.7 , pp. 2125-2136
- Taal, C.H.¹ Hendriks, R.C.² Heusdens, R.³ Jensen, J.⁴

38
- 84890527827
- Improving deep neural networks for lvcsr using rectified linear units and dropout
- G. E. Dahl, T. N. Sainath, and G. E. Hinton, "Improving deep neural networks for lvcsr using rectified linear units and dropout," in ICASSP, 2013, pp. 8609-8613.
- (2013) ICASSP , pp. 8609-8613
- Dahl, G.E.¹ Sainath, T.N.² Hinton, G.E.³

39
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929-1958, 2014.
- (2014) The Journal of Machine Learning Research , vol.15 , Issue.1 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.E.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

40
- 84890492030
- An investigation of deep neural networks for noise robust speech recognition
- M. L. Seltzer, D. Yu, and Y. Wang, "An investigation of deep neural networks for noise robust speech recognition," in ICASSP, 2013, pp. 7398-7402.
- (2013) ICASSP , pp. 7398-7402
- Seltzer, M.L.¹ Yu, D.² Wang, Y.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.