SCOPUS Information Retrieval Platform

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume 23, Issue 12, 2015, Pages 2136-2147

Joint Optimization of Masks and Deep Recurrent Neural Networks for Monaural Source Separation

(4) Huang, Po-Sen a,b; Kim, Minje c; Hasegawa-Johnson, Mark a; Smaragdis, Paris c,d

a United States of America (United States)

b Clarifai (United States)

c University of Illinois at Urbana-Champaign (United States)

d ADOBE RESEARCH (United States)

Author keywords

Deep recurrent neural network (DRNN); discriminative training; monaural source separation; time-frequency masking

Indexed keywords

RECURRENT NEURAL NETWORKS; SEPARATION; SPEECH; SPEECH ANALYSIS;

DEEP RECURRENT NEURAL NETWORK (DRNN); DISCRIMINATIVE TRAINING; JOINT OPTIMIZATION; SEPARATION PERFORMANCE; SINGING VOICE SEPARATIONS; SPEECH DENOISING; SPEECH SEPARATION; TIME-FREQUENCY MASKING;

SOURCE SEPARATION;

EID: 84941334839 PISSN: 23299290 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2015.2468583 Document Type: Article

Times cited : (510)

References (40)

1
- 84867585384
- Singing-voice separation from monaural recordings using robust principal component analysis
- P.-S. Huang, S. D. Chen, P. Smaragdis, and M. Hasegawa-Johnson, "Singing-voice separation from monaural recordings using robust principal component analysis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012, pp. 57-60.
- (2012) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 57-60
- Huang, P.-S.¹ Chen, S.D.² Smaragdis, P.³ Hasegawa-Johnson, M.⁴

2
- 84878409063
- Recurrent neural networks for noise reduction in robust ASR
- A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," in Proc. 13th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2012, pp. 22-25.
- (2012) Proc. 13th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH) , pp. 22-25
- Maas, A.L.¹ Le, Q.V.² O'Neil, T.M.³ Vinyals, O.⁴ Nguyen, P.⁵ Ng, A.Y.⁶

3
- 84873423755
- Real-time online singing voice separation from monaural recordings using robust low-rank modeling
- P. Sprechmann, A. Bronstein, and G. Sapiro, "Real-time online singing voice separation from monaural recordings using robust low-rank modeling," in Proc. 13th Int. Soc. Music Inf. Retrieval (ISMIR), 2012.
- (2012) Proc. 13th Int. Soc. Music Inf. Retrieval (ISMIR)
- Sprechmann, P.¹ Bronstein, A.² Sapiro, G.³

4
- 84946061883
- Low-rank representation of both singing voice and music accompaniment via learned dictionaries
- Y.-H. Yang, "Low-rank representation of both singing voice and music accompaniment via learned dictionaries," in Proc. 14th Int. Soc. Music Inf. Retrieval Conf. (ISMIR), 2013.
- (2013) Proc. 14th Int. Soc. Music Inf. Retrieval Conf. (ISMIR)
- Yang, Y.-H.¹

5
- 84871362349
- On sparse and low-rank matrix decomposition for singing voice separation
- Y.-H. Yang, "On sparse and low-rank matrix decomposition for singing voice separation," in Proc. 20th ACM Int. Conf. Multimedia, 2012, pp. 757-760.
- (2012) Proc. 20th ACM Int. Conf. Multimedia , pp. 757-760
- Yang, Y.-H.¹

6
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- Apr.
- S. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.
- (1979) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-27 , Issue.2 , pp. 113-120
- Boll, S.¹

7
- 0021645331
- Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator
- Dec.
- Y. Ephraim and D. Malah, "Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
- (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
- Ephraim, Y.¹ Malah, D.²

8
- 0033592606
- Learning the parts of objects by non-negative matrix factorization
- Oct.
- D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, Oct. 1999.
- (1999) Nature , vol.401 , Issue.6755 , pp. 788-791
- Lee, D.D.¹ Seung, H.S.²

9
- 85026972772
- Probabilistic latent semantic indexing
- T. Hofmann, "Probabilistic latent semantic indexing," in Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval, 1999, pp. 50-57.
- (1999) Proc. 22nd Annu. Int. ACM SIGIR Conf. Res. Develop. Inf. Retrieval , pp. 50-57
- Hofmann, T.¹

10
- 84857289846
- A probabilistic latent variable model for acoustic modeling
- P. Smaragdis, B. Raj, and M. Shashanka, "A probabilistic latent variable model for acoustic modeling," in Proc. Adv. Models Acoust. Process., Neural Inf. Process. Syst. Workshop, 2006, vol. 148.
- (2006) Proc. Adv. Models Acoust. Process., Neural Inf. Process. Syst. Workshop , vol.148
- Smaragdis, P.¹ Raj, B.² Shashanka, M.³

11
- 84905240926
- Deep learning for monaural speech separation
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Deep learning for monaural speech separation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 1562-1566.
- (2014) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 1562-1566
- Huang, P.-S.¹ Kim, M.² Hasegawa-Johnson, M.³ Smaragdis, P.⁴

12
- 85046988721
- Singing-voice separation from monaural recordings using deep recurrent neural networks
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Singing-voice separation from monaural recordings using deep recurrent neural networks," in Proc. 15th Int. Soc. Music Inf. Retrieval (ISMIR), 2014.
- (2014) Proc. 15th Int. Soc. Music Inf. Retrieval (ISMIR)
- Huang, P.-S.¹ Kim, M.² Hasegawa-Johnson, M.³ Smaragdis, P.⁴

13
- 84863765370
- Montreal, QC, Canada, Tech. Rep.
- P. Kabal, "TSP speech database," McGill Univ., Montreal, QC, Canada, Tech. Rep., 2002.
- (2002) TSP Speech Database McGill Univ
- Kabal, P.¹

14
- 85008542938
- On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset
- Feb.
- C.-L. Hsu and J.-S. Jang, "On the improvement of singing voice separation for monaural recordings using the MIR-1K dataset," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 2, pp. 310-319, Feb. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.2 , pp. 310-319
- Hsu, C.-L.¹ Jang, J.-S.²

15
- 0003548585
- Philadelphia, PA, USA: Linguistic Data Consortium
- J. Garofolo, L. Lamel, W. Fisher, J. Fiscus, D. Pallett, N. Dahlgren, and V. Zue, TIMIT: Acoustic-Phonetic Continuous Speech Corpus. Philadelphia, PA, USA: Linguistic Data Consortium, 1993.
- (1993) TIMIT: Acoustic-Phonetic Continuous Speech Corpus
- Garofolo, J.¹ Lamel, L.² Fisher, W.³ Fiscus, J.⁴ Pallett, D.⁵ Dahlgren, N.⁶ Zue, V.⁷

16
- 84923289508
- A regression approach to speech enhancement based on deep neural networks
- Jan.
- Y. Xu, J. Du, L.-R. Dai, and C.-H. Lee, "A regression approach to speech enhancement based on deep neural networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 23, no. 1, pp. 7-19, Jan. 2015.
- (2015) IEEE Trans. Audio, Speech, Lang. Process. , vol.23 , Issue.1 , pp. 7-19
- Xu, Y.¹ Du, J.² Dai, L.-R.³ Lee, C.-H.⁴

17
- 84910049527
- Experiments on deep learning for speech denoising
- D. Liu, P. Smaragdis, and M. Kim, "Experiments on deep learning for speech denoising," in Proc. 15th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2014, pp. 2685-2689.
- (2014) Proc. 15th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH) , pp. 2685-2689
- Liu, D.¹ Smaragdis, P.² Kim, M.³

18
- 56249144201
- Time-frequency masking for speech separation and its potential for hearing aid design
- D. Wang, "Time-frequency masking for speech separation and its potential for hearing aid design," Trends in Amplificat., vol. 12, pp. 332-353, 2008.
- (2008) Trends in Amplificat. , vol.12 , pp. 332-353
- Wang, D.¹

19
- 84905284062
- Single-channel speech separation with memory-enhanced recurrent neural networks
- F. Weninger, F. Eyben, and B. Schuller, "Single-channel speech separation with memory-enhanced recurrent neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 3709-3713.
- (2014) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 3709-3713
- Weninger, F.¹ Eyben, F.² Schuller, B.³

20
- 84905248198
- Deep stacking networks with time series for speech separation
- S. Nie, H. Zhang, X. Zhang, and W. Liu, "Deep stacking networks with time series for speech separation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 6667-6671.
- (2014) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 6667-6671
- Nie, S.¹ Zhang, H.² Zhang, X.³ Liu, W.⁴

21
- 84890493989
- Ideal ratio mask estimation using deep neural networks for robust speech recognition
- A. Narayanan and D. Wang, "Ideal ratio mask estimation using deep neural networks for robust speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013, pp. 7092-7096.
- (2013) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 7092-7096
- Narayanan, A.¹ Wang, D.²

22
- 84875678689
- Towards scaling up classification-based speech separation
- Jul.
- Y. Wang and D. Wang, "Towards scaling up classification-based speech separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 7, pp. 1381-1390, Jul. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.7 , pp. 1381-1390
- Wang, Y.¹ Wang, D.²

23
- 84921740463
- On training targets for supervised speech separation
- Dec.
- Y. Wang, A. Narayanan, and D. Wang, "On training targets for supervised speech separation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1849-1858, Dec. 2014.
- (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.22 , Issue.12 , pp. 1849-1858
- Wang, Y.¹ Narayanan, A.² Wang, D.³

24
- 84988231190
- Deep neural network based speech separation for robust speech recognition
- Y. Tu, J. Du, Y. Xu, L. Dai, and C.-H. Lee, "Deep neural network based speech separation for robust speech recognition," in Proc. Int. Symp. Chinese Spoken Lang. Process., 2014, pp. 532-536.
- (2014) Proc. Int. Symp. Chinese Spoken Lang. Process. , pp. 532-536
- Tu, Y.¹ Du, J.² Xu, Y.³ Dai, L.⁴ Lee, C.-H.⁵

25
- 84905251795
- Deep neural networks for single channel source separation
- E. Grais, M. Sen, and H. Erdogan, "Deep neural networks for single channel source separation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 3734-3738.
- (2014) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 3734-3738
- Grais, E.¹ Sen, M.² Erdogan, H.³

26
- 84898931970
- Training and analysing deep recurrent neural networks
- M. Hermans and B. Schrauwen, "Training and analysing deep recurrent neural networks," in Proc. Adv. Neural Inf. Process. Syst. (NIPS), 2013, pp. 190-198.
- (2013) Proc. Adv. Neural Inf. Process. Syst. (NIPS) , pp. 190-198
- Hermans, M.¹ Schrauwen, B.²

27
- 85083951919
- How to construct deep recurrent neural networks
- R. Pascanu, C. Gulcehre, K. Cho, and Y. Bengio, "How to construct deep recurrent neural networks," in Proc. Int. Conf. Learn. Represent., 2014.
- (2014) Proc. Int. Conf. Learn. Represent.
- Pascanu, R.¹ Gulcehre, C.² Cho, K.³ Bengio, Y.⁴

28
- 84862294866
- Deep sparse rectifier neural networks
- X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier neural networks," in Proc. 14th Int. Conf. Artif. Intell. Statist. (AISTATS), 2011, vol. 15, pp. 315-323.
- (2011) Proc. 14th Int. Conf. Artif. Intell. Statist. (AISTATS) , vol.15 , pp. 315-323
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

29
- 0008554931
- A focused back-propagation algorithm for temporal pattern recognition
- M. C. Mozer, "A focused back-propagation algorithm for temporal pattern recognition," Complex Syst., vol. 3, no. 4, pp. 349-381, 1989.
- (1989) Complex Syst. , vol.3 , Issue.4 , pp. 349-381
- Mozer, M.C.¹

30
- 0025503558
- Backpropagation through time: What it does and how to do it
- Oct.
- P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proc. IEEE, vol. 78, no. 10, pp. 1550-1560, Oct. 1990.
- (1990) Proc. IEEE , vol.78 , Issue.10 , pp. 1550-1560
- Werbos, P.J.¹

31
- 0001765578
- Gradient-based learning algorithms for recurrent networks and their computational complexity
- Mahwah, NJ, USA: Lawrence Erlbaum Associates
- R. J. Williams and D. Zipser, "Gradient-based learning algorithms for recurrent networks and their computational complexity," in Back-propagation: Theory, Architectures, and Applications. Mahwah, NJ, USA: Lawrence Erlbaum Associates, 1995, pp. 433-486.
- (1995) Back-propagation: Theory, Architectures, and Applications , pp. 433-486
- Williams, R.J.¹ Zipser, D.²

32
- 3142694930
- Blind separation of speech mixtures via time-frequency masking
- Jul.
- O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Trans. Signal Process., vol. 52, no. 7, pp. 1830-1847, Jul. 2004.
- (2004) IEEE Trans. Signal Process. , vol.52 , Issue.7 , pp. 1830-1847
- Yilmaz, O.¹ Rickard, S.²

33
- 33744975847
- Performance measurement in blind audio source separation
- Jul.
- E. Vincent, R. Gribonval, and C. Fevotte, "Performance measurement in blind audio source separation," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1462-1469, Jul. 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.4 , pp. 1462-1469
- Vincent, E.¹ Gribonval, R.² Fevotte, C.³

34
- 84941339663
- Source separation with scattering non-negative matrix factorization
- J. Bruna, P. Sprechmann, and Y. LeCun, "Source separation with scattering non-negative matrix factorization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2015, pp. 1876-1880.
- (2015) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 1876-1880
- Bruna, J.¹ Sprechmann, P.² LeCun, Y.³

35
- 79960916745
- An algorithm for intelligibility prediction of time-frequency weighted noisy speech
- Sep.
- C. Taal, R. Hendriks, R. Heusdens, and J. Jensen, "An algorithm for intelligibility prediction of time-frequency weighted noisy speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2125-2136, Sep. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.7 , pp. 2125-2136
- Taal, C.¹ Hendriks, R.² Heusdens, R.³ Jensen, J.⁴

36
- 0000732463
- A limited memory algorithm for bound constrained optimization
- Sep.
- R. H. Byrd, P. Lu, J. Nocedal, and C. Zhu, "A limited memory algorithm for bound constrained optimization," SIAM J. Sci. Comput., vol. 16, no. 5, pp. 1190-1208, Sep. 1995.
- (1995) SIAM J. Sci. Comput. , vol.16 , Issue.5 , pp. 1190-1208
- Byrd, R.H.¹ Lu, P.² Nocedal, J.³ Zhu, C.⁴

37
- 84874282188
- Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM
- J. Li, D. Yu, J.-T. Huang, and Y. Gong, "Improving wideband speech recognition using mixed-bandwidth training data in CD-DNN-HMM," in Proc. IEEE Spoken Lang. Technol. Workshop (SLT), 2012, pp. 131-136.
- (2012) Proc. IEEE Spoken Lang. Technol. Workshop (SLT) , pp. 131-136
- Li, J.¹ Yu, D.² Huang, J.-T.³ Gong, Y.⁴

38
- 51449094735
- Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs
- Jul.
- A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, "Adaptation of Bayesian models for single-channel source separation and its application to voice/music separation in popular songs," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 5, pp. 1564-1578, Jul. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.5 , pp. 1564-1578
- Ozerov, A.¹ Philippe, P.² Bimbot, F.³ Gribonval, R.⁴

39
- 84941334311
- Discriminatively trained recurrent neural networks for single-channel speech separation
- F. Weninger, J. R. Hershey, J. Le Roux, and B. Schuller, "Discriminatively trained recurrent neural networks for single-channel speech separation," in Proc. IEEE Global Conf. Signal and Inf. Process. (GlobalSIP) Symp. Mach. Learn. Applicat. Speech Process., 2014, pp. 577-581.
- (2014) Proc. IEEE Global Conf. Signal and Inf. Process. (GlobalSIP) Symp. Mach. Learn. Applicat. Speech Process. , pp. 577-581
- Weninger, F.¹ Hershey, J.R.² Le Roux, J.³ Schuller, B.⁴

40
- 0031573117
- Long short-term memory
- Nov.
- S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, Nov. 1997.
- (1997) Neural Comput. , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.