SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume , Issue , 2017, Pages 261-265

Improving music source separation based on deep neural networks through data augmentation and network blending

(7) Uhlich, Stefan a; Porcu, Marcello a; Giron, Franck a; Enenkl, Michael a; Kemp, Thomas a; Takahashi, Naoya b; Mitsufuji, Yuki b

a Sony Deutschland GmbH (Germany)

b SONY CORPORATION (Japan)

Author keywords

Blending; Deep neural network (DNN); Long short term memory (LSTM); Music source separation (MSS)

Indexed keywords

EID: 85023774072 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2017.7952158 Document Type: Conference Paper

Times cited : (251)

References (43)

1
- 84867975114
- Repeating pattern extraction technique (REPET): A simple method for music/voice separation
- Z. Rafii and B. Pardo, "Repeating pattern extraction technique (REPET): A simple method for music/voice separation," IEEE Trans. on Audio, Speech, and Language Processing, Vol. 21, no. 1, pp. 73-84, 2013.
- (2013) IEEE Trans. on Audio, Speech, and Language Processing , vol.21 , Issue.1 , pp. 73-84
- Rafii, Z.¹ Pardo, B.²

2
- 80052984197
- A musically motivated mid-level representation for pitch estimation and musical audio source separation
- J.-L. Durrieu, B. David, and G. Richard, "A musically motivated mid-level representation for pitch estimation and musical audio source separation," IEEE Journal on Selected Topics on Signal Processing, Vol. 5, pp. 1180-1191, 2011.
- (2011) IEEE Journal on Selected Topics on Signal Processing , vol.5 , pp. 1180-1191
- Durrieu, J.-L.¹ David, B.² Richard, G.³

3
- 84866049514
- Stereo music source separation for 3-D upmixing
- H. Shim, J. S. Abel, and K.-M. Sung, "Stereo music source separation for 3-D upmixing," in 127th AES Convention, 2009.
- (2009) 127th AES Convention
- Shim, H.¹ Abel, J.S.² Sung, K.-M.³

4
- 80053163908
- Upmixing from mono - A source separation approach
- D. FitzGerald, "Upmixing from mono - a source separation approach," Proc. Digital Signal Processing, 2011.
- (2011) Proc. Digital Signal Processing
- Fitzgerald, D.¹

5
- 84946087197
- The good vibrations problem
- D. FitzGerald, "The good vibrations problem," 134th AES Convention, e-brief, 2013.
- (2013) 134th AES Convention, E-brief
- Fitzgerald, D.¹

6
- 85023758493
- "SiSEC MUS Homepage," https://sisec.inria.fr/home/2016-professionally-produced-music-recordings/.
- SiSEC MUS Homepage

7
- 84944677084
- The 2015 signal separation evaluation campaign
- N. Ono, Z. Rafii, D. Kitamura, N. Ito, and A. Liutkus, "The 2015 signal separation evaluation campaign," in Proc. LVA/ICA, 2015, pp. 387-395.
- (2015) Proc. LVA/ICA , pp. 387-395
- Ono, N.¹ Rafii, Z.² Kitamura, D.³ Ito, N.⁴ Liutkus, A.⁵

8
- 84944679371
- Multichannel audio source separation with deep neural networks
- A. A. Nugraha, A. Liutkus, and E. Vincent, "Multichannel audio source separation with deep neural networks," INRIA Technical Report, 2015.
- (2015) INRIA Technical Report
- Nugraha, A.A.¹ Liutkus, A.² Vincent, E.³

9
- 85013463259
- Multichannel music separation with deep neural networks
- A. A. Nugraha, A. Liutkus, and E. Vincent, "Multichannel music separation with deep neural networks," in Proc. EUSIPCO, 2016.
- (2016) Proc. EUSIPCO
- Nugraha, A.A.¹ Liutkus, A.² Vincent, E.³

10
- 84944676216
- Deep neural network based instrument extraction from music
- S. Uhlich, F. Giron, and Y. Mitsufuji, "Deep neural network based instrument extraction from music," in Proc. ICASSP, 2015, pp. 2135-2139.
- (2015) Proc. ICASSP , pp. 2135-2139
- Uhlich, S.¹ Giron, F.² Mitsufuji, Y.³

11
- 84930630277
- Deep learning
- Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, Vol. 521, no. 7553, pp. 436-444, 2015.
- (2015) Nature , vol.521 , Issue.7553 , pp. 436-444
- Lecun, Y.¹ Bengio, Y.² Hinton, G.³

12
- 84905251795
- Deep neural networks for single channel source separation
- E. M. Grais, M. U. Sen, and H. Erdogan, "Deep neural networks for single channel source separation," Proc. IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 3734-3738, 2014.
- (2014) Proc. IEEE Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 3734-3738
- Grais, E.M.¹ Sen, M.U.² Erdogan, H.³

13
- 85046988721
- Singing-voice separation from monaural recordings using deep recurrent neural networks
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Singing-voice separation from monaural recordings using deep recurrent neural networks," in Proc. ISMIR, 2014, pp. 477-482.
- (2014) Proc. ISMIR , pp. 477-482
- Huang, P.-S.¹ Kim, M.² Hasegawa-Johnson, M.³ Smaragdis, P.⁴

14
- 84941334839
- Joint optimization of masks and deep recurrent neural networks for monaural source separation
- P.-S. Huang, M. Kim, M. Hasegawa-Johnson, and P. Smaragdis, "Joint optimization of masks and deep recurrent neural networks for monaural source separation," IEEE/ACM Trans. on Audio, Speech, and Language Processing, Vol. 23, no. 12, pp. 2136-2147, 2015.
- (2015) IEEE/ACM Trans. on Audio, Speech, and Language Processing , vol.23 , Issue.12 , pp. 2136-2147
- Huang, P.-S.¹ Kim, M.² Hasegawa-Johnson, M.³ Smaragdis, P.⁴

15
- 84944681228
- Deep karaoke: Extracting vocals from musical mixtures using a convolutional deep neural network
- A. J. Simpson, G. Roma, and M. D. Plumbley, "Deep karaoke: Extracting vocals from musical mixtures using a convolutional deep neural network," in Proc. LVA/ICA, 2015, pp. 429-436.
- (2015) Proc. LVA/ICA , pp. 429-436
- Simpson, A.J.¹ Roma, G.² Plumbley, M.D.³

16
- 84893573842
- Ensemble learning for speech enhancement
- J. Le Roux, S. Watanabe, and J. R. Hershey, "Ensemble learning for speech enhancement," in Proc. WASPAA, 2013, pp. 1-4.
- (2013) Proc. WASPAA , pp. 1-4
- Le Roux, J.¹ Watanabe, S.² Hershey, J.R.³

17
- 84893323332
- Introducing a simple fusion framework for audio source separation
- X. Jaureguiberry, G. Richard, P. Leveau, R. Hennequin, and E. Vincent, "Introducing a simple fusion framework for audio source separation," in Proc. MLSP, 2013, pp. 1-6.
- (2013) Proc. MLSP , pp. 1-6
- Jaureguiberry, X.¹ Richard, G.² Leveau, P.³ Hennequin, R.⁴ Vincent, E.⁵

18
- 84978138127
- Fusion methods for speech enhancement and audio source separation
- X. Jaureguiberry, E. Vincent, and G. Richard, "Fusion methods for speech enhancement and audio source separation," IEEE/ACM Trans. on Audio, Speech, and Language Processing, Vol. 24, no. 7, pp. 1266-1279, 2016.
- (2016) IEEE/ACM Trans. on Audio, Speech, and Language Processing , vol.24 , Issue.7 , pp. 1266-1279
- Jaureguiberry, X.¹ Vincent, E.² Richard, G.³

19
- 84994242533
- Combining mask estimates for single channel audio source separation using deep neural networks
- E. M. Grais, G. Roma, A. J. R. Simpson, and M. D. Plumbley, "Combining mask estimates for single channel audio source separation using deep neural networks," in Proc. Interspeech, 2016.
- (2016) Proc. Interspeech
- Grais, E.M.¹ Roma, G.² Simpson, A.J.R.³ Plumbley, M.D.⁴

20
- 84983499009
- Single-channel audio source separation using deep neural network ensembles
- E. M. Grais, G. Roma, A. J. R. Simpson, and M. D. Plumbley, "Single-channel audio source separation using deep neural network ensembles," in 140th AES Convention, 2016.
- (2016) 140th AES Convention
- Grais, E.M.¹ Roma, G.² Simpson, A.J.R.³ Plumbley, M.D.⁴

21
- 84959157364
- Multi-resolution stacking for speech separation based on boosted DNN
- X.-L. Zhang and D. Wang, "Multi-resolution stacking for speech separation based on boosted DNN," in Proc. Interspeech, 2015, pp. 1745-1749.
- (2015) Proc. Interspeech , pp. 1745-1749
- Zhang, X.-L.¹ Wang, D.²

22
- 84971350832
- A deep ensemble learning method for monaural speech separation
- X.-L. Zhang and D. Wang, "A deep ensemble learning method for monaural speech separation," IEEE/ACM Trans. on Audio, Speech, and Language Processing, Vol. 24, no. 5, pp. 967-977, 2016.
- (2016) IEEE/ACM Trans. on Audio, Speech, and Language Processing , vol.24 , Issue.5 , pp. 967-977
- Zhang, X.-L.¹ Wang, D.²

23
- 84946079770
- Cross-domain cooperative deep stacking network for speech separation
- W. Jiang, S. Liang, L. Dong, H. Yang, W. Liu, and Y. Wang, "Cross-domain cooperative deep stacking network for speech separation," in Proc. ICASSP. IEEE, 2015, pp. 5083-5087.
- (2015) Proc. ICASSP , pp. 5083-5087
- Jiang, W.¹ Liang, S.² Dong, L.³ Yang, H.⁴ Liu, W.⁵ Wang, Y.⁶

24
- 10944227316
- Sparse coding and NMF
- J. Eggert and E. Körner, "Sparse coding and NMF," in Proc. Neural Networks, 2004, vol. 4, pp. 2529-2533.
- (2004) Proc. Neural Networks , vol.4 , pp. 2529-2533
- Eggert, J.¹ Körner, E.²

25
- 63249085556
- Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis
- C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Computation, Vol. 21, no. 3, pp. 793-830, 2009.
- (2009) Neural Computation , vol.21 , Issue.3 , pp. 793-830
- Févotte, C.¹ Bertin, N.² Durrieu, J.-L.³

26
- 84910065215
- Discriminative NMF and its application to single-channel source separation
- F. Weninger, J. Le Roux, J. R. Hershey, and S. Watanabe, "Discriminative NMF and its application to single-channel source separation," in Proc. Interspeech, 2014, pp. 865-869.
- (2014) Proc. Interspeech , pp. 865-869
- Weninger, F.¹ Le Roux, J.² Hershey, J.R.³ Watanabe, S.⁴

27
- 84946011374
- Deep NMF for speech separation
- J. Le Roux, J. R. Hershey, and F. Weninger, "Deep NMF for speech separation," in Proc. ICASSP, 2015, pp. 66-70.
- (2015) Proc. ICASSP , pp. 66-70
- Le Roux, J.¹ Hershey, J.R.² Weninger, F.³

28
- 33744975847
- Performance measurement in blind audio source separation
- E. Vincent, R. Gribonval, and C. Févotte, "Performance measurement in blind audio source separation," IEEE Trans. on Audio, Speech and Language Processing, Vol. 14, no. 4, pp. 1462-1469, 2006.
- (2006) IEEE Trans. on Audio, Speech and Language Processing , vol.14 , Issue.4 , pp. 1462-1469
- Vincent, E.¹ Gribonval, R.² Févotte, C.³

29
- 77955675017
- Under-determined reverberant audio source separation using a full-rank spatial covariance model
- N. Q. Duong, E. Vincent, and R. Gribonval, "Under-determined reverberant audio source separation using a full-rank spatial covariance model," IEEE Trans. on Audio, Speech, and Language Processing, Vol. 18, no. 7, pp. 1830-1840, 2010.
- (2010) IEEE Trans. on Audio, Speech, and Language Processing , vol.18 , Issue.7 , pp. 1830-1840
- Duong, N.Q.¹ Vincent, E.² Gribonval, R.³

30
- 84897584695
- A general flexible framework for the handling of prior information in audio source separation
- A. Ozerov, E. Vincent, and F. Bimbot, "A general flexible framework for the handling of prior information in audio source separation," IEEE Trans. on Audio, Speech, and Language Processing, Vol. 20, no. 4, pp. 1118-1133, 2012.
- (2012) IEEE Trans. on Audio, Speech, and Language Processing , vol.20 , Issue.4 , pp. 1118-1133
- Ozerov, A.¹ Vincent, E.² Bimbot, F.³

31
- 85023754420
- Fundamentals of Statistical Signal Processing, Volume 1: Estimation Theory
- S.M. Kay, Fundamentals of Statistical Signal Processing, Volume 1: Estimation Theory, Prentice-Hall, 1993.
- (1993) Fundamentals of Statistical Signal Processing , vol.1
- Kay, S.M.¹

32
- 84964555729
- Robust ASR using neural network based speech enhancement and feature simulation
- S. Sivasankaran, A. A. Nugraha, E. Vincent, J. A. Morales-Cordovilla, S. Dalmia, I. Illina, and A. Liutkus, "Robust ASR using neural network based speech enhancement and feature simulation," in Proc. ASRU, 2015, pp. 482-489.
- (2015) Proc. ASRU , pp. 482-489
- Sivasankaran, S.¹ Nugraha, A.A.² Vincent, E.³ Morales-Cordovilla, J.A.⁴ Dalmia, S.⁵ Illina, I.⁶ Liutkus, A.⁷

33
- 84862294866
- Deep sparse rectifier networks
- X. Glorot, A. Bordes, and Y. Bengio, "Deep sparse rectifier networks," Proc. AISTATS, Vol. 15, pp. 315-323, 2011.
- (2011) Proc. AISTATS , vol.15 , pp. 315-323
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

34
- 85057325970
- MedleyDB: A multitrack dataset for annotation-intensive MIR research
- R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello, "MedleyDB: A multitrack dataset for annotation-intensive MIR research," in Proc. ISMIR, 2014, pp. 155-160.
- (2014) Proc. ISMIR , pp. 155-160
- Bittner, R.M.¹ Salamon, J.² Tierney, M.³ Mauch, M.⁴ Cannam, C.⁵ Bello, J.P.⁶

35
- 0003472470
- Pattern Classification and Scene Analysis
- R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification and Scene Analysis, John Wiley & Sons, 2001.
- (2001) Pattern Classification and Scene Analysis
- Duda, R.O.¹ Hart, P.E.² Stork, D.G.³

36
- 27744588611
- Framewise phoneme classification with bidirectional LSTM and other neural network architectures
- A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, Vol. 18, no. 5, pp. 602-610, 2005.
- (2005) Neural Networks , vol.18 , Issue.5 , pp. 602-610
- Graves, A.¹ Schmidhuber, J.²

37
- 85023742699
- "Lasagne GitHub," https://github.com/Lasagne/Lasagne.
- Lasagne GitHub

38
- 85023748945
- "Theano GitHub," https://github.com/Theano/Theano.
- Theano GitHub

39
- 84979557463
- Theano: A Python framework for fast computation of mathematical expressions
- The Theano Development Team, "Theano: A Python framework for fast computation of mathematical expressions," arXiv preprint arXiv:1605.02688, 2016.
- (2016) Theano: A Python Framework for Fast Computation of Mathematical Expressions

40
- 84973376414
- Exploring data augmentation for improved singing voice detection with neural networks
- J. Schlüter and T. Grill, "Exploring data augmentation for improved singing voice detection with neural networks," in Proc. ISMIR, 2015.
- (2015) Proc. ISMIR
- Schlüter, J.¹ Grill, T.²

41
- 84996516893
- A software framework for musical data augmentation
- B. McFee, E. J. Humphrey, and J. P. Bello, "A software framework for musical data augmentation," in Proc. ISMIR, 2015.
- (2015) Proc. ISMIR
- McFee, B.¹ Humphrey, E.J.² Bello, J.P.³

42
- 58049174287
- The Netflix prize
- J. Bennett and S. Lanning, "The Netflix prize," in Proc. KDD Cup and Workshop, 2007, vol. 2007, p. 35.
- (2007) Proc. KDD Cup and Workshop , vol.2007 , p. 35
- Bennett, J.¹ Lanning, S.²

43
- 57349146373
- Lessons from the Netflix prize challenge
- R. M. Bell and Y. Koren, "Lessons from the Netflix prize challenge," ACM SIGKDD Explorations Newsletter, Vol. 9, no. 2, pp. 75-79, 2007.
- (2007) ACM SIGKDD Explorations Newsletter , vol.9 , Issue.2 , pp. 75-79
- Bell, R.M.¹ Koren, Y.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.