SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 18, Issue 2, 2010, Pages 310-319

On the Improvement of Singing Voice Separation for Monaural Recordings Using the MIR-1K Dataset

Authors (2): Hsu, Chao Ling (a); Jang, Jyh Shing Roger (a)

(a) National Tsing Hua University (Taiwan)

Author keywords

Computational auditory scene analysis (CASA); singing voice separation; unvoiced sound separation

EID: 85008542938 PISSN: 15587916 EISSN: 15587924 Source Type: Journal
DOI: 10.1109/TASL.2009.2026503 Document Type: Article

Times cited: 256

References (32)

1
- 85009187525
- An automatic singing transcription system with multilingual singing lyric recognizer and robust melody tracker
- Geneva, Switzerland
- C. K. Wang, R. Y. Lyu, and Y. C. Chiang, “An automatic singing transcription system with multilingual singing lyric recognizer and robust melody tracker,” in Proc. 8th Eur. Conf. Speech Commun. Technol., Geneva, Switzerland, 2003, pp. 1197–1200.
- (2003) Proc. 8th Eur. Conf. Speech Commun. Technol. , pp. 1197-1200
- Wang, C.K.¹ Lyu, R.Y.² Chiang, Y.C.³

2
- 11244322304
- System and method for automatic singer identification
- T. Zhang, “System and method for automatic singer identification,” in Proc. IEEE Int. Conf. Multimedia and Expo (ICME), 2003, pp. 33–36.
- (2003) Proc. IEEE Int. Conf. Multimedia and Expo (ICME) , pp. 33-36
- Zhang, T.¹

3
- 77957274549
- Disambiguating music emotion using software agents
- Barcelona, Spain
- D. Yang and W. Lee, “Disambiguating music emotion using software agents,” in Proc. Symp. Music Inf. Retrieval (ISMIR'04), Barcelona, Spain, 2004, pp. 52–57.
- (2004) Proc. Symp. Music Inf. Retrieval (ISMIR'04) , pp. 52-57
- Yang, D.¹ Lee, W.²

4
- 84873538214
- Separation of vocals from polyphonic audio recordings
- S. Vembu and S. Baumann, “Separation of vocals from polyphonic audio recordings,” in Proc. Int. Symp. Music Inf. Retrieval (ISMIR'05), 2005, pp. 337–344.
- (2005) Proc. Int. Symp. Music Inf. Retrieval (ISMIR'05) , pp. 337-344
- Vembu, S.¹ Baumann, S.²

5
- 50549089895
- Separation of singing voice from music accompaniment for monaural recordings
- Y. Li and D. L. Wang, “Separation of singing voice from music accompaniment for monaural recordings,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, pp. 1475–1487, 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , pp. 1475-1487
- Li, Y.¹ Wang, D.L.²

6
- 84946031315
- Combining pitch-based inference and non-negative spectrogram factorization in separating vocals from polyphonic music
- Brisbane, Australia, Sep.
- T. Virtanen, A. Mesaros, and M. Ryynanen, “Combining pitch-based inference and non-negative spectrogram factorization in separating vocals from polyphonic music,” in Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA'08), Brisbane, Australia, Sep. 2008, pp. 17–20.
- (2008) Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audition (SAPA'08) , pp. 17-20
- Virtanen, T.¹ Mesaros, A.² Ryynanen, M.³

7
- 51449094735
- Adaptation of Bayesian models for single channel source separation and its application to voice/music separation in popular songs
- Jul.
- A. Ozerov, P. Philippe, F. Bimbot, and R. Gribonval, “Adaptation of Bayesian models for single channel source separation and its application to voice/music separation in popular songs,” IEEE Trans. Audio, Speech, Lang. Proc., Special Iss. Blind Signal Process Speech Audio Applicat., vol. 15, no. 5, pp. 1564–1578, Jul. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Proc., Special Iss. Blind Signal Process Speech Audio Applicat. , vol.15 , Issue.5 , pp. 1564-1578
- Ozerov, A.¹ Philippe, P.² Bimbot, F.³ Gribonval, R.⁴

8
- 80053146532
- Separating a foreground singer from background music
- Mysore, India, Jan.
- B. Raj, P. Smaragdis, M. V. Shashanka, and R. Singh, “Separating a foreground singer from background music,” in Proc. Int. Symp. Frontiers Res. Speech Music (FRSM), Mysore, India, Jan. 2007.
- (2007) Proc. Int. Symp. Frontiers Res. Speech Music (FRSM)
- Raj, B.¹ Smaragdis, P.² Shashanka, M.V.³ Singh, R.⁴

9
- 84870889437
- Timing is of the essence: Neural oscillator models of auditory grouping
- Mahwah, NJ: Lawrence Erlbaum
- G. J. Brown and D. L. Wang, “Timing is of the essence: Neural oscillator models of auditory grouping,” in Listening to Speech: An Auditory Perspective, S. Greenberg and W. Ainsworth, Eds. Mahwah, NJ: Lawrence Erlbaum, 2006, pp. 375–392.
- (2006) Listening to Speech: An Auditory Perspective , pp. 375-392
- Brown, G.J.¹ Wang, D.L.²

10
- 46049084696
- An auditory scene analysis approach to monaural speech segregation
- Heidelberg, Germany: Springer
- G. Hu and D. L. Wang, “An auditory scene analysis approach to monaural speech segregation,” in Acoustic Echo and Noise Control, E. Hansler and G. Schmidt, Eds. Heidelberg, Germany: Springer, 2006, pp. 485–515.
- (2006) Acoustic Echo and Noise Control , pp. 485-515
- Hu, G.¹ Wang, D.L.²

11
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- Norwell, MA: Kluwer
- D. L. Wang, “On ideal binary mask as the computational goal of auditory scene analysis,” in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA: Kluwer, 2005, pp. 181–197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

12
- 49249107353
- Segregation of unvoiced speech from non-speech interference
- G. Hu and D. L. Wang, “Segregation of unvoiced speech from non-speech interference,” J. Acoust. Soc. Amer., vol. 124, pp. 1306–1319, 2008.
- (2008) J. Acoust. Soc. Amer. , vol.124 , pp. 1306-1319
- Hu, G.¹ Wang, D.L.²

13
- 4644265990
- Monaural speech segregation based on pitch tracking and amplitude modulation
- Sep.
- G. Hu and D. L. Wang, “Monaural speech segregation based on pitch tracking and amplitude modulation,” IEEE Trans. Neural Netw., vol. 15, no. 5, pp. 1135–1150, Sep. 2004.
- (2004) IEEE Trans. Neural Netw. , vol.15 , Issue.5 , pp. 1135-1150
- Hu, G.¹ Wang, D.L.²

14
- 33745190137
- Singing voice analysis/synthesis
- Ph.D. dissertation, Media Lab., Mass. Inst. Technol., Cambridge, MA
- Y. E. Kim, “Singing voice analysis/synthesis,” Ph.D. dissertation, Media Lab., Mass. Inst. Technol., Cambridge, MA, 2003.
- (2003) Singing voice analysis/synthesis
- Kim, Y.E.¹

15
- 34547508425
- Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals
- H. Fujihara, M. Goto, J. Ogata, K. Komatani, T. Ogata, and H. G. Okuno, “Automatic synchronization between lyrics and music CD recordings based on Viterbi alignment of segregated vocal signals,” in Proc. IEEE Int. Symp. Multimedia (ISM 2006), 2006, pp. 257–264.
- (2006) Proc. IEEE Int. Symp. Multimedia (ISM 2006) , pp. 257-264
- Fujihara, H.¹ Goto, M.² Ogata, J.³ Komatani, K.⁴ Ogata, T.⁵ Okuno, H.G.⁶

16
- 51449099173
- Three techniques for improving automatic synchronization between music and lyrics: Fricative sound detection, filler model, and novel feature vectors for vocal activity detection
- Las Vegas, NV, Mar.–Apr.
- H. Fujihara and M. Goto, “Three techniques for improving automatic synchronization between music and lyrics: Fricative sound detection, filler model, and novel feature vectors for vocal activity detection,” in Proc. 2008 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP'08), Las Vegas, NV, Mar.–Apr. 2008, pp. 69–72.
- (2008) Proc. 2008 IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP'08) , pp. 69-72
- Fujihara, H.¹ Goto, M.²

17
- 54049086684
- Accompaniment separation and karaoke application based on automatic melody transcription
- Hannover, Germany, Jun.
- M. Ryynanen, T. Virtanen, J. Paulus, and A. Klapuri, “Accompaniment separation and karaoke application based on automatic melody transcription,” in Proc. 2008 IEEE Int. Conf. Multimedia Expo (ICME'08), Hannover, Germany, Jun. 2008, pp. 1417–1420.
- (2008) Proc. 2008 IEEE Int. Conf. Multimedia Expo (ICME'08) , pp. 1417-1420
- Ryynanen, M.¹ Virtanen, T.² Paulus, J.³ Klapuri, A.⁴

18
- 33745220723
- On designing and evaluating speech event detectors
- Lisbon, Portugal, Sep.
- J. Li and C.-H. Lee, “On designing and evaluating speech event detectors,” in Proc. Interspeech, Lisbon, Portugal, Sep. 2005, pp. 3365–3368.
- (2005) Proc. Interspeech , pp. 3365-3368
- Li, J.¹ Lee, C.-H.²

19
- 84866491801
- An auditory streaming approach on melody extraction
- Victoria, BC, Canada, Sept.
- K. Dressler, “An auditory streaming approach on melody extraction,” in Extended Abstract for ISMIR 2006, Victoria, BC, Canada, Sept. 8–12, 2006.
- (2006) Extended Abstract for ISMIR 2006
- Dressler, K.¹

20
- 84872702570
- Sinusoidal extraction using an efficient implementation of a multi-resolution FFT
- Montreal, Quebec, Canada, Sep. 18–20
- K. Dressler, “Sinusoidal extraction using an efficient implementation of a multi-resolution FFT,” in Proc. Int. Conf. Digital Audio Effects (DAFx-06), Montreal, Quebec, Canada, Sep. 18–20, 2006, pp. 247–252.
- (2006) Proc. Int. Conf. Digital Audio Effects (DAFx-06) , pp. 247-252
- Dressler, K.¹

21
- 0029726517
- Speech enhancement based on a priori signal to noise estimation
- Atlanta, GA, May
- P. Scalart and J. Vieira-Filho, “Speech enhancement based on a priori signal to noise estimation,” in Proc. 21st IEEE Int. Conf. Acoust. Speech Signal Process., Atlanta, GA, May 1996, pp. 629–632.
- (1996) Proc. 21st IEEE Int. Conf. Acoust. Speech Signal Process. , pp. 629-632
- Scalart, P.¹ Vieira-Filho, J.²

22
- 85008519458
- Sample Speech Enhancement Methods
- E. Zavarehei, “Sample Speech Enhancement Methods,” 2005. [Online]. Available: http://dea.brunel.ac.uk/cmsp/Home_Esfandiar/Sample wave Files.htm
- (2005) Sample Speech Enhancement Methods
- Zavarehei, E.¹

23
- 0003982501
- A theory and computational model of auditory monaural sound separation
- Ph.D. dissertation, Dept. Elect. Eng., Stanford Univ., Stanford, CA
- M. Weintraub, “A theory and computational model of auditory monaural sound separation,” Ph.D. dissertation, Dept. Elect. Eng., Stanford Univ., Stanford, CA, 1985.
- (1985) A theory and computational model of auditory monaural sound separation
- Weintraub, M.¹

24
- 0004129646
- Acoustic Phonetics
- Cambridge, MA: MIT Press
- K. N. Stevens, Acoustic Phonetics. Cambridge, MA: MIT Press, 1998.
- (1998) Acoustic Phonetics
- Stevens, K.N.¹

25
- 84968756938
- Phonetic and phonological background of Chinese spoken language
- Singapore
- C.-C. Kuo, C.-H. Lee, H. Li, L.-S. Lee, R.-H. Wang, and Q. Huo, Eds., “Phonetic and phonological background of Chinese spoken language,” in Proc. Chinese Spoken Lang. Process., Singapore, 2007, pp. 33–55.
- (2007) Proc. Chinese Spoken Lang. Process. , pp. 33-55
- Kuo, C.-C.¹ Lee, C.-H.² Li, H.³ Lee, L.-S.⁴ Wang, R.-H.⁵ Huo, Q.⁶

26
- 0020102027
- Least squares quantization in PCM
- Mar.
- S. P. Lloyd, “Least squares quantization in PCM,” IEEE Trans. Inf. Theory, vol. IT-28, no. 2, pp. 129–137, Mar. 1982.
- (1982) IEEE Trans. Inf. Theory , vol.IT-28 , Issue.2 , pp. 129-137
- Lloyd, S.P.¹

27
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc., vol. 39, pp. 1–38, 1977.
- (1977) J. R. Statist. Soc. , vol.39 , pp. 1-38
- Dempster, A.P.¹ Laird, N.M.² Rubin, D.B.³

28
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- Feb.
- L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition,” Proc. IEEE, vol. 77, no. 2, pp. 257–286, Feb. 1989.
- (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
- Rabiner, L.R.¹

29
- 0348196088
- Proposals for performance measurement in source separation
- Nara, Japan, Apr.
- R. Gribonval, L. Benaroya, E. Vincent, and C. Fevotte, “Proposals for performance measurement in source separation,” in Proc. Int. Symp. ICA BSS, Nara, Japan, Apr. 2003, pp. 763–768.
- (2003) Proc. Int. Symp. ICA BSS , pp. 763-768
- Gribonval, R.¹ Benaroya, L.² Vincent, E.³ Fevotte, C.⁴

30
- 33745686986
- One microphone singing voice separation using source-adapted models
- New York
- A. Ozerov, P. Philippe, R. Gribonval, and F. Bimbot, “One microphone singing voice separation using source-adapted models,” in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., New York, 2005, pp. 90–93.
- (2005) Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust. , pp. 90-93
- Ozerov, A.¹ Philippe, P.² Gribonval, R.³ Bimbot, F.⁴

31
- 85101783421
- A SVM-based classification approach to musical audio
- N. C. Maddage, C. Xu, and Y. Wang, “A SVM-based classification approach to musical audio,” in Proc. Int. Conf. Music Inform. Retrieval, 2003.
- (2003) Proc. Int. Conf. Music Inform. Retrieval
- Maddage, N.C.¹ Xu, C.² Wang, Y.³

32
- 13444308866
- Using voice segments to improve artist classification of music
- A. L. Berenzweig, D. P. W. Ellis, and S. Lawrence, “Using voice segments to improve artist classification of music,” in Proc. AES 22nd Int. Conf. Virtual, Synth. Entertainment Audio, 2002.
- (2002) Proc. AES 22nd Int. Conf. Virtual, Synth. Entertainment Audio
- Berenzweig, A.L.¹ Ellis, D.P.W.² Lawrence, S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.