



Volume 22, Issue 10, 2014, Pages 1506-1521

Exemplar-based sparse representation with residual compensation for voice conversion

Author keywords

Exemplar; Nonnegative matrix factorization; Residual compensation; Sparse representation; Voice conversion

Indexed keywords

MATRIX ALGEBRA; MAXIMUM LIKELIHOOD; SPEECH PROCESSING

EID: 84911369131     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2014.2333242     Document Type: Article
Times cited: 188

References (54)
  • 1
    • 70350125882
    • An overview of text-independent speaker recognition: From features to supervectors
    • T. Kinnunen and H. Li, "An overview of text-independent speaker recognition: From features to supervectors," Speech Commun., vol. 52, no. 1, pp. 12-40, 2010.
    • (2010) Speech Commun , vol.52 , Issue.1 , pp. 12-40
    • Kinnunen, T.1    Li, H.2
  • 6
    • 84867591125
    • Stereo-based stochastic mapping with context using probabilistic PCA for noise robust automatic speech recognition
    • X. Cui, M. Afify, and B. Zhou, "Stereo-based stochastic mapping with context using probabilistic PCA for noise robust automatic speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012, pp. 4705-4708.
    • Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2012 , pp. 4705-4708
    • Cui, X.1    Afify, M.2    Zhou, B.3
  • 7
    • 38649140222
    • Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
    • T. Toda, A. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model," Speech Commun., vol. 50, no. 3, pp. 215-227, 2008.
    • (2008) Speech Commun , vol.50 , Issue.3 , pp. 215-227
    • Toda, T.1    Black, A.2    Tokuda, K.3
  • 11
    • 0032026483
    • Continuous probabilistic transform for voice conversion
    • Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
    • Stylianou, Y.1    Cappé, O.2    Moulines, E.3
  • 12
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 13
    • 78149260085
    • Continuous stochastic feature mapping based on trajectory HMMs
    • H. Zen, Y. Nankaku, and K. Tokuda, "Continuous stochastic feature mapping based on trajectory HMMs," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 2, pp. 417-430, Feb. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.2 , pp. 417-430
    • Zen, H.1    Nankaku, Y.2    Tokuda, K.3
  • 14
    • 84911386426
    • Perceptually weighted linear transformations for voice conversion
    • H. Ye and S. Young, "Perceptually weighted linear transformations for voice conversion," in Proc. Interspeech, 2003.
    • Proc. Interspeech, 2003
    • Ye, H.1    Young, S.2
  • 15
    • 34047254509
    • Quality-enhanced voice morphing using maximum likelihood transformations
    • H. Ye and S. Young, "Quality-enhanced voice morphing using maximum likelihood transformations," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1301-1312, Jul. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.4 , pp. 1301-1312
    • Ye, H.1    Young, S.2
  • 18
    • 0029254176
    • Transformation of formants for voice conversion using artificial neural networks
    • M. Narendranath, H. Murthy, S. Rajendran, and B. Yegnanarayana, "Transformation of formants for voice conversion using artificial neural networks," Speech Commun., vol. 16, no. 2, pp. 207-216, 1995.
    • (1995) Speech Commun , vol.16 , Issue.2 , pp. 207-216
    • Narendranath, M.1    Murthy, H.2    Rajendran, S.3    Yegnanarayana, B.4
  • 20
    • 80053068819
    • Voice conversion using support vector regression
    • P. Song, Y. Bao, L. Zhao, and C. Zou, "Voice conversion using support vector regression," Electron. Lett., vol. 47, no. 18, pp. 1045-1046, 2011.
    • (2011) Electron. Lett. , vol.47 , Issue.18 , pp. 1045-1046
    • Song, P.1    Bao, Y.2    Zhao, L.3    Zou, C.4
  • 22
    • 84906225084
    • Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
    • L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Proc. Interspeech, 2013.
    • Proc. Interspeech, 2013
    • Chen, L.-H.1    Ling, Z.-H.2    Song, Y.3    Dai, L.-R.4
  • 25
  • 26
    • 84857498745
    • Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora
    • E. Godoy, O. Rosec, and T. Chonavel, "Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 4, pp. 1313-1323, May 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.4 , pp. 1313-1323
    • Godoy, E.1    Rosec, O.2    Chonavel, T.3
  • 27
    • 84872177757
    • Parametric voice conversion based on bilinear frequency warping plus amplitude scaling
    • D. Erro, E. Navas, and I. Hernaez, "Parametric voice conversion based on bilinear frequency warping plus amplitude scaling," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 3, pp. 556-566, Mar. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.3 , pp. 556-566
    • Erro, D.1    Navas, E.2    Hernaez, I.3
  • 29
    • 84898964201
    • Algorithms for non-negative matrix factorization
    • D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," Adv. Neural Inf. Process. Syst., vol. 13, pp. 556-562, 2001.
    • (2001) Adv. Neural Inf. Process. Syst. , vol.13 , pp. 556-562
    • Lee, D.D.1    Seung, H.S.2
  • 30
    • 79960657803
    • Exemplar-based sparse representations for noise robust automatic speech recognition
    • J. Gemmeke, T. Virtanen, and A. Hurmalainen, "Exemplar-based sparse representations for noise robust automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 7, pp. 2067-2080, Sep. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.7 , pp. 2067-2080
    • Gemmeke, J.1    Virtanen, T.2    Hurmalainen, A.3
  • 31
    • 80051620372
    • Non-negative matrix deconvolution in noise robust speech recognition
    • A. Hurmalainen, J. Gemmeke, and T. Virtanen, "Non-negative matrix deconvolution in noise robust speech recognition," in Proc. ICASSP, 2011, pp. 4588-4591.
    • Proc. ICASSP, 2011 , pp. 4588-4591
    • Hurmalainen, A.1    Gemmeke, J.2    Virtanen, T.3
  • 34
    • 77953725318
    • INCA algorithm for training voice conversion systems from nonparallel corpora
    • D. Erro, A. Moreno, and A. Bonafonte, "INCA algorithm for training voice conversion systems from nonparallel corpora," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 944-953, 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.5 , pp. 944-953
    • Erro, D.1    Moreno, A.2    Bonafonte, A.3
  • 39
    • 77953728395
    • Measuring the gap between HMM-based ASR and TTS
    • J. Dines, J. Yamagishi, and S. King, "Measuring the gap between HMM-based ASR and TTS," IEEE J. Sel. Topics Signal Process., vol. 4, no. 6, pp. 1046-1058, Aug. 2010.
    • (2010) IEEE J. Sel. Topics Signal Process. , vol.4 , Issue.6 , pp. 1046-1058
    • Dines, J.1    Yamagishi, J.2    King, S.3
  • 40
    • 64849096680
    • Unsupervised learning methods for source separation in monaural music signals
    • T. Virtanen, "Unsupervised learning methods for source separation in monaural music signals," in Signal Processing Methods for Music Transcription, A. Klapuri and M. Davy, Eds. New York, NY, USA: Springer, 2006, pp. 267-296.
    • (2006) Signal Processing Methods for Music Transcription , pp. 267-296
    • Virtanen, T.1
  • 42
    • 84863766226
    • Bandwidth expansion of narrow-band speech using non-negative matrix factorization
    • D. Bansal, B. Raj, and P. Smaragdis, "Bandwidth expansion of narrow-band speech using non-negative matrix factorization," in Proc. Interspeech, 2005.
    • Proc. Interspeech, 2005
    • Bansal, D.1    Raj, B.2    Smaragdis, P.3
  • 43
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3, pp. 187-207, 1999.
    • (1999) Speech Commun , vol.27 , Issue.3 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 45
    • 4444285698
    • High resolution voice transformation
    • A. B. Kain, "High resolution voice transformation," Ph.D. dissertation, OGI School of Sci. & Eng., Oregon Health and Science Univ., Beaverton, OR, USA, 2001.
    • (2001) High Resolution Voice Transformation
    • Kain, A.B.1
  • 46
    • 84878390910
    • Implementation of computationally efficient real-time voice conversion
    • T. Toda, T. Muramatsu, and H. Banno, "Implementation of computationally efficient real-time voice conversion," in Proc. Interspeech, 2012.
    • Proc. Interspeech, 2012
    • Toda, T.1    Muramatsu, T.2    Banno, H.3
  • 49
    • 84865713971
    • Crowdsourcing preference tests, and how to detect cheating
    • S. Buchholz and J. Latorre, "Crowdsourcing preference tests, and how to detect cheating," in Proc. Interspeech, 2011.
    • Proc. Interspeech, 2011
    • Buchholz, S.1    Latorre, J.2
  • 53
    • 0033592606
    • Learning the parts of objects by non-negative matrix factorization
    • D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, no. 6755, pp. 788-791, 1999.
    • (1999) Nature , vol.401 , Issue.6755 , pp. 788-791
    • Lee, D.D.1    Seung, H.S.2
  • 54
    • 80052600921
    • Large margin based nonnegative matrix factorization and partial least squares regression for face recognition
    • J.-Y. Pan and J.-S. Zhang, "Large margin based nonnegative matrix factorization and partial least squares regression for face recognition," Pattern Recogn. Lett., vol. 32, no. 14, pp. 1822-1835, 2011.
    • (2011) Pattern Recogn. Lett. , vol.32 , Issue.14 , pp. 1822-1835
    • Pan, J.-Y.1    Zhang, J.-S.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.