Volume 67, 2015, Pages 113-128

Voice conversion based on feature combination with limited training data

Author keywords

Dynamic kernel partial least square regression (DKPLS); Feature combination; Gaussian mixture models (GMM); Voice conversion

Indexed keywords

LEAST SQUARES APPROXIMATIONS; POLES; QUALITY CONTROL; SPEECH RECOGNITION

EID: 84919915933     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2014.12.004     Document Type: Article
Times cited: 20

References (39)
  • 3
    • 0033154052
    • Speaker transformation algorithm using segmental codebooks (STASC)
    • Arslan, L.M., 1999. Speaker transformation algorithm using segmental codebooks (STASC). Speech Commun. 28 (3), 211-226.
    • (1999) Speech Commun. , vol.28 , Issue.3 , pp. 211-226
    • Arslan, L.M.1
  • 4
    • 0027530250
    • SIMPLS: An alternative approach to partial least squares regression
    • de Jong, S., 1993. SIMPLS: an alternative approach to partial least squares regression. Chemometr. Intell. Lab. Syst. 18 (3), 251-263.
    • (1993) Chemometr. Intell. Lab. Syst. , vol.18 , Issue.3 , pp. 251-263
    • De Jong, S.1
  • 8
    • 84872177757
    • Parametric voice conversion based on bilinear frequency warping plus amplitude scaling
    • Erro, D., Navas, E., Hernaez, I., 2013. Parametric voice conversion based on bilinear frequency warping plus amplitude scaling. IEEE Trans. Audio, Speech Lang. Process. 21 (3), 556-566.
    • (2013) IEEE Trans. Audio, Speech Lang. Process. , vol.21 , Issue.3 , pp. 556-566
    • Erro, D.1    Navas, E.2    Hernaez, I.3
  • 9
    • 85161148381
    • The elements of statistical learning: Data mining, inference and prediction
    • Hastie, T., Tibshirani, R., Friedman, J., Franklin, J., 2005. The elements of statistical learning: data mining, inference and prediction. Math. Intell. 27 (2), 83-85.
    • (2005) Math. Intell. , vol.27 , Issue.2 , pp. 83-85
    • Hastie, T.1    Tibshirani, R.2    Friedman, J.3    Franklin, J.4
  • 13
    • 0001810975
    • Line spectrum representation of linear predictor coefficients of speech signals
    • Itakura, F., 1975. Line spectrum representation of linear predictor coefficients of speech signals. J. Acoust. Soc. Am. 57 (S1), S35-S35.
    • (1975) J. Acoust. Soc. Am. , vol.57 , Issue.S1 , pp. S35-S35
    • Itakura, F.1
  • 15
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • Kawahara, H., Masuda-Katsuse, I., de Cheveigne, A., 1999. Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27 (3), 187-207.
    • (1999) Speech Commun. , vol.27 , Issue.3 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigne, A.3
  • 17
    • 44949210554
    • Map-based adaptation for speech conversion using adaptation data selection and non-parallel training
    • Lee, C.H., Wu, C.H., 2006. Map-based adaptation for speech conversion using adaptation data selection and non-parallel training. In: Proc. Interspeech, 2006.
    • (2006) Proc. Interspeech, 2006
    • Lee, C.H.1    Wu, C.H.2
  • 19
    • 44949143155
    • Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
    • Ohtani, Y., Toda, T., Saruwatari, H., Shikano, K., 2006. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. In: Proc. Interspeech 2006, pp. 2266-2269.
    • (2006) Proc. Interspeech 2006 , pp. 2266-2269
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 27
    • 57749193836
    • Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • Toda, T., Black, A.W., Tokuda, K., 2007. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Trans. Audio, Speech Lang. Process. 15 (8), 2222-2235.
    • (2007) IEEE Trans. Audio, Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 28
    • 84865698185
    • Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
    • Toda, T., Nakagiri, M., Shikano, K., 2012. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Trans. Audio, Speech Lang. Process. 20 (9), 2505-2517.
    • (2012) IEEE Trans. Audio, Speech Lang. Process. , vol.20 , Issue.9 , pp. 2505-2517
    • Toda, T.1    Nakagiri, M.2    Shikano, K.3
  • 29
    • 85131821539
    • Mel-generalized cepstral analysis - A unified approach to speech spectral estimation
    • Tokuda, K., Kobayashi, T., Masuko, T., Imai, S., 1994. Mel-generalized cepstral analysis-a unified approach to speech spectral estimation. In: ICSLP, vol. 3, pp. 1043-1046.
    • (1994) ICSLP , vol.3 , pp. 1043-1046
    • Tokuda, K.1    Kobayashi, T.2    Masuko, T.3    Imai, S.4
  • 30
    • 84905242025
    • Recursive calculation of melcepstrum from LP coefficients
    • Tokuda, K., Kobayashi, T., Imai, S., 1994. Recursive calculation of melcepstrum from LP coefficients. Trans. IEICE 71, 128-131.
    • (1994) Trans. IEICE , vol.71 , pp. 128-131
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 33
    • 34047247202
    • Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
    • Wu, C.H., Hsia, C.C., Liu, T.H., Wang, J.F., 2006. Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis. IEEE Trans. Audio, Speech Lang. Process. 14 (4), 1109-1116.
    • (2006) IEEE Trans. Audio, Speech Lang. Process. , vol.14 , Issue.4 , pp. 1109-1116
    • Wu, C.H.1    Hsia, C.C.2    Liu, T.H.3    Wang, J.F.4
  • 37
    • 34047254509
    • Quality-enhanced voice morphing using maximum likelihood transformations
    • Ye, H., Young, S., 2006. Quality-enhanced voice morphing using maximum likelihood transformations. IEEE Trans. Audio Speech Lang. Process. 14 (4), 1301-1312.
    • (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.4 , pp. 1301-1312
    • Ye, H.1    Young, S.2
  • 38
    • 68249104241
    • The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge
    • Zen, H., Toda, T., Tokuda, K., 2008. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge. IEICE Trans. Inform. Syst. 91 (6), 1764-1773.
    • (2008) IEICE Trans. Inform. Syst. , vol.91 , Issue.6 , pp. 1764-1773
    • Zen, H.1    Toda, T.2    Tokuda, K.3
  • 39
    • 78149260085
    • Continuous stochastic feature mapping based on trajectory HMMs
    • Zen, H., Nankaku, Y., Tokuda, K., 2011. Continuous stochastic feature mapping based on trajectory HMMs. IEEE Trans. Audio, Speech Lang. Process. 19 (2), 417-430.
    • (2011) IEEE Trans. Audio, Speech Lang. Process. , vol.19 , Issue.2 , pp. 417-430
    • Zen, H.1    Nankaku, Y.2    Tokuda, K.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.