Interspeech 2016 (8-12 September 2016), Pages 1642-1646

The USTC system for voice conversion challenge 2016: Neural network based approaches for spectrum, aperiodicity and F0 conversion

Author keywords

DNN; Frequency warping; LSTM; RNN; Voice conversion

Indexed keywords

Recurrent neural networks; Speech communication

EID: 84994337398     Print ISSN: 2308-457X     Electronic ISSN: 1990-9772     Source type: Conference proceedings
DOI: 10.21437/Interspeech.2016-456     Document type: Conference paper
Times cited: 7

References (20)
  • 1
    • T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 9, pp. 2505-2517, 2012.
  • 2
    • D. Felps, H. Bortfeld, and R. Gutierrez-Osuna, "Foreign accent conversion in computer assisted pronunciation training," Speech Communication, vol. 51, no. 10, pp. 920-932, 2009.
  • 5
    • A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, 1998, pp. 285-288.
  • 6
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
  • 14
    • M. Pitz and H. Ney, "Vocal tract normalization equals linear transformation in cepstral space," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 930-944, Sep. 2005.
  • 15
    • D. Yu and M. L. Seltzer, "Improved bottleneck features using pretrained deep neural networks," in Proc. Interspeech, 2011, pp. 237-240.
  • 16
    • C. Cieri, D. Miller, and K. Walker, "The Fisher corpus: A resource for the next generations of speech-to-text," in Proc. LREC, vol. 4, 2004, pp. 69-71.
  • 17
    • H. Silén, J. Nurminen, E. Helander, and M. Gabbouj, "Voice conversion for non-parallel datasets using dynamic kernel partial least squares regression," in Proc. Interspeech, 2013.
  • 19
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-208, 1999.


* This information was analyzed and extracted by KISTI from Elsevier's Scopus database.