



Volume 08-12-September-2016, Issue , 2016, Pages 1632-1636

The voice conversion challenge 2016

Author keywords

Evaluation challenge; Speech synthesis; Voice conversion

Indexed keywords

SPEECH COMMUNICATION; SPEECH SYNTHESIS;

EID: 84994361374     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: 10.21437/Interspeech.2016-1066     Document Type: Conference Paper
Times cited: 211

References (40)
  • 4
    • 58149203393
    • Data-driven emotion conversion in spoken english
    • Z. Inanoglu and S. Young, "Data-driven emotion conversion in spoken english," Speech Communication, vol. 51, no. 3, pp. 268-283, 2009.
    • (2009) Speech Communication , vol.51 , Issue.3 , pp. 268-283
    • Inanoglu, Z.1    Young, S.2
  • 5
    • 77953699443
    • Evaluation of expressive speech synthesis with voice conversion and copy resynthesis techniques
    • O. Türk and M. Schröder, "Evaluation of expressive speech synthesis with voice conversion and copy resynthesis techniques," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 18, no. 5, pp. 965-973, 2010.
    • (2010) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.18 , Issue.5 , pp. 965-973
    • Türk, O.1    Schröder, M.2
  • 6
    • 79959827418
    • Applying voice conversion to concatenative singing-voice synthesis
    • F. Villavicencio and J. Bonada, "Applying voice conversion to concatenative singing-voice synthesis," in Proc. INTERSPEECH, 2010, pp. 2162-2165.
    • (2010) Proc. INTERSPEECH , pp. 2162-2165
    • Villavicencio, F.1    Bonada, J.2
  • 8
    • 0038383054
    • On artificial bandwidth extension of telephone speech
    • P. Jax and P. Vary, "On artificial bandwidth extension of telephone speech," Signal Processing, vol. 83, no. 8, pp. 1707-1719, 2003.
    • (2003) Signal Processing , vol.83 , Issue.8 , pp. 1707-1719
    • Jax, P.1    Vary, P.2
  • 9
    • 84865698185
    • Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
    • T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 9, pp. 2505-2517, 2012.
    • (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.9 , pp. 2505-2517
    • Toda, T.1    Nakagiri, M.2    Shikano, K.3
  • 10
    • 67650657780
    • Foreign accent conversion in computer assisted pronunciation training
    • D. Felps, H. Bortfeld, and R. Gutierrez-Osuna, "Foreign accent conversion in computer assisted pronunciation training," Speech Communication, vol. 51, no. 10, pp. 920-932, 2009.
    • (2009) Speech Communication , vol.51 , Issue.10 , pp. 920-932
    • Felps, D.1    Bortfeld, H.2    Gutierrez-Osuna, R.3
  • 12
    • 0025892924
    • Statistical analysis of bilingual speaker's speech for cross-language voice conversion
    • M. Abe, K. Shikano, and H. Kuwabara, "Statistical analysis of bilingual speaker's speech for cross-language voice conversion," The Journal of the Acoustical Society of America, vol. 90, no. 1, pp. 76-82, 1991.
    • (1991) The Journal of the Acoustical Society of America , vol.90 , Issue.1 , pp. 76-82
    • Abe, M.1    Shikano, K.2    Kuwabara, H.3
  • 13
    • 0026880275
    • Voice transformation using psola technique
    • H. Valbret, E. Moulines, and J. P. Tubach, "Voice transformation using psola technique," Speech Communication, vol. 11, no. 2-3, pp. 175-187, 1992.
    • (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 175-187
    • Valbret, H.1    Moulines, E.2    Tubach, J.P.3
  • 14
    • 33745216749
    • The Blizzard Challenge-2005: Evaluating corpus-based speech synthesis on common datasets
    • A. W. Black and K. Tokuda, "The Blizzard Challenge-2005: evaluating corpus-based speech synthesis on common datasets," in Proc. INTERSPEECH, 2005, pp. 77-80.
    • (2005) Proc. INTERSPEECH , pp. 77-80
    • Black, A.W.1    Tokuda, K.2
  • 15
    • 0035127703
    • Applying the harmonic plus noise model in concatenative speech synthesis
    • Y. Stylianou, "Applying the harmonic plus noise model in concatenative speech synthesis," Speech and Audio Processing, IEEE Transactions on, vol. 9, no. 1, pp. 21-29, 2001.
    • (2001) Speech and Audio Processing, IEEE Transactions on , vol.9 , Issue.1 , pp. 21-29
    • Stylianou, Y.1
  • 16
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.
    • (1999) Speech Communication , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 18
    • 85131821539
    • Mel-generalized cepstral analysis-a unified approach to speech spectral estimation
    • K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis-a unified approach to speech spectral estimation," in Proc. ICSLP, 1994, pp. 1043-1045.
    • (1994) Proc. ICSLP , pp. 1043-1045
    • Tokuda, K.1    Kobayashi, T.2    Masuko, T.3    Imai, S.4
  • 20
    • 44949143155
    • Maximum likelihood voice conversion based on gmm with straight mixed excitation
    • Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on gmm with straight mixed excitation," in Proc. INTERSPEECH, 2006, pp. 2266-2269.
    • (2006) Proc. INTERSPEECH , pp. 2266-2269
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 21
    • 0034841948
    • Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction
    • A. Kain and M.W. Macon, "Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction," in Proc. ICASSP, 2001, pp. 813-816.
    • (2001) Proc. ICASSP , pp. 813-816
    • Kain, A.1    Macon, M.W.2
  • 22
    • 34047254509
    • Quality-enhanced voice morphing using maximum likelihood transformations
    • H. Ye and S. Young, "Quality-enhanced voice morphing using maximum likelihood transformations," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 14, no. 4, pp. 1301-1312, 2006.
    • (2006) Audio, Speech, and Language Processing, IEEE Transactions on , vol.14 , Issue.4 , pp. 1301-1312
    • Ye, H.1    Young, S.2
  • 23
    • 85009212516
    • Transforming F0 contours
    • B. Gillett and S. King, "Transforming F0 contours," in Proc. INTERSPEECH, 2003, pp. 101-104.
    • (2003) Proc. INTERSPEECH , pp. 101-104
    • Gillett, B.1    King, S.2
  • 25
    • 84867199771
    • Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching
    • K. Yutani, Y. Uto, Y. Nankaku, T. Toda, and K. Tokuda, "Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching," in Proc. INTERSPEECH, 2008, pp. 1072-1075.
    • (2008) Proc. INTERSPEECH , pp. 1072-1075
    • Yutani, K.1    Uto, Y.2    Nankaku, Y.3    Toda, T.4    Tokuda, K.5
  • 27
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A.W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, no. 8, pp. 2222-2235, 2007.
    • (2007) Audio, Speech, and Language Processing, IEEE Transactions on , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 28
    • 84901766069
    • Voice conversion based on speaker-dependent restricted boltzmann machines
    • T. Nakashika, T. Takiguchi, and Y. Ariki, "Voice conversion based on speaker-dependent restricted boltzmann machines," Information and Systems, IEICE Transactions on, vol. E97-D, no. 6, pp. 1403-1410, 2014.
    • (2014) Information and Systems, IEICE Transactions on , vol.E97-D , Issue.6 , pp. 1403-1410
    • Nakashika, T.1    Takiguchi, T.2    Ariki, Y.3
  • 30
    • 84865737668
    • Gaussian process experts for voice conversion
    • N. Pilkington, H. Zen, and M. Gales, "Gaussian process experts for voice conversion," in Proc. INTERSPEECH, 2011, pp. 2761-2764.
    • (2011) Proc. INTERSPEECH , pp. 2761-2764
    • Pilkington, N.1    Zen, H.2    Gales, M.3
  • 31
    • 84890539284
    • Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data
    • N. Xu, Y. Tang, J. Bao, A. Jiang, X. Liu, and Z. Yang, "Voice conversion based on gaussian processes by coherent and asymmetric training with limited training data," Speech Communication, vol. 58, pp. 124-138, 2014.
    • (2014) Speech Communication , vol.58 , pp. 124-138
    • Xu, N.1    Tang, Y.2    Bao, J.3    Jiang, A.4    Liu, X.5    Yang, Z.6
  • 33
    • 84885055553
    • Exemplar-based voice conversion using sparse representation in noisy environments
    • R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion using sparse representation in noisy environments," Information and Systems, IEICE Transactions on, vol. E96-A, no. 10, pp. 1946-1953, 2013.
    • (2013) Information and Systems, IEICE Transactions on , vol.E96-A , Issue.10 , pp. 1946-1953
    • Takashima, R.1    Takiguchi, T.2    Ariki, Y.3
  • 35
    • 84946027999
    • Voice conversion using deep bidirectional long short-term memory based recurrent neural networks
    • L. Sun, S. Kang, K. Li, and H. Meng, "Voice conversion using deep bidirectional long short-term memory based recurrent neural networks," in Proc. ICASSP, 2015, pp. 4869-4873.
    • (2015) Proc. ICASSP , pp. 4869-4873
    • Sun, L.1    Kang, S.2    Li, K.3    Meng, H.4
  • 38
    • 84919935005
    • Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech?-a dataset, insights, and challenges
    • G. J. Mysore, "Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech?-a dataset, insights, and challenges," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1006-1010, Aug 2015.
    • (2015) IEEE Signal Processing Letters , vol.22 , Issue.8 , pp. 1006-1010
    • Mysore, G.J.1
  • 39
    • 84994351528
    • Analysis of the Voice Conversion Challenge 2016 evaluation results
    • M. Wester, Z. Wu, and J. Yamagishi, "Analysis of the Voice Conversion Challenge 2016 evaluation results," in (submitted to) Interspeech, 2016.
    • (2016) (Submitted To) Interspeech
    • Wester, M.1    Wu, Z.2    Yamagishi, J.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.