SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 1632-1636

The voice conversion challenge 2016

(7) Toda, Tomoki (a); Chen, Ling Hui (b); Saito, Daisuke (c); Villavicencio, Fernando (d); Wester, Mirjam (e); Wu, Zhizheng (e); Yamagishi, Junichi (d, e)

a NAGOYA UNIVERSITY (Japan)

b UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

c UNIVERSITY OF TOKYO (Japan)

d NATIONAL INSTITUTE OF INFORMATICS (Japan)

e UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Evaluation challenge; Speech synthesis; Voice conversion

Indexed keywords

SPEECH COMMUNICATION; SPEECH SYNTHESIS;

CONTROLLED ENVIRONMENT; EVALUATION CHALLENGE; SPEAKER CONVERSION; TARGET SPEAKER; UNSOLVED PROBLEMS; VOICE CONVERSION; VOICE IDENTITIES;

SPEECH PROCESSING;

EID: 84994361374 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-1066 Document Type: Conference Paper

Times cited: 211

References (40)

1
- 85004448479
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, and K. Shikano, "Voice conversion through vector quantization," The Journal of the Acoustical Society of Japan (E), vol. 11, no. 2, pp. 71-76, 1990.
- (1990) The Journal of the Acoustical Society of Japan (E) , vol.11 , Issue.2 , pp. 71-76
- Abe, M.¹ Nakamura, S.² Shikano, K.³

2
- 34447635527
- Improving the intelligibility of dysarthric speech
- A. B. Kain, J. P. Hosom, X. Niu, J. P. H. van Santen, M. Fried-Oken, and J. Staehely, "Improving the intelligibility of dysarthric speech," Speech Communication, vol. 49, no. 9, pp. 743-759, 2007.
- (2007) Speech Communication , vol.49 , Issue.9 , pp. 743-759
- Kain, A.B.¹ Hosom, J.P.² Niu, X.³ Van Santen, J.P.H.⁴ Fried-Oken, M.⁵ Staehely, J.⁶

3
- 84897939966
- Alaryngeal speech enhancement based on one-to-many eigenvoice conversion
- H. Doi, T. Toda, H. Saruwatari, and K. Shikano, "Alaryngeal speech enhancement based on one-to-many eigenvoice conversion," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 1, pp. 172-183, 2014.
- (2014) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.22 , Issue.1 , pp. 172-183
- Doi, H.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

4
- 58149203393
- Data-driven emotion conversion in spoken English
- Z. Inanoglu and S. Young, "Data-driven emotion conversion in spoken English," Speech Communication, vol. 51, no. 3, pp. 268-283, 2009.
- (2009) Speech Communication , vol.51 , Issue.3 , pp. 268-283
- Inanoglu, Z.¹ Young, S.²

5
- 77953699443
- Evaluation of expressive speech synthesis with voice conversion and copy resynthesis techniques
- O. Türk and M. Schröder, "Evaluation of expressive speech synthesis with voice conversion and copy resynthesis techniques," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 18, no. 5, pp. 965-973, 2010.
- (2010) Audio, Speech, and Language Processing, IEEE Transactions on , vol.18 , Issue.5 , pp. 965-973
- Türk, O.¹ Schröder, M.²

6
- 79959827418
- Applying voice conversion to concatenative singing-voice synthesis
- F. Villavicencio and J. Bonada, "Applying voice conversion to concatenative singing-voice synthesis," in Proc. INTERSPEECH, 2010, pp. 2162-2165.
- (2010) Proc. INTERSPEECH , pp. 2162-2165
- Villavicencio, F.¹ Bonada, J.²

7
- 84901767453
- Voice timbre control based on perceived age in singing voice conversion
- K. Kobayashi, T. Toda, H. Doi, T. Nakano, M. Goto, G. Neubig, S. Sakti, and S. Nakamura, "Voice timbre control based on perceived age in singing voice conversion," Information and Systems, IEICE Transactions on, vol. E97-D, no. 6, pp. 1419-1428, 2014.
- (2014) Information and Systems, IEICE Transactions on , vol.E97-D , Issue.6 , pp. 1419-1428
- Kobayashi, K.¹ Toda, T.² Doi, H.³ Nakano, T.⁴ Goto, M.⁵ Neubig, G.⁶ Sakti, S.⁷ Nakamura, S.⁸

8
- 0038383054
- On artificial bandwidth extension of telephone speech
- P. Jax and P. Vary, "On artificial bandwidth extension of telephone speech," Signal Processing, vol. 83, no. 8, pp. 1707-1719, 2003.
- (2003) Signal Processing , vol.83 , Issue.8 , pp. 1707-1719
- Jax, P.¹ Vary, P.²

9
- 84865698185
- Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
- T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 9, pp. 2505-2517, 2012.
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.9 , pp. 2505-2517
- Toda, T.¹ Nakagiri, M.² Shikano, K.³

10
- 67650657780
- Foreign accent conversion in computer assisted pronunciation training
- D. Felps, H. Bortfeld, and R. Gutierrez-Osuna, "Foreign accent conversion in computer assisted pronunciation training," Speech Communication, vol. 51, no. 10, pp. 920-932, 2009.
- (2009) Speech Communication , vol.51 , Issue.10 , pp. 920-932
- Felps, D.¹ Bortfeld, H.² Gutierrez-Osuna, R.³

11
- 0023739214
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," in Proc. ICASSP, 1988, pp. 655-658.
- (1988) Proc. ICASSP , pp. 655-658
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

12
- 0025892924
- Statistical analysis of bilingual speaker's speech for cross-language voice conversion
- M. Abe, K. Shikano, and H. Kuwabara, "Statistical analysis of bilingual speaker's speech for cross-language voice conversion," The Journal of the Acoustical Society of America, vol. 90, no. 1, pp. 76-82, 1991.
- (1991) The Journal of the Acoustical Society of America , vol.90 , Issue.1 , pp. 76-82
- Abe, M.¹ Shikano, K.² Kuwabara, H.³

13
- 0026880275
- Voice transformation using PSOLA technique
- H. Valbret, E. Moulines, and J. P. Tubach, "Voice transformation using PSOLA technique," Speech Communication, vol. 11, no. 2-3, pp. 175-187, 1992.
- (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.P.³

14
- 33745216749
- The Blizzard Challenge-2005: Evaluating corpus-based speech synthesis on common datasets
- A. W. Black and K. Tokuda, "The Blizzard Challenge-2005: evaluating corpus-based speech synthesis on common datasets," in Proc. INTERSPEECH, 2005, pp. 77-80.
- (2005) Proc. INTERSPEECH , pp. 77-80
- Black, A.W.¹ Tokuda, K.²

15
- 0035127703
- Applying the harmonic plus noise model in concatenative speech synthesis
- Y. Stylianou, "Applying the harmonic plus noise model in concatenative speech synthesis," Speech and Audio Processing, IEEE Transactions on, vol. 9, no. 1, pp. 21-29, 2001.
- (2001) Speech and Audio Processing, IEEE Transactions on , vol.9 , Issue.1 , pp. 21-29
- Stylianou, Y.¹

16
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

17
- 84885499464
- Optimal quantization of LSP parameters
- F. Soong and B. Juang, "Optimal quantization of LSP parameters," Speech and Audio Processing, IEEE Transactions on, vol. 1, no. 1, pp. 15-24, 1993.
- (1993) Speech and Audio Processing, IEEE Transactions on , vol.1 , Issue.1 , pp. 15-24
- Soong, F.¹ Juang, B.²

18
- 85131821539
- Mel-generalized cepstral analysis - a unified approach to speech spectral estimation
- K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis - a unified approach to speech spectral estimation," in Proc. ICSLP, 1994, pp. 1043-1045.
- (1994) Proc. ICSLP , pp. 1043-1045
- Tokuda, K.¹ Kobayashi, T.² Masuko, T.³ Imai, S.⁴

19
- 84921735339
- Voice conversion using deep neural networks with layer-wise generative training
- L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 12, pp. 1859-1872, 2014.
- (2014) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.22 , Issue.12 , pp. 1859-1872
- Chen, L.-H.¹ Ling, Z.-H.² Liu, L.-J.³ Dai, L.-R.⁴

20
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation," in Proc. INTERSPEECH, 2006, pp. 2266-2269.
- (2006) Proc. INTERSPEECH , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

21
- 0034841948
- Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction
- A. Kain and M.W. Macon, "Design and evaluation of a voice conversion algorithm based on spectral envelope mapping and residual prediction," in Proc. ICASSP, 2001, pp. 813-816.
- (2001) Proc. ICASSP , pp. 813-816
- Kain, A.¹ Macon, M.W.²

22
- 34047254509
- Quality-enhanced voice morphing using maximum likelihood transformations
- H. Ye and S. Young, "Quality-enhanced voice morphing using maximum likelihood transformations," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 14, no. 4, pp. 1301-1312, 2006.
- (2006) Audio, Speech, and Language Processing, IEEE Transactions on , vol.14 , Issue.4 , pp. 1301-1312
- Ye, H.¹ Young, S.²

23
- 85009212516
- Transforming F0 contours
- B. Gillett and S. King, "Transforming F0 contours," in Proc. INTERSPEECH, 2003, pp. 101-104.
- (2003) Proc. INTERSPEECH , pp. 101-104
- Gillett, B.¹ King, S.²

24
- 84928405078
- Generative modeling of voice fundamental frequency contours
- H. Kameoka, K. Yoshizato, T. Ishihara, K. Kadowaki, Y. Ohishi, and K. Kashino, "Generative modeling of voice fundamental frequency contours," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 23, no. 6, pp. 1042-1053, 2015.
- (2015) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.23 , Issue.6 , pp. 1042-1053
- Kameoka, H.¹ Yoshizato, K.² Ishihara, T.³ Kadowaki, K.⁴ Ohishi, Y.⁵ Kashino, K.⁶

25
- 84867199771
- Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching
- K. Yutani, Y. Uto, Y. Nankaku, T. Toda, and K. Tokuda, "Simultaneous conversion of duration and spectrum based on statistical models including time-sequence matching," in Proc. INTERSPEECH, 2008, pp. 1072-1075.
- (2008) Proc. INTERSPEECH , pp. 1072-1075
- Yutani, K.¹ Uto, Y.² Nankaku, Y.³ Toda, T.⁴ Tokuda, K.⁵

26
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," Speech and Audio Processing, IEEE Transactions on, vol. 6, no. 2, pp. 131-142, 1998.
- (1998) Speech and Audio Processing, IEEE Transactions on , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

27
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A.W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) Audio, Speech, and Language Processing, IEEE Transactions on , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

28
- 84901766069
- Voice conversion based on speaker-dependent restricted Boltzmann machines
- T. Nakashika, T. Takiguchi, and Y. Ariki, "Voice conversion based on speaker-dependent restricted Boltzmann machines," Information and Systems, IEICE Transactions on, vol. E97-D, no. 6, pp. 1403-1410, 2014.
- (2014) Information and Systems, IEICE Transactions on , vol.E97-D , Issue.6 , pp. 1403-1410
- Nakashika, T.¹ Takiguchi, T.² Ariki, Y.³

29
- 84856141218
- Voice conversion using dynamic kernel partial least squares regression
- E. Helander, H. Silén, T. Virtanen, and M. Gabbouj, "Voice conversion using dynamic kernel partial least squares regression," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 20, no. 3, pp. 806-817, 2012.
- (2012) Audio, Speech, and Language Processing, IEEE Transactions on , vol.20 , Issue.3 , pp. 806-817
- Helander, E.¹ Silén, H.² Virtanen, T.³ Gabbouj, M.⁴

30
- 84865737668
- Gaussian process experts for voice conversion
- N. Pilkington, H. Zen, and M. Gales, "Gaussian process experts for voice conversion," in Proc. INTERSPEECH, 2011, pp. 2761-2764.
- (2011) Proc. INTERSPEECH , pp. 2761-2764
- Pilkington, N.¹ Zen, H.² Gales, M.³

31
- 84890539284
- Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data
- N. Xu, Y. Tang, J. Bao, A. Jiang, X. Liu, and Z. Yang, "Voice conversion based on Gaussian processes by coherent and asymmetric training with limited training data," Speech Communication, vol. 58, pp. 124-138, 2014.
- (2014) Speech Communication , vol.58 , pp. 124-138
- Xu, N.¹ Tang, Y.² Bao, J.³ Jiang, A.⁴ Liu, X.⁵ Yang, Z.⁶

32
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- S. Desai, A. Black, B. Yegnanarayana, and K. Prahallad, "Spectral mapping using artificial neural networks for voice conversion," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 18, no. 5, pp. 954-964, 2010.
- (2010) Audio, Speech, and Language Processing, IEEE Transactions on , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.² Yegnanarayana, B.³ Prahallad, K.⁴

33
- 84885055553
- Exemplar-based voice conversion using sparse representation in noisy environments
- R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion using sparse representation in noisy environments," Fundamentals of Electronics, Communications and Computer Sciences, IEICE Transactions on, vol. E96-A, no. 10, pp. 1946-1953, 2013.
- (2013) Fundamentals of Electronics, Communications and Computer Sciences, IEICE Transactions on , vol.E96-A , Issue.10 , pp. 1946-1953
- Takashima, R.¹ Takiguchi, T.² Ariki, Y.³

34
- 84911369131
- Exemplar-based sparse representation with residual compensation for voice conversion
- Z. Wu, T. Virtanen, E. Chng, and H. Li, "Exemplar-based sparse representation with residual compensation for voice conversion," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 10, pp. 1506-1521, 2014.
- (2014) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.22 , Issue.10 , pp. 1506-1521
- Wu, Z.¹ Virtanen, T.² Chng, E.³ Li, H.⁴

35
- 84946027999
- Voice conversion using deep bidirectional long short-term memory based recurrent neural networks
- L. Sun, S. Kang, K. Li, and H. Meng, "Voice conversion using deep bidirectional long short-term memory based recurrent neural networks," in Proc. ICASSP, 2015, pp. 4869-4873.
- (2015) Proc. ICASSP , pp. 4869-4873
- Sun, L.¹ Kang, S.² Li, K.³ Meng, H.⁴

36
- 84962834006
- Post-filters to modify the modulation spectrum for statistical parametric speech synthesis
- S. Takamichi, T. Toda, A. Black, G. Neubig, S. Sakti, and S. Nakamura, "Post-filters to modify the modulation spectrum for statistical parametric speech synthesis," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 24, no. 4, pp. 757-767, 2016.
- (2016) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.24 , Issue.4 , pp. 757-767
- Takamichi, S.¹ Toda, T.² Black, A.³ Neubig, G.⁴ Sakti, S.⁵ Nakamura, S.⁶

37
- 77953727123
- Voice conversion based on weighted frequency warping
- D. Erro, A. Moreno, and A. Bonafonte, "Voice conversion based on weighted frequency warping," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 18, no. 5, pp. 922-931, 2010.
- (2010) Audio, Speech, and Language Processing, IEEE Transactions on , vol.18 , Issue.5 , pp. 922-931
- Erro, D.¹ Moreno, A.² Bonafonte, A.³

38
- 84919935005
- Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech? A dataset, insights, and challenges
- G. J. Mysore, "Can we automatically transform speech recorded on common consumer devices in real-world environments into professional production quality speech? A dataset, insights, and challenges," IEEE Signal Processing Letters, vol. 22, no. 8, pp. 1006-1010, Aug. 2015.
- (2015) IEEE Signal Processing Letters , vol.22 , Issue.8 , pp. 1006-1010
- Mysore, G.J.¹

39
- 84994351528
- Analysis of the Voice Conversion Challenge 2016 evaluation results
- M. Wester, Z. Wu, and J. Yamagishi, "Analysis of the Voice Conversion Challenge 2016 evaluation results," submitted to Proc. INTERSPEECH, 2016.
- (2016) Submitted to Proc. INTERSPEECH
- Wester, M.¹ Wu, Z.² Yamagishi, J.³

40
- 84962901047
- Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance
- Z. Wu, P. L. De Leon, C. Demiroglu, A. Khodabakhsh, S. King, Z.-H. Ling, D. Saito, B. Stewart, T. Toda, M. Wester, and J. Yamagishi, "Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, and human performance," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 24, pp. 768-783, 2016.
- (2016) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.24 , pp. 768-783
- Wu, Z.¹ De Leon, P.L.² Demiroglu, C.³ Khodabakhsh, A.⁴ King, S.⁵ Ling, Z.-H.⁶ Saito, D.⁷ Stewart, B.⁸ Toda, T.⁹ Wester, M.¹⁰ Yamagishi, J.¹¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.