SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 2759-2763

System fusion for high-performance voice conversion

(6) Tian, Xiaohai a,b; Wu, Zhizheng c; Lee, Siu Wa d; Hy, Nguyen Quy a,b; Dong, Minghui d; Chng, Eng Siong a,b

a NANYANG TECHNOLOGICAL UNIVERSITY (Singapore)

b Joint NTU-UBC Research Centre of Excellence in Active Living for the Elderly (Singapore)

c UNIVERSITY OF EDINBURGH (United Kingdom)

d INSTITUTE FOR INFOCOMM RESEARCH (Singapore)

Author keywords

Frequency warping; GMM; High-performance; System fusion; Voice conversion

Indexed keywords

GAUSSIAN DISTRIBUTION; SPEECH COMMUNICATION;

CEPSTRAL COEFFICIENTS; FREQUENCY WARPING; GAUSSIAN MIXTURE MODEL; HIGH-RESOLUTION SPECTRA; HIGH-PERFORMANCE; OBJECTIVE AND SUBJECTIVE EVALUATIONS; SPEAKER CHARACTERISTICS; VOICE CONVERSION;

SPEECH PROCESSING;

EID: 84959163883 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (13)

References (28)

1
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 131-142, 1998.
- (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

2
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), vol. 1, 1998, pp. 285-288.
- (1998) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , vol.1 , pp. 285-288
- Kain, A.¹ Macon, M.W.²

3
- 77953712499
- Voice conversion using partial least squares regression
- E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, "Voice conversion using partial least squares regression," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 912-921, 2010.
- (2010) IEEE Transactions on Audio, Speech, and Language Processing , vol.18 , Issue.5 , pp. 912-921
- Helander, E.¹ Virtanen, T.² Nurminen, J.³ Gabbouj, M.⁴

4
- 70349197691
- Voice conversion using artificial neural networks
- S. Desai, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, "Voice conversion using artificial neural networks," in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009, pp. 3893-3896.
- (2009) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 3893-3896
- Desai, S.¹ Raghavendra, E.V.² Yegnanarayana, B.³ Black, A.W.⁴ Prahallad, K.⁵

5
- 84921735339
- Voice conversion using deep neural networks with layer-wise generative training
- L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," IEEE Transactions on Speech and Audio Processing, vol. 22, no. 12, pp. 1859-1872, 2014.
- (2014) IEEE Transactions on Speech and Audio Processing , vol.22 , Issue.12 , pp. 1859-1872
- Chen, L.-H.¹ Ling, Z.-H.² Liu, L.-J.³ Dai, L.-R.⁴

6
- 84910087395
- Sequence error (SE) minimization training of neural network for voice conversion
- F.-L. Xie, Y. Qian, Y. Fan, F. K. Soong, and H. Li, "Sequence error (SE) minimization training of neural network for voice conversion," in INTERSPEECH, 2014.
- (2014) INTERSPEECH
- Xie, F.-L.¹ Qian, Y.² Fan, Y.³ Soong, F.K.⁴ Li, H.⁵

7
- 84856141218
- Voice conversion using dynamic kernel partial least squares regression
- E. Helander, H. Silén, T. Virtanen, and M. Gabbouj, "Voice conversion using dynamic kernel partial least squares regression," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 3, pp. 806-817, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.3 , pp. 806-817
- Helander, E.¹ Silén, H.² Virtanen, T.³ Gabbouj, M.⁴

8
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

9
- 84865754815
- Voice conversion using GMM with enhanced global variance
- H. Benisty and D. Malah, "Voice conversion using GMM with enhanced global variance," in INTERSPEECH, 2011, pp. 669-672.
- (2011) INTERSPEECH , pp. 669-672
- Benisty, H.¹ Malah, D.²

10
- 84901803470
- Exemplar-based voice conversion using non-negative spectrogram deconvolution
- Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li, "Exemplar-based voice conversion using non-negative spectrogram deconvolution," in 8th ISCA Speech Synthesis Workshop, 2013.
- (2013) 8th ISCA Speech Synthesis Workshop
- Wu, Z.¹ Virtanen, T.² Kinnunen, T.³ Chng, E.S.⁴ Li, H.⁵

11
- 84874248255
- Exemplar-based voice conversion in noisy environment
- R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion in noisy environment," in Spoken Language Technology Workshop (SLT), 2012, pp. 313-317.
- (2012) Spoken Language Technology Workshop (SLT) , pp. 313-317
- Takashima, R.¹ Takiguchi, T.² Ariki, Y.³

12
- 84911369131
- Exemplar-based sparse representation with residual compensation for voice conversion
- Z. Wu, T. Virtanen, E. S. Chng, and H. Li, "Exemplar-based sparse representation with residual compensation for voice conversion," IEEE Transactions on Speech and Audio Processing, vol. 22, no. 10, pp. 1506-1521, 2014.
- (2014) IEEE Transactions on Speech and Audio Processing , vol.22 , Issue.10 , pp. 1506-1521
- Wu, Z.¹ Virtanen, T.² Chng, E.S.³ Li, H.⁴

13
- 84948175540
- VTLN-based voice conversion
- D. Sundermann and H. Ney, "VTLN-based voice conversion," in IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), 2003, pp. 556-559.
- (2003) IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) , pp. 556-559
- Sundermann, D.¹ Ney, H.²

14
- 84946753271
- VTLN-based cross-language voice conversion
- D. Sundermann, H. Ney, and H. Hoge, "VTLN-based cross-language voice conversion," in IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), 2003, pp. 676-681.
- (2003) IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU) , pp. 676-681
- Sundermann, D.¹ Ney, H.² Hoge, H.³

15
- 77953727123
- Voice conversion based on weighted frequency warping
- D. Erro, A. Moreno, and A. Bonafonte, "Voice conversion based on weighted frequency warping," IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 922-931, 2010.
- (2010) IEEE Transactions on Audio, Speech, and Language Processing , vol.18 , Issue.5 , pp. 922-931
- Erro, D.¹ Moreno, A.² Bonafonte, A.³

16
- 84872177757
- Parametric voice conversion based on bilinear frequency warping plus amplitude scaling
- D. Erro, E. Navas, and I. Hernaez, "Parametric voice conversion based on bilinear frequency warping plus amplitude scaling," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 3, pp. 556-566, 2013.
- (2013) IEEE Transactions on Audio, Speech, and Language Processing , vol.21 , Issue.3 , pp. 556-566
- Erro, D.¹ Navas, E.² Hernaez, I.³

17
- 84912079352
- Correlation-based frequency warping for voice conversion
- X. Tian, Z. Wu, S. W. Lee, and E. S. Chng, "Correlation-based frequency warping for voice conversion," in 9th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2014, pp. 211-215.
- (2014) 9th International Symposium on Chinese Spoken Language Processing (ISCSLP) , pp. 211-215
- Tian, X.¹ Wu, Z.² Lee, S.W.³ Chng, E.S.⁴

18
- 84946020861
- Sparse representation for frequency warping based voice conversion
- to appear
- X. Tian, Z. Wu, S. W. Lee, N. Q. Hy, E. S. Chng, and M. Dong, "Sparse representation for frequency warping based voice conversion," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), to appear, 2015.
- (2015) IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Tian, X.¹ Wu, Z.² Lee, S.W.³ Hy, N.Q.⁴ Chng, E.S.⁵ Dong, M.⁶

19
- 84857498745
- Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora
- E. Godoy, O. Rosec, and T. Chonavel, "Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 4, pp. 1313-1323, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.4 , pp. 1313-1323
- Godoy, E.¹ Rosec, O.² Chonavel, T.³

20
- 0030245128
- Robust continuous speech recognition using parallel model combination
- M. J. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Transactions on Speech and Audio Processing, vol. 4, no. 5, pp. 352-359, 1996.
- (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , Issue.5 , pp. 352-359
- Gales, M.J.¹ Young, S.J.²

21
- 51449086024
- Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006
- N. Brummer, L. Burget, J. H. Cernocky, O. Glembek, F. Grezl, M. Karafiat, D. A. Van Leeuwen, P. Matejka, P. Schwarz, and A. Strasheim, "Fusion of heterogeneous speaker recognition systems in the STBU submission for the NIST speaker recognition evaluation 2006," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 7, pp. 2072-2084, 2007.
- (2007) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.7 , pp. 2072-2084
- Brummer, N.¹ Burget, L.² Cernocky, J.H.³ Glembek, O.⁴ Grezl, F.⁵ Karafiat, M.⁶ Van Leeuwen, D.A.⁷ Matejka, P.⁸ Schwarz, P.⁹ Strasheim, A.¹⁰

22
- 85008525798
- Product of experts for statistical parametric speech synthesis
- H. Zen, M. J. Gales, Y. Nankaku, and K. Tokuda, "Product of experts for statistical parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 3, pp. 794-805, 2012.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , Issue.3 , pp. 794-805
- Zen, H.¹ Gales, M.J.² Nankaku, Y.³ Tokuda, K.⁴

23
- 0026880275
- Voice transformation using PSOLA technique
- H. Valbret, E. Moulines, and J.-P. Tubach, "Voice transformation using PSOLA technique," Speech Communication, vol. 11, no. 2, pp. 175-187, 1992.
- (1992) Speech Communication , vol.11 , Issue.2 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.-P.³

24
- 0001481529
- Bark and ERB bilinear transforms
- J. O. Smith and J. S. Abel, "Bark and ERB bilinear transforms," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 6, pp. 697-708, 1999.
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , Issue.6 , pp. 697-708
- Smith, J.O.¹ Abel, J.S.²

25
- 4444285698
- High resolution voice transformation
- A. B. Kain, "High resolution voice transformation," Ph.D. dissertation, Rockford College, 2001.
- (2001) High Resolution Voice Transformation
- Kain, A.B.¹

26
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

27
- 84878390910
- Implementation of computationally efficient real-time voice conversion
- T. Toda, T. Muramatsu, and H. Banno, "Implementation of computationally efficient real-time voice conversion," in INTERSPEECH, 2012.
- (2012) INTERSPEECH
- Toda, T.¹ Muramatsu, T.² Banno, H.³

28
- 4544284652
- High quality voice morphing
- H. Ye and S. Young, "High quality voice morphing," in IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1, 2004, pp. 1-9.
- (2004) IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , vol.1 , pp. 1-9
- Ye, H.¹ Young, S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.