SCOPUS Information Search Platform

Volume 67, 2015, Pages 113-128

Voice conversion based on feature combination with limited training data

Authors (6): Ghorbandoost, Mostafa (a); Sayadiyan, Abolghasem (a); Ahangar, Mohsen (a); Sheikhzadeh, Hamid (a); Shahrebabaki, Abdoreza Sabzi (a); Amini, Jamal (a)

(a) Amirkabir University of Technology (Iran)

Author keywords

Dynamic kernel partial least square regression (DKPLS); Feature combination; Gaussian mixture models (GMM); Voice conversion

Indexed keywords

LEAST SQUARES APPROXIMATIONS; POLES; QUALITY CONTROL; SPEECH RECOGNITION;

ACOUSTICAL CHARACTERISTICS; ANALYSIS/SYNTHESIS; FEATURE COMBINATION; GAUSSIAN MIXTURE MODEL; KERNEL PARTIAL LEAST SQUARE REGRESSIONS; LIMITED TRAINING DATA; VOICE CONVERSION; VOICE CONVERSION ALGORITHM;

SPEECH PROCESSING;

EID: 84919915933 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2014.12.004 Document Type: Article

Times cited: 20

References (39)

1
- 84899126601
- Voice conversion based on state space model and considering global variance
- Ahangar, M., Ghorbandoost, M., Sheikhzadeh, H., Raahemifar, K., Shahrebabaki, A.S., Amini, J., 2013. Voice conversion based on state space model and considering global variance. In: IEEE Intl. Symp. Signal Processing and Information Technology (ISSPIT), pp. 000416-000421.
- (2013) IEEE Intl. Symp. Signal Processing and Information Technology (ISSPIT) , pp. 000416-000421
- Ahangar, M.¹ Ghorbandoost, M.² Sheikhzadeh, H.³ Raahemifar, K.⁴ Shahrebabaki, A.S.⁵ Amini, J.⁶

2
- 84899122951
- Speech analysis/synthesis by Gaussian mixture approximation of the speech spectrum for voice conversion
- Amini, J., Shahrebabaki, A.S., Shokouhi, N., Sheikhzadeh, H., Raahemifar, K., Eslami, M., 2013. Speech analysis/synthesis by Gaussian mixture approximation of the speech spectrum for voice conversion. In: IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), pp. 000428-000433.
- (2013) IEEE International Symposium on Signal Processing and Information Technology (ISSPIT) , pp. 000428-000433
- Amini, J.¹ Shahrebabaki, A.S.² Shokouhi, N.³ Sheikhzadeh, H.⁴ Raahemifar, K.⁵ Eslami, M.⁶

3
- 0033154052
- Speaker transformation algorithm using segmental codebooks (STASC)
- Arslan, L.M., 1999. Speaker transformation algorithm using segmental codebooks (STASC). Speech Commun. 28 (3), 211-226.
- (1999) Speech Commun. , vol.28 , Issue.3 , pp. 211-226
- Arslan, L.M.¹

4
- 0027530250
- SIMPLS: An alternative approach to partial least squares regression
- de Jong, S., 1993. SIMPLS: an alternative approach to partial least squares regression. Chemometr. Intell. Lab. Syst. 18 (3), 251-263.
- (1993) Chemometr. Intell. Lab. Syst. , vol.18 , Issue.3 , pp. 251-263
- De Jong, S.¹

5
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- Desai, S., Black, A.W., Yegnanarayana, B., Prahallad, K., 2010. Spectral mapping using artificial neural networks for voice conversion. IEEE Trans. Audio, Speech Lang. Process. 18 (5), 954-964.
- (2010) IEEE Trans. Audio, Speech Lang. Process. , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.W.² Yegnanarayana, B.³ Prahallad, K.⁴

6
- 77953697940
- Barcelona, Spain: PhD Thesis, Universitat Politècnica de Catalunya, 2008
- Erro, D., 2008. Intra-lingual and Cross-lingual Voice Conversion using Harmonic Plus Stochastic Models. Barcelona, Spain: PhD Thesis, Universitat Politècnica de Catalunya, 2008.
- (2008) Intra-lingual and Cross-lingual Voice Conversion Using Harmonic Plus Stochastic Models
- Erro, D.¹

7
- 77953727123
- Voice conversion based on weighted frequency warping
- Erro, D., Moreno, A., Bonafonte, A., 2010. Voice conversion based on weighted frequency warping. IEEE Trans. Audio, Speech Lang. Process. 18 (5), 922-931.
- (2010) IEEE Trans. Audio, Speech Lang. Process. , vol.18 , Issue.5 , pp. 922-931
- Erro, D.¹ Moreno, A.² Bonafonte, A.³

8
- 84872177757
- Parametric voice conversion based on bilinear frequency warping plus amplitude scaling
- Erro, D., Navas, E., Hernaez, I., 2013. Parametric voice conversion based on bilinear frequency warping plus amplitude scaling. IEEE Trans. Audio, Speech Lang. Process. 21 (3), 556-566.
- (2013) IEEE Trans. Audio, Speech Lang. Process. , vol.21 , Issue.3 , pp. 556-566
- Erro, D.¹ Navas, E.² Hernaez, I.³

9
- 85161148381
- The elements of statistical learning: Data mining, inference and prediction
- Hastie, T., Tibshirani, R., Friedman, J., Franklin, J., 2005. The elements of statistical learning: data mining, inference and prediction. Math. Intell. 27 (2), 83-85.
- (2005) Math. Intell. , vol.27 , Issue.2 , pp. 83-85
- Hastie, T.¹ Tibshirani, R.² Friedman, J.³ Franklin, J.⁴

10
- 51449107658
- LSF mapping for voice conversion with very small training sets
- Helander, E., Nurminen, J., Gabbouj, M., 2008. LSF mapping for voice conversion with very small training sets. In: IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008, pp. 4669-4672.
- (2008) IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2008 , pp. 4669-4672
- Helander, E.¹ Nurminen, J.² Gabbouj, M.³

11
- 77953712499
- Voice conversion using partial least squares regression
- Helander, E., Virtanen, T., Nurminen, J., Gabbouj, M., 2010. Voice conversion using partial least squares regression. IEEE Trans. Audio, Speech Lang. Process. 18 (5), 912-921.
- (2010) IEEE Trans. Audio, Speech Lang. Process. , vol.18 , Issue.5 , pp. 912-921
- Helander, E.¹ Virtanen, T.² Nurminen, J.³ Gabbouj, M.⁴

12
- 84856141218
- Voice conversion using dynamic kernel partial least squares regression
- Helander, E., Silén, H., Virtanen, T., Gabbouj, M., 2012. Voice conversion using dynamic kernel partial least squares regression. IEEE Trans. Audio, Speech Lang. Process. 20 (3), 806-817.
- (2012) IEEE Trans. Audio, Speech Lang. Process. , vol.20 , Issue.3 , pp. 806-817
- Helander, E.¹ Silén, H.² Virtanen, T.³ Gabbouj, M.⁴

13
- 0001810975
- Line spectrum representation of linear predictor coefficients of speech signals
- Itakura, F., 1975. Line spectrum representation of linear predictor coefficients of speech signals. J. Acoust. Soc. Am. 57 (S1), S35-S35.
- (1975) J. Acoust. Soc. Am. , vol.57 , Issue.S1 , pp. S35-S35
- Itakura, F.¹

14
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- Kain, A., Macon, M., 1998. Spectral voice conversion for text-to-speech synthesis. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP1998), pp. 285-288.
- (1998) Proc. Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP1998) , pp. 285-288
- Kain, A.¹ Macon, M.²

15
- 0032673049
- Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- Kawahara, H., Masuda-Katsuse, I., de Cheveigné, A., 1999. Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Commun. 27 (3), 187-207.
- (1999) Speech Commun. , vol.27 , Issue.3 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

16
- 85090475413
- The CMU Arctic speech databases
- Kominek, J., Black, A.W., 2004. The CMU Arctic speech databases. In: Fifth ISCA Workshop on Speech Synthesis.
- (2004) Fifth ISCA Workshop on Speech Synthesis
- Kominek, J.¹ Black, A.W.²

17
- 44949210554
- Map-based adaptation for speech conversion using adaptation data selection and non-parallel training
- Lee, C.H., Wu, C.H., 2006. Map-based adaptation for speech conversion using adaptation data selection and non-parallel training. In: Proc. Interspeech, 2006.
- (2006) Proc. Interspeech, 2006
- Lee, C.H.¹ Wu, C.H.²

18
- 0003440227
- Prentice-Hall, Englewood Cliffs, NJ
- Lim, J., Oppenheim, A.V., 1988. Advanced Topics in Signal Processing. Prentice-Hall, Englewood Cliffs, NJ.
- (1988) Advanced Topics in Signal Processing
- Lim, J.¹ Oppenheim, A.V.²

19
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Ohtani, Y., Toda, T., Saruwatari, H., Shikano, K., 2006. Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation. In: Proc. Interspeech 2006, pp. 2266-2269.
- (2006) Proc. Interspeech 2006 , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

20
- 84888273467
- Reduced search space frame alignment based on Kullback-Leibler divergence for voice conversion
- Springer, Berlin Heidelberg
- Shahrebabaki, A.S., Amini, J., Sheikhzadeh, H., Ghorbandoost, M., Faraji, N., 2013. Reduced search space frame alignment based on Kullback-Leibler divergence for voice conversion. Advances in Nonlinear Speech Processing. Springer, Berlin Heidelberg, pp. 83-88.
- (2013) Advances in Nonlinear Speech Processing , pp. 83-88
- Shahrebabaki, A.S.¹ Amini, J.² Sheikhzadeh, H.³ Ghorbandoost, M.⁴ Faraji, N.⁵

21
- 51449112440
- Voice conversion by combining frequency warping with unit selection
- Shuang, Z., Meng, F., Qin, Y., 2008. Voice conversion by combining frequency warping with unit selection. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP2008), pp. 4661-4664.
- (2008) Proc. Internat. Conf. On Acoustics, Speech and Signal Processing (ICASSP2008) , pp. 4661-4664
- Shuang, Z.¹ Meng, F.² Qin, Y.³

22
- 84919942332
- Text-independent cross-language voice conversion
- Sündermann, D., Höge, H., Bonafonte, A., Ney, H., Hirschberg, J., 2006. Text-independent cross-language voice conversion. In: Proc. Interspeech, 2006.
- (2006) Proc. Interspeech, 2006
- Sündermann, D.¹ Höge, H.² Bonafonte, A.³ Ney, H.⁴ Hirschberg, J.⁵

23
- 0003447548
- PhD Diss., École Nationale Supérieure des Télécommunications
- Stylianou, I., 1996. Harmonic Plus Noise Models for Speech, Combined with Statistical Methods, for Speech and Speaker Modification. PhD Diss., École Nationale Supérieure des Télécommunications.
- (1996) Harmonic Plus Noise Models for Speech, Combined with Statistical Methods, for Speech and Speaker Modification
- Stylianou, I.¹

24
- 0032026483
- Continuous probabilistic transform for voice conversion
- Stylianou, Y., Cappé, O., Moulines, E., 1998. Continuous probabilistic transform for voice conversion. IEEE Trans. Audio, Speech Lang. Process. 6 (2), 131-142.
- (1998) IEEE Trans. Audio, Speech Lang. Process. , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

25
- 77953724495
- Supervisory data alignment for text-independent voice conversion
- Tao, J., Zhang, M., Nurminen, J., Tian, J., Wang, X., 2010. Supervisory data alignment for text-independent voice conversion. IEEE Trans. Audio, Speech Lang. Process. 18 (5), 932-943.
- (2010) IEEE Trans. Audio, Speech Lang. Process. , vol.18 , Issue.5 , pp. 932-943
- Tao, J.¹ Zhang, M.² Nurminen, J.³ Tian, J.⁴ Wang, X.⁵

26
- 33646779506
- Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter
- Toda, T., Black, A.W., Tokuda, K., 2005. Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter. In: IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP2005), vol. 1, pp. 9-12.
- (2005) IEEE Internat. Conf. On Acoustics, Speech, and Signal Processing (ICASSP2005) , vol.1 , pp. 9-12
- Toda, T.¹ Black, A.W.² Tokuda, K.³

27
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- Toda, T., Black, A.W., Tokuda, K., 2007. Voice conversion based on maximum likelihood estimation of spectral parameter trajectory. IEEE Trans. Audio, Speech Lang. Process. 15 (8), 2222-2235.
- (2007) IEEE Trans. Audio, Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

28
- 84865698185
- Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
- Toda, T., Nakagiri, M., Shikano, K., 2012. Statistical voice conversion techniques for body-conducted unvoiced speech enhancement. IEEE Trans. Audio, Speech Lang. Process. 20 (9), 2505-2517.
- (2012) IEEE Trans. Audio, Speech Lang. Process. , vol.20 , Issue.9 , pp. 2505-2517
- Toda, T.¹ Nakagiri, M.² Shikano, K.³

29
- 85131821539
- Mel-generalized cepstral analysis - A unified approach to speech spectral estimation
- Tokuda, K., Kobayashi, T., Masuko, T., Imai, S., 1994. Mel-generalized cepstral analysis-a unified approach to speech spectral estimation. In: ICSLP, vol. 3, pp. 1043-1046.
- (1994) ICSLP , vol.3 , pp. 1043-1046
- Tokuda, K.¹ Kobayashi, T.² Masuko, T.³ Imai, S.⁴

30
- 84905242025
- Recursive calculation of melcepstrum from LP coefficients
- Tokuda, K., Kobayashi, T., Imai, S., 1994. Recursive calculation of melcepstrum from LP coefficients. Trans. IEICE 71, 128-131.
- (1994) Trans. IEICE , vol.71 , pp. 128-131
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

31
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T., 2000. Speech parameter generation algorithms for HMM-based speech synthesis. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing (ICASSP2000), vol. 3, pp. 1315-1318.
- (2000) Proc. Internat. Conf. On Acoustics, Speech and Signal Processing (ICASSP2000) , vol.3 , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

32
- 0027541354
- B-spline signal processing
- Unser, M., Aldroubi, A., Eden, M., 1993. B-spline signal processing. IEEE Trans. Signal Process. 41 (2), 821-848.
- (1993) IEEE Trans. Signal Process. , vol.41 , Issue.2 , pp. 821-848
- Unser, M.¹ Aldroubi, A.² Eden, M.³

33
- 34047247202
- Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
- Wu, C.H., Hsia, C.C., Liu, T.H., Wang, J.F., 2006. Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis. IEEE Trans. Audio, Speech Lang. Process. 14 (4), 1109-1116.
- (2006) IEEE Trans. Audio, Speech Lang. Process. , vol.14 , Issue.4 , pp. 1109-1116
- Wu, C.H.¹ Hsia, C.C.² Liu, T.H.³ Wang, J.F.⁴

34
- 84889579519
- Conditional restricted boltzmann machine for voice conversion
- Wu, Z., Chng, E.S., Li, H., 2013. Conditional restricted boltzmann machine for voice conversion. In: Signal and Information Processing (ChinaSIP), pp. 104-108.
- (2013) Signal and Information Processing (ChinaSIP) , pp. 104-108
- Wu, Z.¹ Chng, E.S.² Li, H.³

35
- 84867589419
- Statistical approach to voice quality control in esophageal speech enhancement
- Yamamoto, K., Toda, T., Saruwatari, H., Shikano, K., 2012. Statistical approach to voice quality control in esophageal speech enhancement. In: IEEE Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP2012), pp. 4497-4500.
- (2012) IEEE Internat. Conf. On Acoustics, Speech, and Signal Processing (ICASSP2012) , pp. 4497-4500
- Yamamoto, K.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

36
- 84919942331
- Voice conversion for unknown speakers
- Ye, H., Young, S., 2004. Voice conversion for unknown speakers. In: Proc. Interspeech, 2004.
- (2004) Proc. Interspeech, 2004
- Ye, H.¹ Young, S.²

37
- 34047254509
- Quality-enhanced voice morphing using maximum likelihood transformations
- Ye, H., Young, S., 2006. Quality-enhanced voice morphing using maximum likelihood transformations. IEEE Trans. Audio Speech Lang. Process. 14 (4), 1301-1312.
- (2006) IEEE Trans. Audio Speech Lang. Process. , vol.14 , Issue.4 , pp. 1301-1312
- Ye, H.¹ Young, S.²

38
- 68249104241
- The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge
- Zen, H., Toda, T., Tokuda, K., 2008. The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge. IEICE Trans. Inform. Syst. 91 (6), 1764-1773.
- (2008) IEICE Trans. Inform. Syst. , vol.91 , Issue.6 , pp. 1764-1773
- Zen, H.¹ Toda, T.² Tokuda, K.³

39
- 78149260085
- Continuous stochastic feature mapping based on trajectory HMMs
- Zen, H., Nankaku, Y., Tokuda, K., 2011. Continuous stochastic feature mapping based on trajectory HMMs. IEEE Trans. Audio, Speech Lang. Process. 19 (2), 417-430.
- (2011) IEEE Trans. Audio, Speech Lang. Process. , vol.19 , Issue.2 , pp. 417-430
- Zen, H.¹ Nankaku, Y.² Tokuda, K.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.