Volume 35, Issue 4, 2016, Pages 1283-1311

A Multi-level GMM-Based Cross-Lingual Voice Conversion Using Language-Specific Mixture Weights for Polyglot Synthesis

Author keywords

ABX listening test; Cross lingual voice conversion; GMM; Multilingual; Oversmoothing; Polyglot

Indexed keywords

COMPUTATIONAL LINGUISTICS; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MIXTURES; SPEECH; SPEECH INTELLIGIBILITY; SPEECH RECOGNITION

EID: 84959297010     PISSN: 0278081X     EISSN: 15315878     Source Type: Journal    
DOI: 10.1007/s00034-015-0118-1     Document Type: Article
Times cited: 17

References (33)
  • 1
    • 0023739214
    • M. Abe, S. Nakamura, K. Shikano, H. Kuwabara, Voice conversion through vector quantization, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1 (1988), pp. 655–658
  • 2
    • 0025590356
    • M. Abe, K. Shikano, H. Kuwabara, Cross-language voice conversion, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1 (1990), pp. 345–348
  • 5
    • 84857498745
    • E. Godoy, O. Rosec, T. Chonavel, Voice conversion using dynamic frequency warping with amplitude scaling, for parallel or nonparallel corpora. IEEE Trans. Audio Speech Lang. Process. 20(4), 1313–1323 (2012)
  • 6
    • 0029765811
    • A.J. Hunt, A.W. Black, Unit selection in a concatenative speech synthesis system using a large speech database, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1 (1996), pp. 373–376
  • 8
    • 0031623661
    • A. Kain, M. Macon, Spectral voice conversion for text-to-speech synthesis, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1 (1998), pp. 285–288
  • 9
    • 0030677481
    • H. Kawahara, Speech representation and transformation using adaptive interpolation of weighted spectrum: vocoder revisited, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2 (1997), pp. 1303–1306
  • 11
    • 33748468338
    • J. Latorre, K. Iwano, S. Furui, New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer. Speech Commun. 48(10), 1227–1242 (2006)
  • 15
    • 0009435105
    • W.H. Press, S.A. Teukolsky, W.T. Vetterling, B.P. Flannery, Numerical recipes in C: the art of scientific computing (Chapter 14), 2nd edn. (Cambridge University Press, Cambridge, 1992), pp. 615–619
  • 20
    • 70349197715
    • Y. Stylianou, Voice transformation: a survey, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2009), pp. 3585–3588
  • 23
    • 84946753271
    • D. Sundermann, H. Ney, H. Hoge, VTLN-based cross-language voice conversion, in IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU’03 (2003), pp. 676–681
  • 24
    • 84959258193
    • Technology Development for Indian Languages Programme, DeitY (2013), http://tdil.mit.gov.in/AboutUs.aspx. Last accessed on 06 Sept 2014
  • 25
    • 0034842552
    • T. Toda, H. Saruwatari, K. Shikano, Voice conversion algorithm based on Gaussian mixture model with dynamic frequency warping of STRAIGHT spectrum, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2 (2001), pp. 841–844
  • 26
    • 57749193836
    • T. Toda, A. Black, K. Tokuda, Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Trans. Audio Speech Lang. Process. 15, 2222–2235 (2007)
  • 27
    • 17444453660
    • P.A. Torres-Carrasquillo, D.A. Reynolds, J. Deller Jr, Language identification using Gaussian mixture model tokenization, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2002), pp. I-757–I-760
  • 28
    • 85010815133
    • H. Valbret, E. Moulines, J.P. Tubach, Voice transformation using PSOLA technique, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1 (1992), pp. 145–148
  • 29
    • 84857131313
    • P. Vijayalakshmi, T. Nagarajan, P. Mahadevan, Improving speech intelligibility in cochlear implants using acoustic models. WSEAS Trans. Signal Process. 7(4), 131–144 (2011)
  • 30
    • 67650854725
    • J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, J. Isogai, Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Trans. Audio Speech Lang. Process. 17(1), 66–83 (2009)
  • 31
    • 67651002140
    • H. Zen, K. Tokuda, A.W. Black, Statistical parametric speech synthesis. Speech Commun. 51, 1039–1064 (2009)
  • 32
    • 51449121435
    • M. Zhang, J. Tao, J. Tian, X. Wang, Text-independent voice conversion based on state mapped codebook, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP) (2008), pp. 4605–4608
  • 33
    • 85079102131
    • M.A. Zissman, E. Singer, Automatic language identification of telephone speech messages using phoneme recognition and N-gram modeling, in International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 1 (1994), pp. I-305–I-308


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.