Volume 20, Issue 6, 2012, Pages 1784-1794

Statistical voice conversion based on noisy channel model

Author keywords

Joint density model; noisy channel model; probabilistic integration; speaker model; voice conversion (VC)

Indexed keywords

JOINT DENSITIES; NOISY CHANNEL; PROBABILISTIC INTEGRATION; SPEAKER MODEL; VOICE CONVERSION;

EID: 84859768504     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2188628     Document Type: Article
Times cited : (28)

References (37)
  • 1
    • 34047254509
    • Quality-enhanced voice morphing using maximum likelihood transformations
    • DOI 10.1109/TSA.2005.860839
    • H. Ye and S. Young, "Quality-enhanced voice morphing using maximum likelihood transformations," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1301-1312, Jul. 2006. (Pubitemid 46547625)
    • (2006) IEEE Transactions on Audio, Speech and Language Processing , vol.14 , Issue.4 , pp. 1301-1312
    • Ye, H.1    Young, S.2
  • 3
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, 1998, vol. 1, pp. 285-288.
    • (1998) Proc. ICASSP , vol.1 , pp. 285-288
    • Kain, A.1    Macon, M.W.2
  • 5
    • 0036291376
    • Uncertainty decoding with SPLICE for noise robust speech recognition
    • J. Droppo, A. Acero, and L. Deng, "Uncertainty decoding with SPLICE for noise robust speech recognition," in Proc. ICASSP, 2002, pp. 57-60.
    • (2002) Proc. ICASSP , pp. 57-60
    • Droppo, J.1    Acero, A.2    Deng, L.3
  • 6
    • 0033692729
    • Narrowband to wideband conversion of speech using GMM based transformation
    • K. Y. Park and H. S. Kim, "Narrowband to wideband conversion of speech using GMM based transformation," in Proc. ICASSP, 2000, pp. 1847-1850.
    • (2000) Proc. ICASSP , pp. 1847-1850
    • Park, K.Y.1    Kim, H.S.2
  • 7
    • 44949265538
    • Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech
    • K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking aid system for total laryngectomees using voice conversion of body transmitted artificial speech," in Proc. Interspeech, 2006, pp. 1395-1398.
    • (2006) Proc. Interspeech , pp. 1395-1398
    • Nakamura, K.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 8
    • 70450192197
    • Speech generation from hand gestures based on space mapping
    • A. Kunikoshi, Y. Qiao, N. Minematsu, and K. Hirose, "Speech generation from hand gestures based on space mapping," in Proc. Interspeech, 2009, pp. 308-311.
    • (2009) Proc. Interspeech , pp. 308-311
    • Kunikoshi, A.1    Qiao, Y.2    Minematsu, N.3    Hirose, K.4
  • 9
    • 84984984625
    • Voice conversion using pitch shifting algorithm by time stretching with PSOLA and re-sampling
    • A. Mousa, "Voice conversion using pitch shifting algorithm by time stretching with PSOLA and re-sampling," J. Elect. Eng., vol. 61, no. 1, pp. 57-61, 2010.
    • (2010) J. Elect. Eng. , vol.61 , Issue.1 , pp. 57-61
    • Mousa, A.1
  • 10
    • 0029254176
    • Transformation of formants for voice conversion using artificial neural networks
    • M. Narendranath, H. A. Murthy, S. Rajendran, and B. Yegnanarayana, "Transformation of formants for voice conversion using artificial neural networks," Speech Commun., vol. 16, no. 2, pp. 207-216, 1995.
    • (1995) Speech Commun. , vol.16 , Issue.2 , pp. 207-216
    • Narendranath, M.1    Murthy, H.A.2    Rajendran, S.3    Yegnanarayana, B.4
  • 14
    • 44949210554
    • MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
    • C. H. Lee and C. H. Wu, "MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training," in Proc. Interspeech, 2006, pp. 2254-2257.
    • (2006) Proc. Interspeech , pp. 2254-2257
    • Lee, C.H.1    Wu, C.H.2
  • 15
    • 34547512822
    • Eigenvoice conversion based on Gaussian mixture model
    • T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model," in Proc. Interspeech, 2006, pp. 2446-2449.
    • (2006) Proc. Interspeech , pp. 2446-2449
    • Toda, T.1    Ohtani, Y.2    Shikano, K.3
  • 16
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
    • (1995) Comput. Speech Lang. , vol.9 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 17
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, pp. 75-98, 1998. (Pubitemid 128383747)
    • (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.J.F.1
  • 18
    • 0026142334
    • A study on speaker adaptation of the parameters of continuous density hidden Markov models
    • C.-H. Lee, C.-H. Lin, and B.-H. Juang, "A study on speaker adaptation of the parameters of continuous density hidden Markov models," IEEE Trans. Audio, Speech, Lang. Process., vol. 39, no. 4, pp. 806-814, Apr. 1991.
    • (1991) IEEE Trans. Audio, Speech, Lang. Process. , vol.39 , Issue.4 , pp. 806-814
    • Lee, C.-H.1    Lin, C.-H.2    Juang, B.-H.3
  • 20
    • 0016939124
    • Continuous speech recognition by statistical methods
    • F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE, vol. 64, no. 4, pp. 532-556, Apr. 1976. (Pubitemid 8019230)
    • (1976) Proceedings of the IEEE , vol.64 , Issue.4 , pp. 532-556
    • Jelinek, F.1
  • 22
    • 0033884858
    • Speaker verification using adapted Gaussian mixture models
    • DOI 10.1006/dspr.1999.0361
    • D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Process., vol. 10, no. 1-3, pp. 19-41, 2000. (Pubitemid 30592166)
    • (2000) Digital Signal Processing: A Review Journal , vol.10 , Issue.1 , pp. 19-41
    • Reynolds, D.A.1    Quatieri, T.F.2    Dunn, R.B.3
  • 24
    • 79959834571
    • Probabilistic integration of joint density model and speaker model for voice conversion
    • D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "Probabilistic integration of joint density model and speaker model for voice conversion," in Proc. Interspeech, 2010, pp. 1728-1731.
    • (2010) Proc. Interspeech , pp. 1728-1731
    • Saito, D.1    Watanabe, S.2    Nakamura, A.3    Minematsu, N.4
  • 25
    • 0022667694
    • Speaker-independent isolated word recognition using dynamic features of speech spectrum
    • S. Furui, "Speaker-independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 1, pp. 52-59, Feb. 1986. (Pubitemid 16575387)
    • (1986) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.ASSP-34 , Issue.1 , pp. 52-59
    • Furui, S.1
  • 26
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 27
    • 0028996993
    • Speech parameter generation from HMM using dynamic features
    • K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features," in Proc. ICASSP, 1995, pp. 660-663.
    • (1995) Proc. ICASSP , pp. 660-663
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 28
    • 84905560807
    • Voice conversion with smoothed GMM and MAP adaptation
    • Y. Chen, M. Chu, E. Chang, J. Jiu, and R. Liu, "Voice conversion with smoothed GMM and MAP adaptation," in Proc. Eurospeech, 2003, pp. 2413-2416.
    • (2003) Proc. Eurospeech , pp. 2413-2416
    • Chen, Y.1    Chu, M.2    Chang, E.3    Jiu, J.4    Liu, R.5
  • 29
    • 33646773080
    • CMU ARCTIC databases for speech synthesis
    • Carnegie Mellon Univ., Pittsburgh, PA, [Online]
    • J. Kominek and A. W. Black, "CMU ARCTIC databases for speech synthesis," Lang. Technol. Inst., Carnegie Mellon Univ., Pittsburgh, PA, 2003. [Online]. Available: http://festvox.org/cmu-arctic/index.html
    • (2003) Lang. Technol. Inst.
    • Kominek, J.1    Black, A.W.2
  • 31
    • 85133439657
    • An introduction of trajectory model into HMM-based speech synthesis
    • H. Zen, K. Tokuda, and T. Kitamura, "An introduction of trajectory model into HMM-based speech synthesis," in Proc. 5th ISCA Speech Synth. Workshop, 2004, pp. 191-196.
    • (2004) Proc. 5th ISCA Speech Synth. Workshop , pp. 191-196
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 32
    • 3042741069
    • Variational Bayesian estimation and clustering for speech recognition
    • S. Watanabe, Y. Minami, A. Nakamura, and N. Ueda, "Variational Bayesian estimation and clustering for speech recognition," IEEE Trans. Speech Audio Process., vol. 12, no. 4, pp. 365-381, Jul. 2004.
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.4 , pp. 365-381
    • Watanabe, S.1    Minami, Y.2    Nakamura, A.3    Ueda, N.4
  • 33
    • 80051615070
    • High accurate model-integration-based voice conversion using dynamic features and model structure optimization
    • D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "High accurate model-integration-based voice conversion using dynamic features and model structure optimization," in Proc. ICASSP, 2011, pp. 4576-4579.
    • (2011) Proc. ICASSP , pp. 4576-4579
    • Saito, D.1    Watanabe, S.2    Nakamura, A.3    Minematsu, N.4
  • 34
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
    • (1999) Speech Commun. , vol.27 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 35
    • 84994241109
    • Including dynamic and phonetic information in voice conversion systems
    • H. Duxans, A. Bonafonte, A. Kain, and J. Van Santen, "Including dynamic and phonetic information in voice conversion systems," in Proc. ICSLP, 2004, pp. 1193-1196.
    • (2004) Proc. ICSLP , pp. 1193-1196
    • Duxans, H.1    Bonafonte, A.2    Kain, A.3    Van Santen, J.4
  • 36
    • 34047247202
    • Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis
    • C. H. Wu, C. C. Hsia, T. H. Liu, and J. F. Wang, "Voice conversion using duration-embedded bi-HMMs for expressive speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1109-1116, Jul. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.4 , pp. 1109-1116
    • Wu, C.H.1    Hsia, C.C.2    Liu, T.H.3    Wang, J.F.4
  • 37
    • 77956285048
    • Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis
    • C.-C. Hsia, C.-H. Wu, and J.-Y. Wu, "Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 1994-2003, Nov. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 1994-2003
    • Hsia, C.-C.1    Wu, C.-H.2    Wu, J.-Y.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.