2010, Pages 4610-4613

Simple methods for improving speaker-similarity of HMM-based speech synthesis

Author keywords

HMM; HTS; Speech synthesis; TTS

Indexed keywords

SIGNAL PROCESSING; SPEECH RECOGNITION

EID: 78049403515     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2010.5495562     Document Type: Conference Paper
Times cited: 14

References (18)
  • 1
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, Nov. 2009.
    • (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 3
    • 67650819492
    • The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
    • J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, and K. Tokuda, "The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge," in Proc. Blizzard Challenge 2008, Brisbane, Australia, Sep. 2008.
    • (2008) Proc. Blizzard Challenge 2008, Brisbane, Australia
    • Yamagishi, J.1    Zen, H.2    Wu, Y.-J.3    Toda, T.4    Tokuda, K.5
  • 4
    • 0002648826
    • A model of loudness summation
    • E. Zwicker and B. Scharf, "A model of loudness summation," Psych. Rev., vol. 72, pp. 2-26, 1965.
    • (1965) Psych. Rev. , vol.72 , pp. 2-26
    • Zwicker, E.1    Scharf, B.2
  • 5
    • 78049361102
    • Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans., vol. J87-D-II, no. 8, pp. 1565-1571, Aug. 2004, (in Japanese).
    • (2004) IEICE Trans. , vol.J87-D-II , Issue.8 , pp. 1565-1571
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 6
    • 85011187169
    • Analysis of voice fundamental frequency contours for declarative sentences of Japanese
    • H. Fujisaki and K. Hirose, "Analysis of voice fundamental frequency contours for declarative sentences of Japanese," J. Acoust. Soc. Japan (E), vol. 5, no. 4, pp. 233-242, Oct. 2000.
    • (2000) J. Acoust. Soc. Japan (E) , vol.5 , Issue.4 , pp. 233-242
    • Fujisaki, H.1    Hirose, K.2
  • 7
    • 70450161300
    • Thousands of voices for HMM-based speech synthesis
    • J. Yamagishi et al., "Thousands of voices for HMM-based speech synthesis," in Proc. Interspeech 2009, Brighton, U.K., Sep. 2009, pp. 420-423.
    • (2009) Proc. Interspeech 2009, Brighton, U.K. , pp. 420-423
    • Yamagishi, J.1
  • 9
    • 0001310760
    • Spectral estimation of speech based on mel-cepstral representation
    • K. Tokuda, T. Kobayashi, T. Fukada, H. Saito, and S. Imai, "Spectral estimation of speech based on mel-cepstral representation," IEICE Trans. Fundamentals, vol. J74-A, no. 8, pp. 1240-1248, Aug. 1991, (in Japanese).
    • (1991) IEICE Trans. Fundamentals , vol.J74-A , Issue.8 , pp. 1240-1248
    • Tokuda, K.1    Kobayashi, T.2    Fukada, T.3    Saito, H.4    Imai, S.5
  • 11
    • 0016938506
    • Auditory filter shapes derived with noise stimuli
    • R. Patterson, "Auditory filter shapes derived with noise stimuli," Journal of the Acoustical Society of America, vol. 76, pp. 640-654, Mar. 1982.
    • (1982) Journal of the Acoustical Society of America , vol.76 , pp. 640-654
    • Patterson, R.1
  • 13
    • 84898967346
    • Gaussianization
    • S. S. Chen and R. A. Gopinath, "Gaussianization," in NIPS 2000, Nov. 2000, pp. 423-429.
    • (2000) NIPS 2000 , pp. 423-429
    • Chen, S.S.1    Gopinath, R.A.2
  • 14
    • 85030493378
    • Synthesis of regional English using a keyword lexicon
    • S. Fitt and S. Isard, "Synthesis of regional English using a keyword lexicon," in Proc. Eurospeech 1999, vol. 2, Sep. 1999, pp. 823-826.
    • (1999) Proc. Eurospeech 1999 , vol.2 , pp. 823-826
    • Fitt, S.1    Isard, S.2
  • 15
    • 34047123652
    • Multisyn: Open-domain unit selection for the Festival speech synthesis system
    • R. A. J. Clark, K. Richmond, and S. King, "Multisyn: Open-domain unit selection for the Festival speech synthesis system," Speech Communication, vol. 49, no. 4, pp. 317-330, 2007.
    • (2007) Speech Communication , vol.49 , Issue.4 , pp. 317-330
    • Clark, R.A.J.1    Richmond, K.2    King, S.3
  • 16
    • 33846405723
    • Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
    • H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. & Syst., vol. E90-D, no. 1, pp. 325-333, Jan. 2007.
    • (2007) IEICE Trans. Inf. & Syst. , vol.E90-D , Issue.1 , pp. 325-333
    • Zen, H.1    Toda, T.2    Nakamura, M.3    Tokuda, K.4
  • 17
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds," Speech Communication, vol. 27, pp. 187-207, 1999.
    • (1999) Speech Communication , vol.27 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    Cheveigné, A.3
  • 18
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Y. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP 2006, May 2006, pp. 89-92.
    • (2006) Proc. ICASSP 2006 , pp. 89-92
    • Wu, Y.1    Wang, R.-H.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.