



Volume 19, Issue 6, 2011, Pages 1702-1710

Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units

Author keywords

Discrete cosine transforms (DCTs); speech synthesis; statistical distributions


EID: 85008039410     PISSN: 15587916     EISSN: 15587924     Source Type: Journal    
DOI: 10.1109/TASL.2010.2097248     Document Type: Article
Times cited: 37

References (31)
  • 1
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” in Proc. Eurospeech, 1999, pp. 2347–2350.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 2
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for HMM-based speech synthesis,” in Proc. ICASSP, 2000, pp. 1315–1318.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 4
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • A. Hunt and A. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in Proc. ICASSP, 1996, pp. 373–376.
    • (1996) Proc. ICASSP , pp. 373-376
    • Hunt, A.1    Black, A.2
  • 5
    • 33847129573
    • Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
    • J. Yamagishi and T. Kobayashi, “Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training,” IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533–543, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
    • Yamagishi, J.1    Kobayashi, T.2
  • 6
    • 51449114529
    • A style control technique for HMM-based expressive speech synthesis
    • T. Nose, J. Yamagishi, and T. Kobayashi, “A style control technique for HMM-based expressive speech synthesis,” IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406–1413, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.9 , pp. 1406-1413
    • Nose, T.1    Yamagishi, J.2    Kobayashi, T.3
  • 7
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, “Statistical parametric speech synthesis,” Speech Commun., vol. 51, no. 11, pp. 1039–1064, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 8
    • 33846410497
    • Speech parameter generation algorithm considering global variance for HMM-Based speech synthesis
    • T. Toda and K. Tokuda, “Speech parameter generation algorithm considering global variance for HMM-Based speech synthesis,” in Proc. Eurospeech, 2005, pp. 373–376.
    • (2005) Proc. Eurospeech , pp. 373-376
    • Toda, T.1    Tokuda, K.2
  • 9
    • 34547497133
    • Combining Gaussian mixture model with global variance term to improve the quality of an HMM-based polyglot speech synthesizer
    • J. Latorre, K. Iwano, and S. Furui, “Combining Gaussian mixture model with global variance term to improve the quality of an HMM-based polyglot speech synthesizer,” in Proc. ICASSP, 2007, pp. 1241–1244.
    • (2007) Proc. ICASSP , pp. 1241-1244
    • Latorre, J.1    Iwano, K.2    Furui, S.3
  • 10
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
    • H. Zen, K. Tokuda, and T. Kitamura, “Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences,” Comput. Speech Lang., vol. 21, no. 1, pp. 153–173, 2007.
    • (2007) Comput. Speech Lang. , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 11
    • 34547517493
    • Full HMM Training for Minimizing Generation Error in Synthesis
    • Y.-J. Wu, R.-H. Wang, and F. K. Soong, “Full HMM Training for Minimizing Generation Error in Synthesis,” in Proc. ICASSP, 2007, pp. 517–520.
    • (2007) Proc. ICASSP , pp. 517-520
    • Wu, Y.-J.1    Wang, R.-H.2    Soong, F.K.3
  • 13
    • 85009257781
    • F0 generation for speech synthesis using a multi-tier approach
    • X.-J. Sun, “F0 generation for speech synthesis using a multi-tier approach,” in Proc. ICSLP, 2002, pp. 2077–2080.
    • (2002) Proc. ICSLP , pp. 2077-2080
    • Sun, X.-J.1
  • 14
    • 33646821329
    • Additive modeling of English F0 contour for speech synthesis
    • S. Sakai, “Additive modeling of English F0 contour for speech synthesis,” in Proc. ICASSP, 2005, pp. 277–280.
    • (2005) Proc. ICASSP , pp. 277-280
    • Sakai, S.1
  • 15
    • 48549095974
    • HMM-based trainable speech synthesis for Chinese
    • Y.-J. Wu and R.-H. Wang, “HMM-based trainable speech synthesis for Chinese,” J. Chinese Inf. Process., vol. 20, no. 4, pp. 75–81, 2006.
    • (2006) J. Chinese Inf. Process. , vol.20 , Issue.4 , pp. 75-81
    • Wu, Y.-J.1    Wang, R.-H.2
  • 16
    • 84867200235
    • Generating natural F0 trajectory with additive trees
    • Y. Qian, H. Liang, and F. K. Soong, “Generating natural F0 trajectory with additive trees,” in Proc. Interspeech, 2008, pp. 2126–2129.
    • (2008) Proc. Interspeech , pp. 2126-2129
    • Qian, Y.1    Liang, H.2    Soong, F.K.3
  • 17
    • 41049090228
    • Phone duration modeling using gradient tree boosting
    • J. Yamagishi, H. Kawai, and T. Kobayashi, “Phone duration modeling using gradient tree boosting,” Speech Commun., vol. 50, no. 5, pp. 405–415, 2008.
    • (2008) Speech Commun. , vol.50 , Issue.5 , pp. 405-415
    • Yamagishi, J.1    Kawai, H.2    Kobayashi, T.3
  • 18
    • 67650851610
    • Improved prosody generation by maximizing joint likelihood of state and longer units
    • Y. Qian, Z.-Z. Wu, and F. K. Soong, “Improved prosody generation by maximizing joint likelihood of state and longer units,” in Proc. ICASSP, 2009, pp. 3781–3784.
    • (2009) Proc. ICASSP , pp. 3781-3784
    • Qian, Y.1    Wu, Z.-Z.2    Soong, F.K.3
  • 19
    • 84867194192
    • Multilevel parametric-base F0 model for speech synthesis
    • J. Latorre and M. Akamine, “Multilevel parametric-base F0 model for speech synthesis,” in Proc. Interspeech, 2008, pp. 2274–2277.
    • (2008) Proc. Interspeech , pp. 2274-2277
    • Latorre, J.1    Akamine, M.2
  • 20
    • 33846442604
    • Investigation of state duration model based on gamma distribution for HMM-based speech synthesis
    • Y. Ishimatsu, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Investigation of state duration model based on gamma distribution for HMM-based speech synthesis,” IEICE Tech. Rep., vol. 101, no. 352, pp. 57–62, 2001.
    • (2001) IEICE Tech. Rep. , vol.101 , Issue.352 , pp. 57-62
    • Ishimatsu, Y.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 21
    • 24144455395
    • Context-Dependent phoneme duration modeling with Tree-Based state tying
    • S. J. Park, M. W. Koo, and C. S. Jhon, “Context-Dependent phoneme duration modeling with Tree-Based state tying,” IEICE Trans. Inf. Syst., vol. E88-D, no. 3, pp. 662–666, 2005.
    • (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.3 , pp. 662-666
    • Park, S.J.1    Koo, M.W.2    Jhon, C.S.3
  • 22
    • 84867218426
    • Duration refinement by jointly optimizing state and longer unit likelihood
    • B.-Y. Gao, Y. Qian, Z.-Z. Wu, and F. K. Soong, “Duration refinement by jointly optimizing state and longer unit likelihood,” in Proc. Interspeech, 2008, pp. 2266–2269.
    • (2008) Proc. Interspeech , pp. 2266-2269
    • Gao, B.-Y.1    Qian, Y.2    Wu, Z.-Z.3    Soong, F.K.4
  • 23
    • 51449117929
    • Modelling and synthesising F0 contours with the discrete cosine transform
    • J. Teutenberg, C. Watson, and P. Riddle, “Modelling and synthesising F0 contours with the discrete cosine transform,” in Proc. ICASSP, 2008, pp. 3973–3976.
    • (2008) Proc. ICASSP , pp. 3973-3976
    • Teutenberg, J.1    Watson, C.2    Riddle, P.3
  • 24
    • 60849112575
    • Modeling and generating tone contour with phrase intonation for Mandarin Chinese speech
    • Z.-Z. Wu, Y. Qian, F. K. Soong, and B. Zhang, “Modeling and generating tone contour with phrase intonation for Mandarin Chinese speech,” in Proc. ISCSLP, 2008, pp. 121–124.
    • (2008) Proc. ISCSLP , pp. 121-124
    • Wu, Z.-Z.1    Qian, Y.2    Soong, F.K.3    Zhang, B.4
  • 28
    • 0032673049
    • Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, “Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds,” Speech Commun., vol. 27, no. 3–4, pp. 187–207, 1999.
    • (1999) Speech Commun. , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    de Cheveigné, A.3
  • 29
    • 0001455934
    • A robust algorithm for pitch tracking (RAPT)
    • Amsterdam, The Netherlands: Elsevier
    • A. D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis. Amsterdam, The Netherlands: Elsevier, 1995.
    • (1995) Speech Coding and Synthesis
    • Talkin, A.D.1
  • 30
    • 0033906251
    • MDL-based context-dependent subword modeling for speech recognition
    • K. Shinoda and T. Watanabe, “MDL-based context-dependent subword modeling for speech recognition,” J. Acoust. Soc. Jpn(E), vol. 21, no. 2, pp. 79–86, 2000.
    • (2000) J. Acoust. Soc. Jpn(E) , vol.21 , Issue.2 , pp. 79-86
    • Shinoda, K.1    Watanabe, T.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.