Volume 18, Issue 8, 2010, Pages 1994-2003

Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis

Author keywords

Dynamic features; hidden Markov model (HMM) based speech synthesis; pitch modeling and generation; prosody hierarchy

Indexed keywords

DYNAMIC FEATURES; DYNAMIC PITCH; FRAME LAYER; HMM-BASED SPEECH SYNTHESIS; MINIMUM DESCRIPTION LENGTH; PITCH CONTOURS; PITCH MODELING; PROSODIC FEATURES; PROSODIC STRUCTURE; PROSODIC WORDS; PROSODY HIERARCHY; PROSODY MODEL; STATIC FEATURES; STATISTICAL HYPOTHESIS TESTING; SUBJECTIVE EVALUATIONS; SUPERVISED CLASSIFICATION; TEMPORAL CORRELATIONS;

EID: 77956285048     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2010.2040791     Document Type: Article
Times cited : (35)

References (40)
  • 1
    • 54249088981
    • A comparison of grapheme and phoneme-based units for Spanish spoken term detection
    • Nov.-Dec.
    • J. Tejedor, D. Wang, J. Frankel, S. King, and J. Colás, "A comparison of grapheme and phoneme-based units for Spanish spoken term detection," Speech Commun., vol. 50, no. 11-12, pp. 980-991, Nov.-Dec. 2008.
    • (2008) Speech Commun , vol.50 , Issue.11-12 , pp. 980-991
    • Tejedor, J.1    Wang, D.2    Frankel, J.3    King, S.4    Colás, J.5
  • 2
    • 64149128218
    • Variable-length unit selection in TTS using structural syntactic cost
    • May
    • C.-H. Wu, C.-C. Hsia, J.-F. Chen, and J.-F. Wang, "Variable-length unit selection in TTS using structural syntactic cost," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1227-1235, May 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.4 , pp. 1227-1235
    • Wu, C.-H.1    Hsia, C.-C.2    Chen, J.-F.3    Wang, J.-F.4
  • 3
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP'96, 1996, pp. 373-376.
    • (1996) Proc. ICASSP'96 , pp. 373-376
    • Hunt, A.J.1    Black, A.W.2
  • 4
    • 84966398940
    • Optimizing selection of units from speech database for concatenative synthesis
    • A. W. Black and N. Campbell, "Optimizing selection of units from speech database for concatenative synthesis," in Proc. Eurospeech'95, 1995, pp. 581-584.
    • (1995) Proc. Eurospeech'95 , pp. 581-584
    • Black, A.W.1    Campbell, N.2
  • 6
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • Jun.
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, Jun. 2000, pp. 1315-1318.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 9
    • 84856036312
    • A corpus-based Mandarin text-to-speech synthesizer
    • Yokohama, Japan, Sep.
    • A. Benijamin, S. Chilin, and S. Richard, "A corpus-based Mandarin text-to-speech synthesizer," in Proc. ICSLP'94, Yokohama, Japan, Sep. 1994, pp. 1771-1774.
    • (1994) Proc. ICSLP'94 , pp. 1771-1774
    • Benijamin, A.1    Chilin, S.2    Richard, S.3
  • 10
    • 0022796218
    • Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models
    • Oct.
    • L. Andrej and F. Frank, "Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 5, pp. 1074-1080, Oct. 1986.
    • (1986) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-34 , Issue.5 , pp. 1074-1080
    • Andrej, L.1    Frank, F.2
  • 11
    • 0034509204
    • Prosody model in a Mandarin text-to-speech system based on a hierarchical approach
    • NY, Jul.
    • N. H. Pan, W. T. Jen, S. S. Yu, M. S. Yu, S. Y. Huang, and M. J. Wu, "Prosody model in a Mandarin text-to-speech system based on a hierarchical approach," in Proc. IEEE Int. Conf. Multimedia Expo, NY, Jul. 2000, vol. 1, pp. 448-451.
    • (2000) Proc. IEEE Int. Conf. Multimedia Expo , vol.1 , pp. 448-451
    • Pan, N.H.1    Jen, W.T.2    Yu, S.S.3    Yu, M.S.4    Huang, S.Y.5    Wu, M.J.6
  • 12
    • 0032073761
    • An RNN-based prosodic information synthesizer for Mandarin text-to-speech
    • May
    • S. H. Chen, S. H. Hwang, and Y. R. Wang, "An RNN-based prosodic information synthesizer for Mandarin text-to-speech," IEEE Trans. Speech Audio Process., vol. 6, no. 3, pp. 226-239, May 1998.
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.3 , pp. 226-239
    • Chen, S.H.1    Hwang, S.H.2    Wang, Y.R.3
  • 13
    • 77956275334
    • Efficient model of establishing words tone dictionary for Korean TTS system
    • Greece, Sep.
    • S. H. Kim and J. Y. Kim, "Efficient model of establishing words tone dictionary for Korean TTS system," in Proc. Eurospeech, Rhodes, Greece, Sep. 1997, pp. 243-246.
    • (1997) Proc. Eurospeech, Rhodes , pp. 243-246
    • Kim, S.H.1    Kim, J.Y.2
  • 14
    • 85009282418
    • Pitch contour model for Chinese text-to-speech using CART and statistical model
    • Denver, CO, Sep.
    • M. Dong and K. T. Lua, "Pitch contour model for Chinese text-to-speech using CART and statistical model," in Proc. ICSLP'02, Denver, CO, Sep. 2002, pp. 2405-2408.
    • (2002) Proc. ICSLP'02 , pp. 2405-2408
    • Dong, M.1    Lua, K.T.2
  • 15
    • 0035478985
    • Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis
    • C. H. Wu and J. H. Chen, "Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis," Speech Commun., vol. 35, pp. 219-237, 2001.
    • (2001) Speech Commun , vol.35 , pp. 219-237
    • Wu, C.H.1    Chen, J.H.2
  • 16
    • 22944450058
    • F0 Prediction model of speech synthesis based on template and statistical method
    • New York: Springer
    • J. Tao, "F0 Prediction model of speech synthesis based on template and statistical method," in Lecture Notes of Artificial Intelligence. New York: Springer, 2004.
    • (2004) Lecture Notes of Artificial Intelligence
    • Tao, J.1
  • 17
    • 21844474040
    • Fluent speech prosody: Framework and modeling
    • C. Y. Tseng, S. H. Pin, Y. Lee, H. M. Wang, and Y. C. Chen, "Fluent speech prosody: Framework and modeling," Speech Commun., vol. 46, no. 3-4, pp. 284-309, 2005.
    • (2005) Speech Commun , vol.46 , Issue.3-4 , pp. 284-309
    • Tseng, C.Y.1    Pin, S.H.2    Lee, Y.3    Wang, H.M.4    Chen, Y.C.5
  • 19
    • 21444454844
    • Speech rate and prosody units: Evidence of interaction from Mandarin Chinese
    • Nara, Japan, Mar.
    • C. Y. Tseng and Y. L. Lee, "Speech rate and prosody units: Evidence of interaction from Mandarin Chinese," in Proc. Int. Conf. Speech Prosody, Nara, Japan, Mar. 2004, pp. 251-254.
    • (2004) Proc. Int. Conf. Speech Prosody , pp. 251-254
    • Tseng, C.Y.1    Lee, Y.L.2
  • 20
    • 13544257213
    • A statistics-based pitch contour model for Mandarin speech
    • S. H. Chen, W. H. Lai, and Y. R. Wang, "A statistics-based pitch contour model for Mandarin speech," J. Acoust. Soc. Amer., vol. 117, no. 2, pp. 908-925, 2005.
    • (2005) J. Acoust. Soc. Amer. , vol.117 , Issue.2 , pp. 908-925
    • Chen, S.H.1    Lai, W.H.2    Wang, Y.R.3
  • 21
    • 21444431930
    • Locating boundaries for prosodic constituents in unrestricted Mandarin texts
    • M. Chu and Y. Qian, "Locating boundaries for prosodic constituents in unrestricted Mandarin texts," Comput. Linguist. Chinese Lang. Process., vol. 6, no. 1, pp. 61-82, 2001.
    • (2001) Comput. Linguist. Chinese Lang. Process. , vol.6 , Issue.1 , pp. 61-82
    • Chu, M.1    Qian, Y.2
  • 22
    • 21444433495
    • A Mandarin TTS system with an integrated prosodic model
    • Hong Kong, Dec.
    • S. H. Pin, Y. L. Lee, Y. C. Chen, H. M. Wang, and C. Y. Tseng, "A Mandarin TTS system with an integrated prosodic model," in Proc. ISCSLP'04, Hong Kong, Dec. 2004, pp. 169-172.
    • (2004) Proc. ISCSLP'04 , pp. 169-172
    • Pin, S.H.1    Lee, Y.L.2    Chen, Y.C.3    Wang, H.M.4    Tseng, C.Y.5
  • 23
    • 85011187169
    • Analysis of voice fundamental frequency contours for declarative sentence of Japanese
    • H. Fujisaki and K. Hirose, "Analysis of voice fundamental frequency contours for declarative sentence of Japanese," J. Acoust. Soc. Japan (E), vol. 5, no. 4, pp. 233-241, 1984.
    • (1984) J. Acoust. Soc. Japan (E) , vol.5 , Issue.4 , pp. 233-241
    • Fujisaki, H.1    Hirose, K.2
  • 24
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • Budapest, Hungary, Sep.
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech'99, Budapest, Hungary, Sep. 1999, pp. 2347-2350.
    • (1999) Proc. Eurospeech'99 , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 26
    • 4544383281
    • The tilt intonation model
    • Sydney, Australia, Nov.
    • P. Taylor, "The tilt intonation model," in Proc. ICSLP'98, Sydney, Australia, Nov. 1998, pp. 1383-1386.
    • (1998) Proc. ICSLP'98 , pp. 1383-1386
    • Taylor, P.1
  • 27
    • 51449117929
    • Modeling and synthesising F0 contours with the discrete cosine transform
    • Las Vegas, NV, Mar.
    • J. Teutenberg, C. Watson, and P. Riddle, "Modeling and synthesising F0 contours with the discrete cosine transform," in Proc. ICASSP'08, Las Vegas, NV, Mar. 2008, pp. 3973-3976.
    • (2008) Proc. ICASSP'08 , pp. 3973-3976
    • Teutenberg, J.1    Watson, C.2    Riddle, P.3
  • 28
    • 84867194192
    • Multilevel parametric-based F0 model for speech synthesis
    • Brisbane, Australia, Sep.
    • J. Latorre and M. Akamine, "Multilevel parametric-based F0 model for speech synthesis," in Proc. Interspeech'08, Brisbane, Australia, Sep. 2008, pp. 2274-2277.
    • (2008) Proc. Interspeech'08 , pp. 2274-2277
    • Latorre, J.1    Akamine, M.2
  • 29
    • 0025495218
    • Vector quantization of pitch information in Mandarin speech
    • Sep.
    • S. H. Chen and Y. R. Wang, "Vector quantization of pitch information in Mandarin speech," IEEE Trans. Commun., vol. 38, no. 9, pp. 1317-1320, Sep. 1990.
    • (1990) IEEE Trans. Commun. , vol.38 , Issue.9 , pp. 1317-1320
    • Chen, S.H.1    Wang, Y.R.2
  • 30
    • 0030677481
    • Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited
    • Munich, Germany, Apr.
    • H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited," in Proc. ICASSP'97, Munich, Germany, Apr. 1997, vol. 2, pp. 1303-1306.
    • (1997) Proc. ICASSP'97 , vol.2 , pp. 1303-1306
    • Kawahara, H.1
  • 31
    • 0032673049
    • Restructuring speech representations using a pitch adaptive time-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch adaptive time-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.
    • (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 32
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • San Francisco, CA, Mar.
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP '92, San Francisco, CA, Mar. 1992, vol. 1, pp. 137-140.
    • (1992) Proc. ICASSP '92 , vol.1 , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 33
    • 85156206534
    • Fast exact inference with a factored model for natural language parsing
    • Cambridge, MA: MIT Press, 15 (NIPS)
    • D. Klein and C. D. Manning, "Fast exact inference with a factored model for natural language parsing," in Advances in Neural Information Processing Systems 15 (NIPS 2002). Cambridge, MA: MIT Press, pp. 3-10.
    • (2002) Advances in Neural Information Processing Systems , pp. 3-10
    • Klein, D.1    Manning, C.D.2
  • 34
    • 0033906251
    • MDL-based context-dependent subword modeling for speech recognition
    • Mar.
    • K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Japan (English), vol. 21, pp. 79-86, Mar. 2000.
    • (2000) J. Acoust. Soc. Japan (English) , vol.21 , pp. 79-86
    • Shinoda, K.1    Watanabe, T.2
  • 35
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc. B, vol. 39, pp. 1-38, 1977.
    • (1977) J. R. Statist. Soc. B , vol.39 , pp. 1-38
    • Dempster, A.P.1    Laird, N.M.2    Rubin, D.B.3
  • 36
    • 44449179384
    • TH-CoSS, a Mandarin speech corpus for TTS
    • Mar.
    • L. H. Cai, D. D. Cui, and R. Cai, "TH-CoSS, a Mandarin speech corpus for TTS," J. Chinese Inf. Process., vol. 21, no. 2, pp. 94-99, Mar. 2007.
    • (2007) J. Chinese Inf. Process. , vol.21 , Issue.2 , pp. 94-99
    • Cai, L.H.1    Cui, D.D.2    Cai, R.3
  • 37
    • 4544354696
    • Segmental tonal modeling for phone set design in Mandarin LVCSR
    • Montreal, QC, Canada, May
    • C. Huang, Y. Shi, J. Zhou, M. Chu, T. Wang, and E. Chang, "Segmental tonal modeling for phone set design in Mandarin LVCSR," in Proc. ICASSP'04, Montreal, QC, Canada, May 2004, pp. 901-904.
    • (2004) Proc. ICASSP'04 , pp. 901-904
    • Huang, C.1    Shi, Y.2    Zhou, J.3    Chu, M.4    Wang, T.5    Chang, E.6
  • 38
    • 4544303009
    • Beijing China: Beijing Univ. Press
    • T. Lin and L. J. Wang, Phonetic Tutorials. Beijing, China: Beijing Univ. Press, 1992, pp. 103-121.
    • (1992) Phonetic Tutorials , pp. 103-121
    • Lin, T.1    Wang, L.J.2
  • 40
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Toulouse, France, May
    • Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP'06, Toulouse, France, May 2006, pp. 89-92.
    • (2006) Proc. ICASSP'06 , pp. 89-92
    • Wu, Y.-J.1    Wang, R.-H.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.