SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 18, Issue 8, 2010, Pages 1994-2003

Exploiting prosody hierarchy and dynamic features for pitch modeling and generation in HMM-based speech synthesis

(3) Hsia, Chi Chun a Wu, Chung Hsien b Wu, Jung Yun b

a INDUSTRIAL TECHNOLOGY RESEARCH INSTITUTE (Taiwan)

b NATIONAL CHENG KUNG UNIVERSITY (Taiwan)

Author keywords

Dynamic features; hidden Markov model (HMM) based speech synthesis; pitch modeling and generation; prosody hierarchy

Indexed keywords

DYNAMIC FEATURES; DYNAMIC PITCH; FRAME LAYER; HMM-BASED SPEECH SYNTHESIS; MINIMUM DESCRIPTION LENGTH; PITCH CONTOURS; PITCH MODELING; PROSODIC FEATURES; PROSODIC STRUCTURE; PROSODIC WORDS; PROSODY HIERARCHY; PROSODY MODEL; STATIC FEATURES; STATISTICAL HYPOTHESIS TESTING; SUBJECTIVE EVALUATIONS; SUPERVISED CLASSIFICATION; TEMPORAL CORRELATIONS;

FUNCTION EVALUATION; HIDDEN MARKOV MODELS; SPEECH SYNTHESIS; SUBJECTIVE TESTING; TESTING;

CONTINUOUS SPEECH RECOGNITION;

EID: 77956285048 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2010.2040791 Document Type: Article

Times cited : (35)

References (40)

1
- 54249088981
- A comparison of grapheme and phoneme-based units for Spanish spoken term detection
- Nov.-Dec.
- J. Tejedor, D. Wang, J. Frankel, S. King, and J. Colás, "A comparison of grapheme and phoneme-based units for Spanish spoken term detection," Speech Commun., vol. 50, no. 11-12, pp. 980-991, Nov.-Dec. 2008.
- (2008) Speech Commun , vol.50 , Issue.11-12 , pp. 980-991
- Tejedor, J.¹ Wang, D.² Frankel, J.³ King, S.⁴ Colás, J.⁵

2
- 64149128218
- Variable-length unit selection in TTS using structural syntactic cost
- May
- C.-H. Wu, C.-C. Hsia, J.-F. Chen, and J.-F. Wang, "Variable-length unit selection in TTS using structural syntactic cost," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 4, pp. 1227-1235, May 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.4 , pp. 1227-1235
- Wu, C.-H.¹ Hsia, C.-C.² Chen, J.-F.³ Wang, J.-F.⁴

3
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP'96, 1996, pp. 373-376.
- (1996) Proc. ICASSP'96 , pp. 373-376
- Hunt, A.J.¹ Black, A.W.²

4
- 84966398940
- Optimizing selection of units from speech database for concatenative synthesis
- A. W. Black and N. Campbell, "Optimizing selection of units from speech database for concatenative synthesis," in Proc. Eurospeech'95, 1995, pp. 581-584.
- (1995) Proc. Eurospeech'95 , pp. 581-584
- Black, A.W.¹ Campbell, N.²

5
- 85133720638
- The HMM-based speech synthesis system version 2.0
- Bonn, Germany, Aug.
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. W. Black, and K. Tokuda, "The HMM-based speech synthesis system version 2.0," in Proc. ISCA SSW6, Bonn, Germany, Aug. 2007.
- (2007) Proc. ISCA SSW6
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.W.⁶ Tokuda, K.⁷

6
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Jun.
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, Jun. 2000, pp. 1315-1318.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

7
- 85008006694
- Robust speaker-adaptive HMM-based text-to-speech synthesis
- Aug.
- J. Yamagishi, T. Nose, H. Zen, Z. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "Robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 6, pp. 1208-1230, Aug. 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

8
- 0024736612
- The synthesis rules in a Chinese text-to-speech system
- Sep.
- L. S. Lee, C. Y. Tseng, and M. Ouh-young, "The synthesis rules in a Chinese text-to-speech system," IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 9, pp. 1309-1319, Sep. 1989.
- (1989) IEEE Trans. Acoust., Speech, Signal Process. , vol.37 , Issue.9 , pp. 1309-1319
- Lee, L.S.¹ Tseng, C.Y.² Ouh-Young, M.³

9
- 84856036312
- A corpus-based Mandarin text-to-speech synthesizer
- Yokohama, Japan, Sep.
- A. Benijamin, S. Chilin, and S. Richard, "A corpus-based Mandarin text-to-speech synthesizer," in Proc. ICSLP'94, Yokohama, Japan, Sep. 1994, pp. 1771-1774.
- (1994) Proc. ICSLP'94 , pp. 1771-1774
- Benijamin, A.¹ Chilin, S.² Richard, S.³

10
- 0022796218
- Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models
- Oct.
- L. Andrej and F. Frank, "Synthesis of natural sounding pitch contours in isolated utterances using hidden Markov models," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 5, pp. 1074-1080, Oct. 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-34 , Issue.5 , pp. 1074-1080
- Andrej, L.¹ Frank, F.²

11
- 0034509204
- Prosody model in a Mandarin text-to-speech system based on a hierarchical approach
- NY, Jul.
- N. H. Pan, W. T. Jen, S. S. Yu, M. S. Yu, S. Y. Huang, and M. J. Wu, "Prosody model in a Mandarin text-to-speech system based on a hierarchical approach," in Proc. IEEE Int. Conf. Multimedia Expo, NY, Jul. 2000, vol. 1, pp. 448-451.
- (2000) Proc. IEEE Int. Conf. Multimedia Expo , vol.1 , pp. 448-451
- Pan, N.H.¹ Jen, W.T.² Yu, S.S.³ Yu, M.S.⁴ Huang, S.Y.⁵ Wu, M.J.⁶

12
- 0032073761
- An RNN-based prosodic information synthesizer for Mandarin text-to-speech
- May
- S. H. Chen, S. H. Hwang, and Y. R. Wang, "An RNN-based prosodic information synthesizer for Mandarin text-to-speech," IEEE Trans. Acoust., Speech, Signal Process., vol. 6, no. 3, pp. 226-269, May 1998.
- (1998) IEEE Trans. Acoust., Speech, Signal Process. , vol.6 , Issue.3 , pp. 226-269
- Chen, S.H.¹ Hwang, S.H.² Wang, Y.R.³

13
- 77956275334
- Efficient model of establishing words tone dictionary for Korean TTS system
- Greece, Sep.
- S. H. Kim and J. Y. Kim, "Efficient model of establishing words tone dictionary for Korean TTS system," in Proc. Eurospeech, Rhodes, Greece, Sep. 1997, pp. 243-246.
- (1997) Proc. Eurospeech, Rhodes , pp. 243-246
- Kim, S.H.¹ Kim, J.Y.²

14
- 85009282418
- Pitch contour model for Chinese text-to-speech using CART and statistical model
- Denver, CO, Sep.
- M. Dong and K. T. Lua, "Pitch contour model for Chinese text-to-speech using CART and statistical model," in Proc. ICSLP'02, Denver, CO, Sep. 2002, pp. 2405-2408.
- (2002) Proc. ICSLP'02 , pp. 2405-2408
- Dong, M.¹ Lua, K.T.²

15
- 0035478985
- Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis
- C. H. Wu and J. H. Chen, "Automatic generation of synthesis units and prosodic information for Chinese concatenative synthesis," Speech Commun., vol. 35, pp. 219-237, 2001.
- (2001) Speech Commun , vol.35 , pp. 219-237
- Wu, C.H.¹ Chen, J.H.²

16
- 22944450058
- F0 Prediction model of speech synthesis based on template and statistical method
- New York: Springer
- J. Tao, "F0 Prediction model of speech synthesis based on template and statistical method," in Lecture Notes in Artificial Intelligence. New York: Springer, 2004.
- (2004) Lecture Notes in Artificial Intelligence
- Tao, J.¹

17
- 21844474040
- Fluent speech prosody: Framework and modeling
- C. Y. Tseng, S. H. Pin, Y. Lee, H. M. Wang, and Y. C. Chen, "Fluent speech prosody: Framework and modeling," Speech Commun., vol. 46, no. 3-4, pp. 284-309, 2005.
- (2005) Speech Commun , vol.46 , Issue.3-4 , pp. 284-309
- Tseng, C.Y.¹ Pin, S.H.² Lee, Y.³ Wang, H.M.⁴ Chen, Y.C.⁵

18
- 21844454654
- Ph.D. dissertation, Northwestern Univ., Evanston, IL
- X. Sun, "The determination, analysis and synthesis of fundamental frequency," Ph.D. dissertation, Northwestern Univ., Evanston, IL, 2002.
- (2002) The Determination, Analysis and Synthesis of Fundamental Frequency
- Sun, X.¹

19
- 21444454844
- Speech rate and prosody units: Evidence of interaction from Mandarin Chinese
- Nara, Japan, Mar.
- C. Y. Tseng and Y. L. Lee, "Speech rate and prosody units: Evidence of interaction from Mandarin Chinese," in Proc. Int. Conf. Speech Prosody, Nara, Japan, Mar. 2004, pp. 251-254.
- (2004) Proc. Int. Conf. Speech Prosody , pp. 251-254
- Tseng, C.Y.¹ Lee, Y.L.²

20
- 13544257213
- A statistics-based pitch contour model for Mandarin speech
- S. H. Chen, W. H. Lai, and Y. R. Wang, "A statistics-based pitch contour model for Mandarin speech," J. Acoust. Soc. Amer., vol. 117, no. 2, pp. 908-925, 2005.
- (2005) J. Acoust. Soc. Amer. , vol.117 , Issue.2 , pp. 908-925
- Chen, S.H.¹ Lai, W.H.² Wang, Y.R.³

21
- 21444431930
- Locating boundaries for prosodic constituents in unrestricted Mandarin texts
- M. Chu and Y. Qian, "Locating boundaries for prosodic constituents in unrestricted Mandarin texts," Comput. Linguist. Chinese Lang. Process., vol. 6, no. 1, pp. 61-82, 2001.
- (2001) Comput. Linguist. Chinese Lang. Process. , vol.6 , Issue.1 , pp. 61-82
- Chu, M.¹ Qian, Y.²

22
- 21444433495
- A Mandarin TTS system with an integrated prosodic model
- Hong Kong, Dec.
- S. H. Pin, Y. L. Lee, Y. C. Chen, H. M. Wang, and C. Y. Tseng, "A Mandarin TTS system with an integrated prosodic model," in Proc. ISCSLP'04, Hong Kong, Dec. 2004, pp. 169-172.
- (2004) Proc. ISCSLP'04 , pp. 169-172
- Pin, S.H.¹ Lee, Y.L.² Chen, Y.C.³ Wang, H.M.⁴ Tseng, C.Y.⁵

23
- 85011187169
- Analysis of voice fundamental frequency contours for declarative sentence of Japanese
- H. Fujisaki and K. Hirose, "Analysis of voice fundamental frequency contours for declarative sentence of Japanese," J. Acoust. Soc. Japan (E), vol. 5, no. 4, pp. 233-241, 1984.
- (1984) J. Acoust. Soc. Japan (E) , vol.5 , Issue.4 , pp. 233-241
- Fujisaki, H.¹ Hirose, K.²

24
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Budapest, Hungary, Sep.
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech'99, Budapest, Hungary, Sep. 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech'99 , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

25
- 85119213703
- TOBI: A standard for labeling English prosody
- Banff, AB, Canada, Oct.
- K. Silverman, M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert, and J. Hirschberg, "TOBI: A standard for labeling English prosody," in Proc. ICSLP'92, Banff, AB, Canada, Oct. 1992, pp. 867-870.
- (1992) Proc. ICSLP'92 , pp. 867-870
- Silverman, K.¹ Beckman, M.² Pitrelli, J.³ Ostendorf, M.⁴ Wightman, C.⁵ Price, P.⁶ Pierrehumbert, J.⁷ Hirschberg, J.⁸

26
- 4544383281
- The tilt intonation model
- Sydney, Australia, Nov.
- P. Taylor, "The tilt intonation model," in Proc. ICSLP'98, Sydney, Australia, Nov. 1998, pp. 1383-1386.
- (1998) Proc. ICSLP'98 , pp. 1383-1386
- Taylor, P.¹

27
- 51449117929
- Modeling and synthesising F0 contours with the discrete cosine transform
- Las Vegas, NV, Mar.
- J. Teutenberg, C. Watson, and P. Riddle, "Modeling and synthesising F0 contours with the discrete cosine transform," in Proc. ICASSP'08, Las Vegas, NV, Mar. 2008, pp. 3973-3976.
- (2008) Proc. ICASSP'08 , pp. 3973-3976
- Teutenberg, J.¹ Watson, C.² Riddle, P.³

28
- 84867194192
- Multilevel parametric-based F0 model for speech synthesis
- Brisbane, Australia, Sep.
- J. Latorre and M. Akamine, "Multilevel parametric-based F0 model for speech synthesis," in Proc. Interspeech'08, Brisbane, Australia, Sep. 2008, pp. 2274-2277.
- (2008) Proc. Interspeech'08 , pp. 2274-2277
- Latorre, J.¹ Akamine, M.²

29
- 0025495218
- Vector quantization of pitch information in Mandarin speech
- Sep.
- S. H. Chen and Y. R. Wang, "Vector quantization of pitch information in Mandarin speech," IEEE Trans. Commun., vol. 38, no. 9, pp. 1317-1320, Sep. 1990.
- (1990) IEEE Trans. Commun. , vol.38 , Issue.9 , pp. 1317-1320
- Chen, S.H.¹ Wang, Y.R.²

30
- 0030677481
- Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited
- Munich, Germany, Apr.
- H. Kawahara, "Speech representation and transformation using adaptive interpolation of weighted spectrum: Vocoder revisited," in Proc. ICASSP'97, Munich, Germany, Apr. 1997, vol. 2, pp. 1303-1306.
- (1997) Proc. ICASSP'97 , vol.2 , pp. 1303-1306
- Kawahara, H.¹

31
- 0032673049
- Restructuring speech representations using a pitch adaptive time-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch adaptive time-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.
- (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

32
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- San Francisco, CA, Mar.
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP '92, San Francisco, CA, Mar. 1992, vol. 1, pp. 137-140.
- (1992) Proc. ICASSP '92 , vol.1 , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

33
- 85156206534
- Fast exact inference with a factored model for natural language parsing
- Cambridge, MA: MIT Press, 15 (NIPS)
- D. Klein and C. D. Manning, "Fast exact inference with a factored model for natural language parsing," in Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, pp. 3-10, 15 (NIPS 2002).
- (2002) Advances in Neural Information Processing Systems , pp. 3-10
- Klein, D.¹ Manning, C.D.²

34
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- Mar.
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Japan (English), vol. 21, pp. 79-86, Mar. 2000.
- (2000) J. Acoust. Soc. Japan (English) , vol.21 , pp. 79-86
- Shinoda, K.¹ Watanabe, T.²

35
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc. B, vol. 39, pp. 1-38, 1977.
- (1977) J. R. Statist. Soc. B , vol.39 , pp. 1-38
- Dempster, A.P.¹ Laird, N.M.² Rubin, D.B.³

36
- 44449179384
- TH-CoSS, a Mandarin speech corpus for TTS
- Mar.
- L. H. Cai, D. D. Cui, and R. Cai, "TH-CoSS, a Mandarin speech corpus for TTS," J. Chinese Inf. Process., vol. 21, no. 2, pp. 94-99, Mar. 2007.
- (2007) J. Chinese Inf. Process. , vol.21 , Issue.2 , pp. 94-99
- Cai, L.H.¹ Cui, D.D.² Cai, R.³

37
- 4544354696
- Segmental tonal modeling for phone set design in Mandarin LVCSR
- Montreal, QC, Canada, May
- C. Huang, Y. Shi, J. Zhou, M. Chu, T. Wang, and E. Chang, "Segmental tonal modeling for phone set design in Mandarin LVCSR," in Proc. ICASSP'04, Montreal, QC, Canada, May 2004, pp. 901-904.
- (2004) Proc. ICASSP'04 , pp. 901-904
- Huang, C.¹ Shi, Y.² Zhou, J.³ Chu, M.⁴ Wang, T.⁵ Chang, E.⁶

38
- 4544303009
- Beijing China: Beijing Univ. Press
- T. Lin and L. J. Wang, Phonetic Tutorials. Beijing, China: Beijing Univ. Press, 1992, pp. 103-121.
- (1992) Phonetic Tutorials , pp. 103-121
- Lin, T.¹ Wang, L.J.²

39
- 0004116974
- W. B. Saunders Co.
- S. Shott, "Statistics for health professionals," W. B. Saunders Co., 1990.
- (1990) Statistics for Health Professionals
- Shott, S.¹

40
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Toulouse, France, May
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP'06, Toulouse, France, May 2006, pp. 89-92.
- (2006) Proc. ICASSP'06 , pp. 89-92
- Wu, Y.-J.¹ Wang, R.-H.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.