SCOPUS Information Retrieval Platform

IEICE Transactions on Information and Systems

Volume E90-D, Issue 5, 2007, Pages 825-834

A hidden semi-Markov model-based speech synthesis system

(5) Zen, Heiga a Tokuda, Keiichi a Masuko, Takashi b,c Kobayashi, Takao b Kitamura, Tadashi a

a NAGOYA INSTITUTE OF TECHNOLOGY (Japan)

b Tokyo Institute of Technology (Japan)

c TOSHIBA CORPORATION (Japan)

Author keywords

Hidden Markov model; Hidden semi Markov model; HMM based speech synthesis

Indexed keywords

MARKOV PROCESSES; MAXIMUM LIKELIHOOD; PROBABILITY DENSITY FUNCTION; SPEECH; SPEECH SYNTHESIS;

HIDDEN SEMI-MARKOV MODELING; HIDDEN SEMI-MARKOV MODELS; HMM-BASED SPEECH SYNTHESIS; MAXIMUM LIKELIHOOD CRITERION; PROBABILITY DENSITY FUNCTIONS (PDFS); SPEECH SYNTHESIS SYSTEM; SUBJECTIVE LISTENING TEST; SYNTHESIS PROBLEMS;

HIDDEN MARKOV MODELS;

EID: 44449177634 PISSN: 09168532 EISSN: 17451361 Source Type: Journal
DOI: 10.1093/ietisy/e90-d.5.825 Document Type: Article

Times cited : (204)

References (36)

1
- 0029725605
- Speech synthesis from HMMs using dynamic features
- T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, "Speech synthesis from HMMs using dynamic features," Proc. ICASSP, pp. 389-392, 1996.
- (1996) Proc. ICASSP , pp. 389-392
- Masuko, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

2
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," Proc. Eurospeech, pp. 2347-2350, 1999.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

3
- 0030696416
- Voice characteristics conversion for HMM-based speech synthesis system
- T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, "Voice characteristics conversion for HMM-based speech synthesis system," Proc. ICASSP, pp. 1611-1614, 1997.
- (1997) Proc. ICASSP , pp. 1611-1614
- Masuko, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

4
- 0034842740
- Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR," Proc. ICASSP, pp. 805-808, 2001.
- (2001) Proc. ICASSP , pp. 805-808
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

5
- 85135145847
- Speaker interpolation in HMM-based speech synthesis system
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Speaker interpolation in HMM-based speech synthesis system," Proc. Eurospeech, pp. 2523-2526, 1997.
- (1997) Proc. Eurospeech , pp. 2523-2526
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

6
- 85009257840
- Eigenvoices for HMM-based speech synthesis
- K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Eigenvoices for HMM-based speech synthesis," Proc. ICSLP, pp. 1269-1272, 2002.
- (2002) Proc. ICSLP , pp. 1269-1272
- Shichiri, K.¹ Sawabe, A.² Tokuda, K.³ Masuko, T.⁴ Kobayashi, T.⁵ Kitamura, T.⁶

7
- 0005897608
- Linguistic properties in the control of segmental duration for speech synthesis
- N. Kaiki, K. Takeda, and Y. Sagisaka, "Linguistic properties in the control of segmental duration for speech synthesis," in Talking Machines: Theories, Models, and Designs, ed. G. Bailly and C. Benoit, pp. 255-263, Elsevier Science Publishers, 1992.
- (1992) Talking Machines: Theories, Models, and Designs , pp. 255-263
- Kaiki, N.¹ Takeda, K.² Sagisaka, Y.³

8
- 0002069313
- Tree-based modelling of segmental duration
- M. Riley, "Tree-based modelling of segmental duration," in Talking Machines: Theories, Models, and Designs, ed. G. Bailly and C. Benoit, pp. 265-273, Elsevier Science Publishers, 1992.
- (1992) Talking Machines: Theories, Models, and Designs , pp. 265-273
- Riley, M.¹

9
- 0034226722
- Statistical modelling of speech segment duration by constrained tree regression
- N. Iwahashi and Y. Sagisaka, "Statistical modelling of speech segment duration by constrained tree regression," IEICE Trans. Inf. & Syst., vol. E83-D, no. 7, pp. 1550-1559, July 2000.
- (2000) IEICE Trans. Inf. & Syst , vol.E83-D , Issue.7 , pp. 1550-1559
- Iwahashi, N.¹ Sagisaka, Y.²

10
- 0008520157
- Multi-lingual duration modelling
- J. van Santen, C. Shih, B. Möbius, E. Tzoukermann, and M. Tanenblatt, "Multi-lingual duration modelling," Proc. Eurospeech, pp. 2651-2654, 1997.
- (1997) Proc. Eurospeech , pp. 2651-2654
- van Santen, J.¹ Shih, C.² Möbius, B.³ Tzoukermann, E.⁴ Tanenblatt, M.⁵

11
- 85093445139
- Duration modeling for HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Duration modeling for HMM-based speech synthesis," Proc. ICSLP, pp. 29-32, 1998.
- (1998) Proc. ICSLP , pp. 29-32
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

12
- 33846442604
- Investigation of state duration model based on gamma distribution for HMM-based speech synthesis
- Y. Ishimatsu, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Investigation of state duration model based on gamma distribution for HMM-based speech synthesis," IEICE Technical Report, SP2001-81, 2001.
- IEICE Technical Report
- Ishimatsu, Y.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

13
- 68249143901
- A study on state duration modeling using lognormal distribution for HMM-based speech synthesis
- J. Yamagishi, T. Masuko, and T. Kobayashi, "A study on state duration modeling using lognormal distribution for HMM-based speech synthesis," Proc. ASJ, pp. 225-226, March 2004.
- (2004) Proc. ASJ , pp. 225-226
- Yamagishi, J.¹ Masuko, T.² Kobayashi, T.³

14
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society, vol. 39, pp. 1-38, 1977.
- (1977) Journal of the Royal Statistical Society , vol.39 , pp. 1-38
- Dempster, A.¹ Laird, N.² Rubin, D.³

15
- 0003805597
- J. Odell, The Use of Context in Large Vocabulary Speech Recognition, Ph. D. thesis, Cambridge University, 1995.
- (1995) The Use of Context in Large Vocabulary Speech Recognition
- Odell, J.¹

16
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," Proc. ICASSP, pp. 1315-1318, 2000.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

17
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-285, 1989.
- (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-285
- Rabiner, L.¹

18
- 0036522887
- Multispace probability distribution HMM
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multispace probability distribution HMM," IEICE Trans. Inf. & Syst., vol. E85-D, no. 3, pp. 455-464, March 2002.
- (2002) IEICE Trans. Inf. & Syst , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

19
- 0002585974
- Variable duration models for speech
- J. Ferguson, "Variable duration models for speech," Proc. Symposium on the Application of Hidden Markov Models to Text and Speech, pp. 143-179, 1980.
- (1980) Proc. Symposium on the Application of Hidden Markov Models to Text and Speech , pp. 143-179
- Ferguson, J.¹

20
- 0022234383
- Explicit modeling of state occupancy in hidden Markov models for automatic speech recognition
- M. Russell and R. Moore, "Explicit modeling of state occupancy in hidden Markov models for automatic speech recognition," Proc. ICASSP, pp. 5-8, 1985.
- (1985) Proc. ICASSP , pp. 5-8
- Russell, M.¹ Moore, R.²

21
- 0022685753
- Continuously variable duration hidden Markov models for automatic speech recognition
- S. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Comput. Speech Lang., vol. 1, pp. 29-45, 1986.
- (1986) Comput. Speech Lang , vol.1 , pp. 29-45
- Levinson, S.¹

22
- 0000176621
- On the complexity of explicit duration HMMs
- C. Mitchell, M. Harper, and L. Jamieson, "On the complexity of explicit duration HMMs," IEEE Trans. Speech Audio Process., vol. 3, no. 3, pp. 213-217, 1995.
- (1995) IEEE Trans. Speech Audio Process , vol.3 , Issue.3 , pp. 213-217
- Mitchell, C.¹ Harper, M.² Jamieson, L.³

23
- 0030245363
- From HMMs to segment models
- M. Ostendorf, V. Digalakis, and O. Kimball, "From HMMs to segment models," IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 360-378, 1996.
- (1996) IEEE Trans. Speech Audio Process , vol.4 , Issue.5 , pp. 360-378
- Ostendorf, M.¹ Digalakis, V.² Kimball, O.³

24
- 0025419316
- Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition
- K. F. Lee, "Context-dependent phonetic hidden Markov models for speaker-independent continuous speech recognition," IEEE Trans. Acoust. Speech Signal Process., vol. 38, no. 4, pp. 599-609, 1990.
- (1990) IEEE Trans. Acoust. Speech Signal Process , vol.38 , Issue.4 , pp. 599-609
- Lee, K.F.¹

25
- 0030715097
- HMM topology design using maximum likelihood successive state splitting
- M. Ostendorf and H. Singer, "HMM topology design using maximum likelihood successive state splitting," Comput. Speech Lang., vol. 11, no. 1, pp. 17-41, 1997.
- (1997) Comput. Speech Lang , vol.11 , Issue.1 , pp. 17-41
- Ostendorf, M.¹ Singer, H.²

26
- 0027153655
- Predicting unseen triphones with senones
- M. Y. Hwang, X. Huang, and F. Alleva, "Predicting unseen triphones with senones," Proc. ICASSP, pp. 311-314, 1993.
- (1993) Proc. ICASSP , pp. 311-314
- Hwang, M.Y.¹ Huang, X.² Alleva, F.³

27
- 17444371335
- Benchmark DARPA RM results with the HTK portable HMM toolkit
- P. Woodland and S. Young, "Benchmark DARPA RM results with the HTK portable HMM toolkit," Proc. DARPA Continuous Speech Recognition Workshop, pp. 71-76, 1992.
- (1992) Proc. DARPA Continuous Speech Recognition Workshop , pp. 71-76
- Woodland, P.¹ Young, S.²

28
- 85027177249
- Pitch pattern generation using multi-space probability distribution HMM
- T. Masuko, K. Tokuda, N. Miyazaki, and T. Kobayashi, "Pitch pattern generation using multi-space probability distribution HMM," IEICE Trans. Inf. & Syst. (Japanese Edition), vol. J85-D-II, no. 7, pp. 1600-1609, July 2000.

29
- 0025475528
- ATR Japanese speech database as a tool of speech recognition and synthesis
- A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Commun., vol. 9, pp. 357-363, 1990.
- (1990) Speech Commun , vol.9 , pp. 357-363
- Kurematsu, A.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Kuwabara, H.⁵ Shikano, K.⁶

30
- 0032673049
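- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.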
- (1999) Speech Commun , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigné, A.³

31
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," Proc. MAVEBA, pp. 13-15, 2001.
- (2001) Proc. MAVEBA , pp. 13-15
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

32
- 33745200051
- Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "Speech parameter generation algorithm considering global variance for HMM-based speech synthesis," Proc. Interspeech (Eurospeech), pp. 2801-2804, 2005.
- (2005) Proc. Interspeech , pp. 2801-2804
- Toda, T.¹ Tokuda, K.²

33
- 33745215669
- An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005
- H. Zen and T. Toda, "An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005," Proc. Interspeech, pp. 93-96, 2005.
- (2005) Proc. Interspeech , pp. 93-96
- Zen, H.¹ Toda, T.²

34
- 0004087635
- J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific Publishing Company, 1989.
- (1989) Stochastic Complexity in Statistical Inquiry
- Rissanen, J.¹

35
- 85135145174
- Acoustic modeling based on the MDL criterion for speech recognition
- K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," Proc. Eurospeech, pp. 99-102, 1997.
- (1997) Proc. Eurospeech , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

36
- 85009111560
- Hidden semi-Markov model based speech synthesis
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Hidden semi-Markov model based speech synthesis," Proc. ICSLP, pp. 1185-1180, 2004.
- (2004) Proc. ICSLP , pp. 1185-1180
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.