SCOPUS Information Retrieval Platform

IEICE Transactions on Information and Systems

Volume E90-D, Issue 1, 2007, Pages 325-333

Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005

(4) Zen, Heiga (a); Toda, Tomoki (b); Nakamura, Masaru (a); Tokuda, Keiichi (a)

a NAGOYA INSTITUTE OF TECHNOLOGY (Japan)

b NARA INSTITUTE OF SCIENCE AND TECHNOLOGY (Japan)

Author keywords

Blizzard challenge 2005; GV; HMM based speech synthesis; HSMM; STRAIGHT

Indexed keywords

AUDIO ACOUSTICS; MARKOV PROCESSES; MATHEMATICAL MODELS; PARAMETER ESTIMATION; PATTERN RECOGNITION SYSTEMS; SPEECH PROCESSING;

ACOUSTIC MODELING; LISTENING TESTS; SPEECH PARAMETER GENERATION ALGORITHMS;

SPEECH SYNTHESIS;

EID: 33846405723 PISSN: 09168532 EISSN: 17451361 Source Type: Journal
DOI: 10.1093/ietisy/e90-d.1.325 Document Type: Article

Times cited : (203)

References (41)

1
- 5544252636
- A corpus-based synthesizer
- R. Sproat, J. Hirschberg, and D. Yarowsky, "A corpus-based synthesizer," Proc. ICSLP, pp.563-566, 1992.
- (1992) Proc. ICSLP , pp. 563-566
- Sproat, R.¹ Hirschberg, J.² Yarowsky, D.³

2
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," Proc. Eurospeech, pp.2347-2350, 1999.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

3
- 0028996993
- Speech parameter generation from HMM using dynamic features
- K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features," Proc. ICASSP, pp.660-663, 1995.
- (1995) Proc. ICASSP , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

4
- 0034842740
- Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR," Proc. ICASSP, pp.805-808, 2001.
- (2001) Proc. ICASSP , pp. 805-808
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

5
- 85135145847
- Speaker interpolation in HMM-based speech synthesis system
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Speaker interpolation in HMM-based speech synthesis system," Proc. Eurospeech, pp.2523-2526, 1997.
- (1997) Proc. Eurospeech , pp. 2523-2526
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

6
- 85009257840
- Eigenvoices for HMM-based speech synthesis
- K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Eigenvoices for HMM-based speech synthesis," Proc. ICSLP (Interspeech), pp. 1269-1272, 2002.
- (2002) Proc. ICSLP , pp. 1269-1272
- Shichiri, K.¹ Sawabe, A.² Tokuda, K.³ Masuko, T.⁴ Kobayashi, T.⁵ Kitamura, T.⁶

7
- 33745216749
- The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets
- K. Tokuda and A. Black, "The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets," Proc. Interspeech (Eurospeech), pp.77-80, 2005.
- (2005) Proc. Interspeech , pp. 77-80
- Tokuda, K.¹ Black, A.²

8
- 33846426268
- Speech synthesis research in a new age of cooperation and competition - The Blizzard Challenge
- K. Tokuda and A. Black, "Speech synthesis research in a new age of cooperation and competition - The Blizzard Challenge," J. ASJ, vol.62, no.6, pp.466-470, 2006.
- (2006) J. ASJ , vol.62 , Issue.6 , pp. 466-470
- Tokuda, K.¹ Black, A.²

9
- 0032673049
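- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol.27, pp. 187-207, 1999.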
- (1999) Speech Commun , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigné, A.³

10
- 85009111560
- Hidden semi-Markov model based speech synthesis
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Hidden semi-Markov model based speech synthesis," Proc. Interspeech (ICSLP), pp.1185-1180, 2004.
- (2004) Proc. Interspeech (ICSLP) , pp. 1185-1180
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

11
- 33745200051
- Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "Speech parameter generation algorithm considering global variance for HMM-based speech synthesis," Proc. Interspeech (Eurospeech), pp.2801-2804, 2005.
- (2005) Proc. Interspeech , pp. 2801-2804
- Toda, T.¹ Tokuda, K.²

12
- 33745206749
- Large scale evaluation of corpus-based synthesizers: Results and lessons from the 2005 Blizzard Challenge
- C. Bennett, "Large scale evaluation of corpus-based synthesizers: Results and lessons from the 2005 Blizzard Challenge," Proc. Interspeech (Eurospeech), pp. 105-108, 2005.
- (2005) Proc. Interspeech , pp. 105-108
- Bennett, C.¹

13
- 33646773080
- CMU ARCTIC databases for speech synthesis
- Tech. Rep. CMU-LTI-03-177, Carnegie Mellon University
- J. Kominek and A. Black, "CMU ARCTIC databases for speech synthesis," Tech. Rep. CMU-LTI-03-177, Carnegie Mellon University, 2003.
- (2003)
- Kominek, J.¹ Black, A.²

14
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," Proc. ICASSP, pp. 137-140, 1992.
- (1992) Proc. ICASSP , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

15
- 0032678076
- Hidden Markov models based on multi-space probability distribution for pitch pattern modeling
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Hidden Markov models based on multi-space probability distribution for pitch pattern modeling," Proc. ICASSP, pp.229-232, 1999.
- (1999) Proc. ICASSP , pp. 229-232
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

16
- 84966341178
- The impact of speech recognition on speech synthesis
- CD-ROM proceeding
- M. Ostendorf and I. Bulyko, "The impact of speech recognition on speech synthesis," Proc. IEEE Workshop on Speech Synthesis, 2002. CD-ROM proceeding.
- (2002) Proc. IEEE Workshop on Speech Synthesis
- Ostendorf, M.¹ Bulyko, I.²

17
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Y.J. Wu and R.H. Wang, "Minimum generation error training for HMM-based speech synthesis," Proc. ICASSP, pp.89-92, 2006.
- (2006) Proc. ICASSP , pp. 89-92
- Wu, Y.J.¹ Wang, R.H.²

18
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," Proc. ICASSP, pp.1315-1318, 2000.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

19
- 33846442604
- Investigation of state duration model based on gamma distribution for HMM-based speech synthesis
- SP2001-81, 2001
- Y. Ishimatsu, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Investigation of state duration model based on gamma distribution for HMM-based speech synthesis," IEICE Technical Report, SP2001-81, 2001.
- IEICE Technical Report
- Ishimatsu, Y.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

20
- 0020596154
- Cepstral analysis synthesis on the mel frequency scale
- S. Imai, "Cepstral analysis synthesis on the mel frequency scale," Proc. ICASSP, pp.93-96, 1983.
- (1983) Proc. ICASSP , pp. 93-96
- Imai, S.¹

21
- 85027188775
- Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. & Syst. (Japanese Edition), vol.J87-D-II, no.8, pp.1563-1571, Aug. 2004.

22
- 33846443006
- Improving naturalness using residual excitation for HMM-based speech synthesis
- M. Koike, K. Iwano, and S. Furui, "Improving naturalness using residual excitation for HMM-based speech synthesis," Proc. Spring Meeting of ASJ, pp.241-242, 2003.
- (2003) Proc. Spring Meeting of ASJ , pp. 241-242
- Koike, M.¹ Iwano, K.² Furui, S.³

23
- 84928118106
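- Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity
- H. Kawahara, H. Katayose, A. de Cheveigné, and R. Patterson, "Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity," Proc. Eurospeech, pp.2781-2784, 1999.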
- (1999) Proc. Eurospeech , pp. 2781-2784
- Kawahara, H.¹ Katayose, H.² Cheveigné, A.³ Patterson, R.⁴

24
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," Proc. MAVEBA, pp. 13-15, 2001.
- (2001) Proc. MAVEBA , pp. 13-15
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

25
- 0001052406
- Discrete representation of signals
- A. Oppenheim and D. Johnson, "Discrete representation of signals," Proc. IEEE, pp.681-691, 1972.
- (1972) Proc. IEEE , pp. 681-691
- Oppenheim, A.¹ Johnson, D.²

26
- 0025543906
- Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones
- E. Moulines and F. Charpentier, "Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun., vol.9, pp.453-467, 1990.
- (1990) Speech Commun , vol.9 , pp. 453-467
- Moulines, E.¹ Charpentier, F.²

27
- 0022685753
- Continuously variable duration hidden Markov models for automatic speech recognition
- S. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Comput. Speech Lang., vol.1, pp.29-45, 1986.
- (1986) Comput. Speech Lang , vol.1 , pp. 29-45
- Levinson, S.¹

28
- 33846438026
- A postfiltering technique for HMM-based speech synthesis
- Y. Kishimoto, H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A postfiltering technique for HMM-based speech synthesis," Proc. Autumn Meeting of ASJ, pp.279-280, 2002.
- (2002) Proc. Autumn Meeting of ASJ , pp. 279-280
- Kishimoto, Y.¹ Zen, H.² Tokuda, K.³ Masuko, T.⁴ Kobayashi, T.⁵ Kitamura, T.⁶

29
- 28244501231
- A. Black and K. Lenzo, "Building synthetic voices," 2003. http://www.festvox.org/bsv/
- (2003) Building synthetic voices
- Black, A.¹ Lenzo, K.²

30
- 79952258981
- K. Tokuda, H. Zen, S. Sako, T. Yoshimura, J. Yamagishi, M. Tamura, and T. Masuko, "The HMM-based speech synthesis software toolkit," http://hts.ics.nitech.ac.jp/
- The HMM-based speech synthesis software toolkit
- Tokuda, K.¹ Zen, H.² Sako, S.³ Yoshimura, T.⁴ Yamagishi, J.⁵ Tamura, M.⁶ Masuko, T.⁷

31
- 85133439657
- An introduction of trajectory model into HMM-based speech synthesis
- H. Zen, K. Tokuda, and T. Kitamura, "An introduction of trajectory model into HMM-based speech synthesis," Proc. ISCA SSW5, pp. 191-196, 2004.
- (2004) Proc. ISCA SSW5 , pp. 191-196
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

32
- 0003805597
- Ph.D. Thesis, Cambridge University
- J. Odell, The Use of Context in Large Vocabulary Speech Recognition, Ph.D. Thesis, Cambridge University, 1995.
- (1995) The Use of Context in Large Vocabulary Speech Recognition
- Odell, J.¹

33
- 0004087635
- World Scientific Publishing Company
- J. Rissanen, Stochastic Complexity in Statistical Inquiry, World Scientific Publishing Company, 1989.
- (1989) Stochastic Complexity in Statistical Inquiry
- Rissanen, J.¹

34
- 85135145174
- Acoustic modeling based on the MDL criterion for speech recognition
- K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," Proc. Eurospeech, pp.99-102, 1997.
- (1997) Proc. Eurospeech , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

35
- 33846462839
- Miniaturization of HMM-based speech synthesis
- Y. Morioka, S. Kataoka, H. Zen, Y. Nankaku, K. Tokuda, and T. Kitamura, "Miniaturization of HMM-based speech synthesis," Proc. Autumn Meeting of ASJ, pp.325-326, 2004.
- (2004) Proc. Autumn Meeting of ASJ , pp. 325-326
- Morioka, Y.¹ Kataoka, S.² Zen, H.³ Nankaku, Y.⁴ Tokuda, K.⁵ Kitamura, T.⁶

36
- 85027177017
- Psychoacoustic speech tests: A modified rhyme test
- A.S. House, C.E. Williams, M.H.L. Hecker, and K.D. Kryter, "Psychoacoustic speech tests: A modified rhyme test," Tech. Rep. ESDTDR-63-403, U.S. Air Force Systems Command, Hanscom Field, Electronics Systems Division, 1963.

37
- 0030166343
- The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences
- C. Benoît, M. Grice, and V. Hazan, "The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences," Speech Commun., vol.18, pp.381-392, 1996.
- (1996) Speech Commun , vol.18 , pp. 381-392
- Benoît, C.¹ Grice, M.² Hazan, V.³

38
- 0003571407
- A. Black, P. Taylor, and R. Caley, "The festival speech synthesis system," http://www.festvox.org/festival/
- The festival speech synthesis system
- Black, A.¹ Taylor, P.² Caley, R.³

39
- 84966350572
- Perfect synthesis for all of the people all of the time
- A. Black, "Perfect synthesis for all of the people all of the time," Proc. IEEE Speech Synthesis Workshop, pp. 160-163, 2002.
- (2002) Proc. IEEE Speech Synthesis Workshop , pp. 160-163
- Black, A.¹

40
- 85006631929
- Unit selection and emotional speech
- A. Black, "Unit selection and emotional speech," Proc. Eurospeech (Interspeech), pp.1649-1652, 2003.
- (2003) Proc. Eurospeech , pp. 1649-1652
- Black, A.¹

41
- 33846463597
- Ph.D. Thesis, Tokyo Institute of Technology
- J. Yamagishi, Average-Voice-Based Speech Synthesis, Ph.D. Thesis, Tokyo Institute of Technology, 2006.
- (2006) Average-Voice-Based Speech Synthesis
- Yamagishi, J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.