Volume E90-D, Issue 1, 2007, Pages 325-333

Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005

Author keywords

Blizzard challenge 2005; GV; HMM based speech synthesis; HSMM; STRAIGHT

Indexed keywords

AUDIO ACOUSTICS; MARKOV PROCESSES; MATHEMATICAL MODELS; PARAMETER ESTIMATION; PATTERN RECOGNITION SYSTEMS; SPEECH PROCESSING

EID: 33846405723     PISSN: 09168532     EISSN: 17451361     Source Type: Journal    
DOI: 10.1093/ietisy/e90-d.1.325     Document Type: Article
Times cited: 203

References (41)
  • 2
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," Proc. Eurospeech, pp.2347-2350, 1999.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 3
    • 0028996993
    • Speech parameter generation from HMM using dynamic features
    • K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features," Proc. ICASSP, pp.660-663, 1995.
    • (1995) Proc. ICASSP , pp. 660-663
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 4
    • 0034842740
    • Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
    • M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR," Proc. ICASSP, pp.805-808, 2001.
    • (2001) Proc. ICASSP , pp. 805-808
    • Tamura, M.1    Masuko, T.2    Tokuda, K.3    Kobayashi, T.4
  • 7
    • 33745216749
    • The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets
    • K. Tokuda and A. Black, "The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets," Proc. Interspeech (Eurospeech), pp.77-80, 2005.
    • (2005) Proc. Interspeech , pp. 77-80
    • Tokuda, K.1    Black, A.2
  • 8
    • 33846426268
    • Speech synthesis research in a new age of cooperation and competition - The Blizzard Challenge
    • K. Tokuda and A. Black, "Speech synthesis research in a new age of cooperation and competition - The Blizzard Challenge," J. ASJ, vol.62, no.6, pp.466-470, 2006.
    • (2006) J. ASJ , vol.62 , Issue.6 , pp. 466-470
    • Tokuda, K.1    Black, A.2
  • 11
    • 33745200051
    • Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "Speech parameter generation algorithm considering global variance for HMM-based speech synthesis," Proc. Interspeech (Eurospeech), pp.2801-2804, 2005.
    • (2005) Proc. Interspeech , pp. 2801-2804
    • Toda, T.1    Tokuda, K.2
  • 12
    • 33745206749
    • Large scale evaluation of corpus-based synthesizers: Results and lessons from the 2005 Blizzard Challenge
    • C. Bennett, "Large scale evaluation of corpus-based synthesizers: Results and lessons from the 2005 Blizzard Challenge," Proc. Interspeech (Eurospeech), pp. 105-108, 2005.
    • (2005) Proc. Interspeech , pp. 105-108
    • Bennett, C.1
  • 13
    • 33646773080
    • CMU ARCTIC databases for speech synthesis
    • J. Kominek and A. Black, "CMU ARCTIC databases for speech synthesis," Tech. Rep. CMU-LTI-03-177, Carnegie Mellon University, 2003.
    • (2003) Tech. Rep. CMU-LTI-03-177, Carnegie Mellon University
    • Kominek, J.1    Black, A.2
  • 14
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," Proc. ICASSP, pp. 137-140, 1992.
    • (1992) Proc. ICASSP , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 15
    • 0032678076
    • Hidden Markov models based on multi-space probability distribution for pitch pattern modeling
    • K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Hidden Markov models based on multi-space probability distribution for pitch pattern modeling," Proc. ICASSP, pp.229-232, 1999.
    • (1999) Proc. ICASSP , pp. 229-232
    • Tokuda, K.1    Masuko, T.2    Miyazaki, N.3    Kobayashi, T.4
  • 17
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Y.J. Wu and R.H. Wang, "Minimum generation error training for HMM-based speech synthesis," Proc. ICASSP, pp.89-92, 2006.
    • (2006) Proc. ICASSP , pp. 89-92
    • Wu, Y.J.1    Wang, R.H.2
  • 18
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," Proc. ICASSP, pp.1315-1318, 2000.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 19
    • 33846442604
    • Investigation of state duration model based on gamma distribution for HMM-based speech synthesis
    • Y. Ishimatsu, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Investigation of state duration model based on gamma distribution for HMM-based speech synthesis," IEICE Technical Report, SP2001-81, 2001.
    • (2001) IEICE Technical Report, SP2001-81
    • Ishimatsu, Y.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 20
    • 0020596154
    • Cepstral analysis synthesis on the mel frequency scale
    • S. Imai, "Cepstral analysis synthesis on the mel frequency scale," Proc. ICASSP, pp.93-96, 1983.
    • (1983) Proc. ICASSP , pp. 93-96
    • Imai, S.1
  • 21
    • 85027188775
    • Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. & Syst. (Japanese Edition), vol.J87-D-II, no.8, pp.1563-1571, Aug. 2004.
  • 22
    • 33846443006
    • Improving naturalness using residual excitation for HMM-based speech synthesis
    • M. Koike, K. Iwano, and S. Furui, "Improving naturalness using residual excitation for HMM-based speech synthesis," Proc. Spring Meeting of ASJ, pp.241-242, 2003.
    • (2003) Proc. Spring Meeting of ASJ , pp. 241-242
    • Koike, M.1    Iwano, K.2    Furui, S.3
  • 24
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," Proc. MAVEBA, pp. 13-15, 2001.
    • (2001) Proc. MAVEBA , pp. 13-15
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 25
    • 0001052406
    • Discrete representation of signals
    • A. Oppenheim and D. Johnson, "Discrete representation of signals," Proc. IEEE, pp.681-691, 1972.
    • (1972) Proc. IEEE , pp. 681-691
    • Oppenheim, A.1    Johnson, D.2
  • 26
    • 0025543906
    • Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • E. Moulines and F. Charpentier, "Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun., vol.9, pp.453-467, 1990.
    • (1990) Speech Commun , vol.9 , pp. 453-467
    • Moulines, E.1    Charpentier, F.2
  • 27
    • 0022685753
    • Continuously variable duration hidden Markov models for automatic speech recognition
    • S. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition," Comput. Speech Lang., vol.1, pp.29-45, 1986.
    • (1986) Comput. Speech Lang , vol.1 , pp. 29-45
    • Levinson, S.1
  • 31
    • 85133439657
    • An introduction of trajectory model into HMM-based speech synthesis
    • H. Zen, K. Tokuda, and T. Kitamura, "An introduction of trajectory model into HMM-based speech synthesis," Proc. ISCA SSW5, pp. 191-196, 2004.
    • (2004) Proc. ISCA SSW5 , pp. 191-196
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 34
    • 85135145174
    • Acoustic modeling based on the MDL criterion for speech recognition
    • K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," Proc. Eurospeech, pp.99-102, 1997.
    • (1997) Proc. Eurospeech , pp. 99-102
    • Shinoda, K.1    Watanabe, T.2
  • 36
    • 85027177017
    • Psychoacoustic speech tests: A modified rhyme test
    • A.S. House, C.E. Williams, M.H.L. Hecker, and K.D. Kryter, "Psychoacoustic speech tests: A modified rhyme test," Tech. Rep. ESDTDR-63-403, U.S. Air Force Systems Command, Hanscom Field, Electronics Systems Division, 1963.
  • 37
    • 0030166343
    • The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences
    • C. Benoît, M. Grice, and V. Hazan, "The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences," Speech Commun., vol.18, pp.381-392, 1996.
    • (1996) Speech Commun , vol.18 , pp. 381-392
    • Benoît, C.1    Grice, M.2    Hazan, V.3
  • 39
    • 84966350572
    • Perfect synthesis for all of the people all of the time
    • A. Black, "Perfect synthesis for all of the people all of the time," Proc. IEEE Speech Synthesis Workshop, pp. 160-163, 2002.
    • (2002) Proc. IEEE Speech Synthesis Workshop , pp. 160-163
    • Black, A.1
  • 40
    • 85006631929
    • Unit selection and emotional speech
    • A. Black, "Unit selection and emotional speech," Proc. Eurospeech (Interspeech), pp.1649-1652, 2003.
    • (2003) Proc. Eurospeech , pp. 1649-1652
    • Black, A.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.