SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 18, Issue 5, 2010, Pages 984-1004

Thousands of voices for HMM-based speech synthesis - Analysis and application of TTS systems built on various ASR corpora

(13) Yamagishi, Junichi a Usabaev, Bela b King, Simon a Watts, Oliver a Dines, John c Tian, Jilei d Guan, Yong d Hu, Rile d Oura, Keiichiro e Wu, Yi Jian e,f Tokuda, Keiichi e Karhila, Reima g Kurimo, Mikko g

a UNIVERSITY OF EDINBURGH (United Kingdom)

b UNIVERSITY OF TÜBINGEN (Germany)

c IDIAP RESEARCH INSTITUTE (Switzerland)

d NOKIA RESEARCH CENTER (United States)

e NAGOYA INSTITUTE OF TECHNOLOGY (Japan)

f TTS Group (China)

g AALTO UNIVERSITY (Finland)

Author keywords

Automatic speech recognition (ASR); Average voice; H Triple S (HTS); Hidden Markov model (HMM) based speech synthesis; Speaker adaptation; Speech synthesis; SPEECON database; Voice conversion; WSJ database

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION (ASR); AVERAGE VOICE; SPEAKER ADAPTATION; VOICE CONVERSION;

DATABASE SYSTEMS; HIDDEN MARKOV MODELS; SPEECH SYNTHESIS;

SPEECH RECOGNITION;

EID: 77953708096 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2010.2045237 Document Type: Article

Times cited : (74)

References (70)

1
- 70450161300
- Thousands of voices for HMM-based speech synthesis
- Brighton, U.K., Sep.
- J. Yamagishi et al., "Thousands of voices for HMM-based speech synthesis," in Proc. Interspeech-09, Brighton, U.K., Sep. 2009, pp. 420-423.
- (2009) Proc. Interspeech-09 , pp. 420-423
- Yamagishi, J.¹

2
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Budapest, Hungary, Sep.
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. EUROSPEECH-99, Budapest, Hungary, Sep. 1999, pp. 2374-12350
- (1999) Proc. EUROSPEECH-99 , pp. 2374-12350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

3
- 79952258981
- Version 2.1. [Online]. Available:
- K. Tokuda, H. Zen, J. Yamagishi, T. Masuko, S. Sako, A. W. Black, and T. Nose, "The HMM-Based Speech Synthesis System (HTS) Version 2.1." [Online]. Available: http://hts.sp.nitech.ac.jp/
- The HMM-Based Speech Synthesis System (HTS)
- Tokuda, K.¹ Zen, H.² Yamagishi, J.³ Masuko, T.⁴ Sako, S.⁵ Black, A.W.⁶ Nose, T.⁷

4
- 85133720638
- The HMM-based speech synthesis system (HTS)
- version 2.0, Bonn, Germany, Aug.
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. W. Black, and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in Proc. 6th ISCA Workshop Speech Synth. (SSW-6), Bonn, Germany, Aug. 2007.
- (2007) Proc. 6th ISCA Workshop Speech Synth. (SSW-6)
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.W.⁶ Tokuda, K.⁷

5
- 85008006694
- A robust speaker-adaptive HMM-based text-to-speech synthesis
- Aug.
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "A robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Speech, Audio, Lang. Process., vol. 17, no. 6, pp. 1208-1230, Aug. 2009.
- (2009) IEEE Trans. Speech, Audio, Lang. Process. , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

6
- 84867223798
- Robustness of HMM-based speech synthesis
- Brisbane, Australia, Sep.
- J. Yamagishi, Z.-H. Ling, and S. King, "Robustness of HMM-based speech synthesis," in Proc. Interspeech-08, Brisbane, Australia, Sep. 2008, pp. 581-584.
- (2008) Proc. Interspeech-08 , pp. 581-584
- Yamagishi, J.¹ Ling, Z.-H.² King, S.³

7
- 67651002140
- Statistical parametric speech synthesis
- Nov.
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol.51, no.11, pp. 1039-1064, Nov. 2009.
- (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

8
- 0012330750
- The design for the Wall Street Journal-based CSR corpus
- Harriman, NY
- D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proc. Workshop Speech Natural Lang., Harriman, NY, 1992, pp. 357-362.
- (1992) Proc. Workshop Speech Natural Lang. , pp. 357-362
- Paul, D.B.¹ Baker, J.M.²

9
- 0028996854
- WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition
- Detroit, MI, May
- T. Robinson, J. Fransen, D. Pye, J. Foote, and S. Renals, "WSJCAM0: A British English speech corpus for large vocabulary continuous speech recognition," in Proc. ICASSP-95, Detroit, MI, May 1995, pp. 81-84.
- (1995) Proc. ICASSP-95 , pp. 81-84
- Robinson, T.¹ Fransen, J.² Pye, D.³ Foote, J.⁴ Renals, S.⁵

10
- 84936692751
- DARPA resource management bench
- Hidden Valley, PA, Jun.
- D. S. Pallet, J. G. Fiscus, and J. S. Garofolo, "DARPA resource management bench," in Proc. Workshop Speech Natural Lang., Hidden Valley, PA, Jun. 1990, pp. 298-305.
- (1990) Proc. Workshop Speech Natural Lang. , pp. 298-305
- Pallet, D.S.¹ Fiscus, J.G.² Garofolo, J.S.³

11
- 85009274666
- GlobalPhone: A multilingual speech and text database developed at Karlsruhe university
- Denver, CO, Sep.
- T. Schultz, "GlobalPhone: A multilingual speech and text database developed at Karlsruhe university," in Proc. ICSLP'02, Denver, CO, Sep. 2002, pp. 345-348.
- (2002) Proc. ICSLP'02 , pp. 345-348
- Schultz, T.¹

12
- 84910032186
- SPEECON-speech databases for consumer devices: Database specification and validation
- Canary Islands, Spain, May
- D. Iskra, B. Grosskopf, K. Marasek, H. V. D. Heuvel, F. Diehl, and A. Kiessling, "SPEECON-speech databases for consumer devices: Database specification and validation," in Proc. LREC'02, Canary Islands, Spain, May 2002, pp. 329-333.
- (2002) Proc. LREC'02 , pp. 329-333
- Iskra, D.¹ Grosskopf, B.² Marasek, K.³ Heuvel, H.V.D.⁴ Diehl, F.⁵ Kiessling, A.⁶

13
- 70349227947
- The application of hidden Markov models in speech recognition
- M. J. F. Gales and S. J. Young, "The application of hidden Markov models in speech recognition," Foundations Trends Signal Process., vol. 1, no. 3, pp. 195-304, 2008.
- (2008) Foundations Trends Signal Process , vol.1 , Issue.3 , pp. 195-304
- Gales, M.J.F.¹ Young, S.J.²

14
- 85128361526
- The design of the newspaper-based Japanese large vocabulary continuous speech recognition corpus
- Sydney, Australia, Dec.
- K. Itou, M. Yamamoto, K. Takeda, T. Takezawa, T. Matsuoka, T. Kobayashi, K. Shikano, and S. Itahashi, "The design of the newspaper-based Japanese large vocabulary continuous speech recognition corpus," in Proc. ICSLP-98, Sydney, Australia, Dec. 1998, pp. 3261-3264.
- (1998) Proc. ICSLP-98 , pp. 3261-3264
- Itou, K.¹ Yamamoto, M.² Takeda, K.³ Takezawa, T.⁴ Matsuoka, T.⁵ Kobayashi, T.⁶ Shikano, K.⁷ Itahashi, S.⁸

15
- 0002985991
- Mora and syllable
- N. Tsujimura, Ed. New York: Blackwell
- H. Kubozono, "Mora and syllable," in The Handbook of Japanese Linguistics, N. Tsujimura, Ed. New York: Blackwell, 1995, pp. 31-61.
- (1995) The Handbook of Japanese Linguistics , pp. 31-61
- Kubozono, H.¹

16
- 85030493378
- Synthesis of regional English using a keyword lexicon
- Budapest, Hungary, Sep.
- S. Fitt and S. Isard, "Synthesis of regional English using a keyword lexicon," in Proc. Eurospeech-99, Budapest, Hungary, Sep. 1999, vol.2, pp. 823-826.
- (1999) Proc. Eurospeech-99 , vol.2 , pp. 823-826
- Fitt, S.¹ Isard, S.²

17
- 34047123652
- Multisyn: Open-domain unit selection for the Festival speech synthesis system
- R. A. J. Clark, K. Richmond, and S. King, "Multisyn: Open-domain unit selection for the Festival speech synthesis system," Speech Commun., vol.49, no.4, pp. 317-330, 2007.
- (2007) Speech Commun , vol.49 , Issue.4 , pp. 317-330
- Clark, R.A.J.¹ Richmond, K.² King, S.³

18
- 77953725740
- [Online]. Available:
- [Online]. Available: http://www.lc-star.com

19
- 77249139677
- An HMM-based Mandarin Chinese text-to-speech system
- Singapore, Dec.
- Y. Qian, F. Soong, Y. Chen, and M. Chu, "An HMM-based Mandarin Chinese text-to-speech system," in Proc. ISCSLP'06, Singapore, Dec. 2006, pp. 223-232.
- (2006) Proc. ISCSLP'06 , pp. 223-232
- Qian, Y.¹ Soong, F.² Chen, Y.³ Chu, M.⁴

20
- 77953713775
- Deliverable Report D2.1 EMIME Project, 2008
- Deliverable Report D2.1 EMIME Project, 2008.

21
- 77953728396
- An efficient and unified approach of Mandarin HTS system
- Dallas, TX, Mar.
- Y. Guan, J. Tian, Y.-J. Wu, J. Yamagishi, and J. Nurminen, "An efficient and unified approach of Mandarin HTS system," in Proc. ICASSP'10, Dallas, TX, Mar. 2010.
- (2010) Proc. ICASSP'10
- Guan, Y.¹ Tian, J.² Wu, Y.-J.³ Yamagishi, J.⁴ Nurminen, J.⁵

22
- 85123861026
- XIMERA: A new TTS from ATR based on corpus-based technologies
- Workshop, Pittsburgh, PA, Jun.
- H. Kawai, T. Toda, J. Ni, M. Tsuzaki, and K. Tokuda, "XIMERA: A new TTS from ATR based on corpus-based technologies," in Proc. ISCA 5th Speech Synth. Workshop, Pittsburgh, PA, Jun. 2004, pp. 179-184.
- (2004) Proc. ISCA 5th Speech Synth , pp. 179-184
- Kawai, H.¹ Toda, T.² Ni, J.³ Tsuzaki, M.⁴ Tokuda, K.⁵

23
- 60649102582
- XIMERA: A concatenative speech synthesis system with large scale corpora
- Dec.
- H. Kawai, T. Toda, J. Yamagishi, T. Hirai, J. Ni, N. Nishizawa, M. Tsuzaki, and K. Tokuda, "XIMERA: A concatenative speech synthesis system with large scale corpora," IEICE Trans. Inf. Syst., vol.J89-D-II, no.12, pp. 2688-2698, Dec. 2006.
- (2006) IEICE Trans. Inf. Syst. , vol.J89-D-II , Issue.12 , pp. 2688-2698
- Kawai, H.¹ Toda, T.² Yamagishi, J.³ Hirai, T.⁴ Ni, J.⁵ Nishizawa, N.⁶ Tsuzaki, M.⁷ Tokuda, K.⁸

24
- 33751057590
- The ATR multilingual speech-to-speech translation system
- Mar.
- S. Nakamura, K. Markov, H. Nakaiwa, G. Kikui, H. Kawai, T. Jitsuhiro, J.-S. Zhang, H. Yamamoto, E. Sumita, and S. Yamamoto, "The ATR multilingual speech-to-speech translation system," IEEE Trans. Speech, Audio, Lang. Process., vol.14, no.2, pp. 365-376, Mar. 2006.
- (2006) IEEE Trans. Speech, Audio, Lang. Process. , vol.14 , Issue.2 , pp. 365-376
- Nakamura, S.¹ Markov, K.² Nakaiwa, H.³ Kikui, G.⁴ Kawai, H.⁵ Jitsuhiro, T.⁶ Zhang, J.-S.⁷ Yamamoto, H.⁸ Sumita, E.⁹ Yamamoto, S.¹⁰

25
- 77949915957
- Generacion de una voz sintetica en Castellano basada en HSMM para la Evaluacion Albayzin 2008: Conversion texto a voz
- Bilbao, Spain, Nov. [Online]. Available:
- R. Barra-Chicote, J. Yamagishi, J. Montero, S. King, S. Lutfi, and J. Macias-Guarasa, "Generacion de una voz sintetica en Castellano basada en HSMM para la Evaluacion Albayzin 2008: Conversion texto a voz," in V Jornadas en Tecnologia del Habla (in Spanish), Bilbao, Spain, Nov. 2008, pp. 115-118 [Online]. Available: http://www.cstr.inf.ed.ac.uk/downloads/publications/2008/tts-jth08.pdf
- (2008) V Jornadas en Tecnologia Del Habla (In Spanish) , pp. 115-118
- Barra-Chicote, R.¹ Yamagishi, J.² Montero, J.³ King, S.⁴ Lutfi, S.⁵ Macias-Guarasa, J.⁶

26
- 33645758767
- HMM-based approach to multilingual speech synthesis
- S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall
- K. Tokuda, H. Zen, and A. W. Black, "HMM-based approach to multilingual speech synthesis," in Text to Speech Synthesis: New Paradigms and Advances, S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall, 2004.
- (2004) Text to Speech Synthesis: New Paradigms and Advances
- Tokuda, K.¹ Zen, H.² Black, A.W.³

27
- 0002144369
- Tree-based state tying for high accuracy acoustic modeling
- Workshop, Plainsboro, NJ, Mar.
- S. J. Young, J. J. Odell, and P. C. Woodland, "Tree-based state tying for high accuracy acoustic modeling," in Proc. ARPA Human Lang. Technol. Workshop, Plainsboro, NJ, Mar. 1994, pp. 307-312.
- (1994) Proc. ARPA Human Lang. Technol , pp. 307-312
- Young, S.J.¹ Odell, J.J.² Woodland, P.C.³

28
- 70449126171
- The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
- Brisbane, Australia, Sep.
- J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, and K. Tokuda, "The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge," in Proc. Blizzard Challenge 2008, Brisbane, Australia, Sep. 2008.
- (2008) Proc. Blizzard Challenge 2008
- Yamagishi, J.¹ Zen, H.² Wu, Y.-J.³ Toda, T.⁴ Tokuda, K.⁵

29
- 79952269421
- The Blizzard Challenge 2007
- Bonn, Germany, Aug.
- M. Fraser and S. King, "The Blizzard Challenge 2007," in Proc. BLZ3-2007 (in Proc. SSW6), Bonn, Germany, Aug. 2007.
- (2007) Proc. BLZ3-2007 (In Proc. SSW6)
- Fraser, M.¹ King, S.²

30
- 67650790758
- The Blizzard Challenge 2008
- Brisbane, Australia, Sep.
- V. Karaiskos, S. King, R. A. J. Clark, and C. Mayo, "The Blizzard Challenge 2008," in Proc. Blizzard Challenge 2008, Brisbane, Australia, Sep. 2008.
- (2008) Proc. Blizzard Challenge 2008
- Karaiskos, V.¹ King, S.² Clark, R.A.J.³ Mayo, C.⁴

31
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol.27, pp. 187-207, 1999.
- (1999) Speech Commun , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigné, A.³

32
- 33846405723
- Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- Jan.
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. Syst., vol.E90-D, no.1, pp. 325-333, Jan. 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

33
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- May
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system," IEICE Trans. Inf. Syst., vol.E90-D, no.5, pp. 825-834, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

34
- 0002629270
- Maximum likelihood from incomplete data via the EM algorithm
- A. Dempster, N. Laird, and D. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc., Series B, vol.39, no.1, pp. 1-38, 1977.
- (1977) J. R. Statist. Soc., Series B , vol.39 , Issue.1 , pp. 1-38
- Dempster, A.¹ Laird, N.² Rubin, D.³

35
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- Mar.
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Japan (E), vol. 21, pp. 79-86, Mar. 2000.
- (2000) J. Acoust. Soc. Japan (E) , vol.21 , pp. 79-86
- Shinoda, K.¹ Watanabe, T.²

36
- 77953719894
- Evaluation of flat start labeling for phoneme based Mandarin HTS system
- Aug.
- Y. Guan and J. Tian, "Evaluation of flat start labeling for phoneme based Mandarin HTS system," in Proc. ORIENTAL-COCOSDA-09, Aug. 2009, pp. 187-190.
- (2009) Proc. ORIENTAL-COCOSDA-09 , pp. 187-190
- Guan, Y.¹ Tian, J.²

37
- 0030362995
- A compact model for speaker-adaptive training
- Philadelphia, PA, Oct.
- T. Anastasakos, J. McDonough, R. Schwartz, and J. Makhoul, "A compact model for speaker-adaptive training," in Proc. ICSLP-96, Philadelphia, PA, Oct. 1996, pp. 1137-1140.
- (1996) Proc. ICSLP-96 , pp. 1137-1140
- Anastasakos, T.¹ McDonough, J.² Schwartz, R.³ Makhoul, J.⁴

38
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
- (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

39
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- Jan.
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Speech, Audio, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
- (2009) IEEE Trans. Speech, Audio, Lang. Process. , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

40
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- May
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol.E90-D, no.5, pp. 816-824, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

41
- 0025543906
- Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
- E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun., vol.9, no.5-6, pp. 453-468, 1990.
- (1990) Speech Commun , vol.9 , Issue.5-6 , pp. 453-468
- Moulines, E.¹ Charpentier, F.²

42
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- San Francisco, CA, Mar.
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP-92, San Francisco, CA, Mar. 1992, pp. 137-140.
- (1992) Proc. ICASSP-92 , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

43
- 77953705589
- The Blizzard Challenge 2009
- Edinburgh, U.K., Sep.
- S. King and V. Karaiskos, "The Blizzard Challenge 2009," in Proc. Blizzard Challenge Workshop, Edinburgh, U.K., Sep. 2009.
- (2009) Proc. Blizzard Challenge Workshop
- King, S.¹ Karaiskos, V.²

44
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- Feb.
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training," IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533-543, Feb. 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

45
- 70450183638
- Measuring the gap between HMM-based ASR and TTS
- Brighton, U.K., Sep.
- J. Dines, J. Yamagishi, and S. King, "Measuring the gap between HMM-based ASR and TTS," in Proc. Interspeech-09, Brighton, U.K., Sep. 2009, pp. 1391-1394.
- (2009) Proc. Interspeech-09 , pp. 1391-1394
- Dines, J.¹ Yamagishi, J.² King, S.³

46
- 77953728395
- Measuring the gap between HMM-based ASR and TTS
- to be published
- J. Dines, J. Yamagishi, and S. King, "Measuring the gap between HMM-based ASR and TTS," IEEE J. Sel. Topics Signal Process., 2010, to be published.
- (2010) IEEE J. Sel. Topics Signal Process.
- Dines, J.¹ Yamagishi, J.² King, S.³

47
- 0141760645
- 1993 benchmark tests for the ARPA spoken language program
- Morristown, NJ
- D. S. Pallett, J. G. Fiscus, W. M. Fisher, J. S. Garofolo, B. A. Lund, and M. A. Przybocki, "1993 benchmark tests for the ARPA spoken language program," in Proc. HLT '94: Workshop Human Lang. Technol., Morristown, NJ, 1994, pp. 49-74.
- (1994) Proc. HLT '94: Workshop Human Lang. Technol. , pp. 49-74
- Pallett, D.S.¹ Fiscus, J.G.² Fisher, W.M.³ Garofolo, J.S.⁴ Lund, B.A.⁵ Przybocki, M.A.⁶

48
- 60849092922
- Cross-lingual speaker adaptation for HMM-based speech synthesis
- Kunming, China
- Y.-J. Wu, S. King, and K. Tokuda, "Cross-lingual speaker adaptation for HMM-based speech synthesis," in Proc. ISCSLP-08, Kunming, China, 2008, pp. 9-12.
- (2008) Proc. ISCSLP-08 , pp. 9-12
- Wu, Y.-J.¹ King, S.² Tokuda, K.³

49
- 70450192740
- State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis
- Brighton, U.K., Sep.
- Y.-J. Wu and K. Tokuda, "State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis," in Proc. Interspeech-09, Brighton, U.K., Sep. 2009, pp. 528-531.
- (2009) Proc. Interspeech-09 , pp. 528-531
- Wu, Y.-J.¹ Tokuda, K.²

50
- 0017097474
- Distance measures for speech processing
- Oct.
- J. A. Gray and J. Markel, "Distance measures for speech processing," IEEE Trans. Acoust., Speech, Signal Process., vol.ASSP-24, no.5, pp. 380-391, Oct. 1976.
- (1976) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-24 , Issue.5 , pp. 380-391
- Gray, J.A.¹ Markel, J.²

51
- 0019146354
- Correlation analysis of subjective and objective measures for speech quality
- Denver, CO
- T. P. Barnwell, III, "Correlation analysis of subjective and objective measures for speech quality," in Proc. ICASSP-80, Denver, CO, 1980, pp. 706-709.
- (1980) Proc. ICASSP-80 , pp. 706-709
- Barnwell III, T.P.¹

52
- 0029725605
- Speech synthesis using HMMs with dynamic features
- Atlanta, GA, May
- T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, "Speech synthesis using HMMs with dynamic features," in Proc. ICASSP-96, Atlanta, GA, May 1996, pp. 389-392.
- (1996) Proc. ICASSP-96 , pp. 389-392
- Masuko, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

53
- 70349208664
- Optimizing segment label boundaries for statistical speech synthesis
- Taipei, Taiwan, Apr.
- A. W. Black and J. Kominek, "Optimizing segment label boundaries for statistical speech synthesis," in Proc. ICASSP-09, Taipei, Taiwan, Apr. 2009, pp. 3785-3788.
- (2009) Proc. ICASSP-09 , pp. 3785-3788
- Black, A.W.¹ Kominek, J.²

54
- 67650832556
- Statistical analysis of the Blizzard Challenge 2007 listening test results
- Bonn, Germany, Aug.
- R. A. J. Clark, M. Podsiadlo, M. Fraser, C. Mayo, and S. King, "Statistical analysis of the Blizzard Challenge 2007 listening test results," in Proc. BLZ3-2007 (in Proc. SSW6), Bonn, Germany, Aug. 2007.
- (2007) Proc. BLZ3-2007 (In Proc. SSW6)
- Clark, R.A.J.¹ Podsiadlo, M.² Fraser, M.³ Mayo, C.⁴ King, S.⁵

55
- 33646800617
- Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models
- Jeju Island, Korea, Oct.
- M. Shozakai and G. Nagino, "Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models," in Proc. ICSLP-04, Jeju Island, Korea, Oct. 2004, pp. 717-720.
- (2004) Proc. ICSLP-04 , pp. 717-720
- Shozakai, M.¹ Nagino, G.²

56
- 70449388052
- QMOS-A robust visualization method for speaker dependencies with different microphones
- A. Maier, M. Schuster, U. Eysholdt, T. Haderlein, T. Cincarek, S. Steidl, A. Batliner, S. Wenhardt, and E. Noth, "QMOS-A robust visualization method for speaker dependencies with different microphones," J. Pattern Recognition Res., vol.1, pp. 32-51, 2009.
- (2009) J. Pattern Recognition Res. , vol.1 , pp. 32-51
- Maier, A.¹ Schuster, M.² Eysholdt, U.³ Haderlein, T.⁴ Cincarek, T.⁵ Steidl, S.⁶ Batliner, A.⁷ Wenhardt, S.⁸ Noth, E.⁹

57
- 0003825410
- London, U.K.: Chapman & Hall
- T. Cox and M. Cox, Multidimensional Scaling. London, U.K.: Chapman & Hall, 2001.
- (2001) Multidimensional Scaling
- Cox, T.¹ Cox, M.²

58
- 33646781551
- Acoustic training from heterogeneous data sources: Experiments in Mandarin conversational telephone speech transcription
- S. Tsakalidis and W. Byrne, "Acoustic training from heterogeneous data sources: Experiments in Mandarin conversational telephone speech transcription," in Proc. ICASSP-05, 18-23, 2005, vol.1, pp. 461-464.
- (2005) Proc. ICASSP-05, 18-23 , vol.1 , pp. 461-464
- Tsakalidis, S.¹ Byrne, W.²

59
- 77953712724
- Cross-corpus normalization of diverse acoustic training data for robust HMM training
- Cambridge, U.K.
- S. Tsakalidis and W. Byrne, "Cross-corpus normalization of diverse acoustic training data for robust HMM training," Cambridge Univ. Eng. Dept., Cambridge, U.K., 2005.
- (2005) Cambridge Univ. Eng. Dept.
- Tsakalidis, S.¹ Byrne, W.²

60
- 77953723444
- Reformulating the HMM as a trajectory model
- Dec.
- K. Tokuda, H. Zen, and T. Kitamura, "Reformulating the HMM as a trajectory model," IEICE Tech. Rep. Natural Lang. Understanding Models of Commun., vol.104, no.538, pp. 43-48, Dec. 2004.
- (2004) IEICE Tech. Rep. Natural Lang. Understanding Models of Commun. , vol.104 , Issue.538 , pp. 43-48
- Tokuda, K.¹ Zen, H.² Kitamura, T.³

61
- 77953697940
- Ph.D. dissertation, Univ. Politecnica de Catalunya, Barcelona, Spain
- D. Erro, "Intra-lingual and cross-lingual voice conversion using harmonic plus stochastic models," Ph.D. dissertation, Univ. Politecnica de Catalunya, Barcelona, Spain, 2008.
- (2008) Intra-lingual and Cross-lingual Voice Conversion Using Harmonic Plus Stochastic Models
- Erro, D.¹

62
- 84970205467
- Attractive faces are only average
- J. H. Langlois and L. A. Roggman, "Attractive faces are only average," Psychol. Sci., vol.1, no.2, pp. 115-121, 1990.
- (1990) Psychol. Sci. , vol.1 , Issue.2 , pp. 115-121
- Langlois, J.H.¹ Roggman, L.A.²

63
- 77953710433
- Analysis of unsupervised and noise-robust speaker-adaptive HMM-based speech synthesis systems toward a unified ASR and TTS framework
- Edinburgh, U.K., Sep.
- J. Yamagishi, M. Lincoln, S. King, J. Dines, M. Gibson, J. Tian, and Y. Guan, "Analysis of unsupervised and noise-robust speaker-adaptive HMM-based speech synthesis systems toward a unified ASR and TTS framework," in Proc. Blizzard Challenge Workshop, Edinburgh, U.K., Sep. 2009.
- (2009) Proc. Blizzard Challenge Workshop
- Yamagishi, J.¹ Lincoln, M.² King, S.³ Dines, J.⁴ Gibson, M.⁵ Tian, J.⁶ Guan, Y.⁷

64
- 85131821539
- Mel-generalized cepstral analysis-A unified approach to speech spectral estimation
- Yokohama, Japan, Sep.
- K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis-A unified approach to speech spectral estimation," in Proc. ICSLP-94, Yokohama, Japan, Sep. 1994, pp. 1043-1046.
- (1994) Proc. ICSLP-94 , pp. 1043-1046
- Tokuda, K.¹ Kobayashi, T.² Masuko, T.³ Imai, S.⁴

65
- 84962901028
- Adaptive training for robust ASR
- Madonna di Campiglio, Italy
- M. J. F. Gales, "Adaptive training for robust ASR," in Proc. IEEE Workshop Autom. Speech Recognition Understanding, Madonna di Campiglio, Italy, 2001, pp. 15-20.
- (2001) Proc. IEEE Workshop Autom. Speech Recognition Understanding , pp. 15-20
- Gales, M.J.F.¹

66
- 0036567794
- The development of the HTK broadcast news transcription system: An overview
- P. C. Woodland, "The development of the HTK broadcast news transcription system: An overview," Speech Commun., vol.37, no.1-2, pp. 47-67, 2002.
- (2002) Speech Commun , vol.37 , Issue.1-2 , pp. 47-67
- Woodland, P.C.¹

67
- 77953693885
- Building personalised synthesised voices for individuals with dysarthria using the HTS toolkit
- J. W. Mullennix and S. E. Stern, Eds. Hershey, PA: IGI Global, Jan.
- S. Creer, P. Green, S. Cunningham, and J. Yamagishi, "Building personalised synthesised voices for individuals with dysarthria using the HTS toolkit," in Computer Synthesized Speech Technologies: Tools for Aiding Impairment, J. W. Mullennix and S. E. Stern, Eds. Hershey, PA: IGI Global, Jan. 2010.
- (2010) Computer Synthesized Speech Technologies: Tools for Aiding Impairment
- Creer, S.¹ Green, P.² Cunningham, S.³ Yamagishi, J.⁴

68
- 85135274466
- On the security of HMM-based speaker verification systems against imposture using synthetic speech
- Budapest, Hungary, Sep.
- T. Masuko, T. Hitotsumatsu, K. Tokuda, and T. Kobayashi, "On the security of HMM-based speaker verification systems against imposture using synthetic speech," in Proc. Eurospeech-99, Budapest, Hungary, Sep. 1999, pp. 1223-1226.
- (1999) Proc. Eurospeech-99 , pp. 1223-1226
- Masuko, T.¹ Hitotsumatsu, T.² Tokuda, K.³ Kobayashi, T.⁴

69
- 85009077529
- Imposture using synthetic speech against speaker verification based on spectrum and pitch
- Beijing, China, Oct.
- T. Masuko, K. Tokuda, and T. Kobayashi, "Imposture using synthetic speech against speaker verification based on spectrum and pitch," in Proc. ICSLP-00, Beijing, China, Oct. 2000, pp. 302-305.
- (2000) Proc. ICSLP-00 , pp. 302-305
- Masuko, T.¹ Tokuda, K.² Kobayashi, T.³

70
- 78049409687
- Revisiting the security of speaker verification systems against imposture using synthetic speech
- Dallas, TX, Mar.
- P. L. De Leon, V. R. Apsingekar, M. Pucher, and J. Yamagishi, "Revisiting the security of speaker verification systems against imposture using synthetic speech," in Proc. ICASSP-10, Dallas, TX, Mar. 2010.
- (2010) Proc. ICASSP-10
- De Leon, P.L.¹ Apsingekar, V.R.² Pucher, M.³ Yamagishi, J.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.