SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 17, Issue 6, 2009, Pages 1208-1230

Robust Speaker-Adaptive HMM-Based Text-to-Speech Synthesis

(8) Yamagishi, Junichi (a); King, Simon (a); Renals, Steve (a); Nose, Takashi (b); Zen, Heiga (c); Tokuda, Keiichi (c); Ling, Zhen Hua (d); Toda, Tomoki (e)

a UNIVERSITY OF EDINBURGH (United Kingdom)

b TOKYO INSTITUTE OF TECHNOLOGY (Japan)

c NAGOYA INSTITUTE OF TECHNOLOGY (Japan)

d UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

e NARA INSTITUTE OF SCIENCE AND TECHNOLOGY (Japan)

Author keywords

Average voice; HMM Speech Synthesis System; HMM based speech synthesis; HTS; speaker adaptation; speech synthesis; voice conversion

EID: 85008006694 PISSN: 15587916 EISSN: 15587924 Source Type: Journal
DOI: 10.1109/TASL.2009.2016394 Document Type: Article

Times cited: (161)

References (77)

1
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Sep.
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” in Proc. EUROSPEECH-99, Sep. 1999, pp. 2347–2350.
- (1999) Proc. EUROSPEECH-99 , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

2
- 7044242284
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- in Japanese, Nov.
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” (in Japanese) IEICE Trans., vol. J83-D-II, no. 11, pp. 2099–2107, Nov. 2000.
- (2000) IEICE Trans. , vol.J83-D-II , Issue.11 , pp. 2099-2107
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

3
- 34547526960
- Statistical parametric speech synthesis
- Apr.
- A. Black, H. Zen, and K. Tokuda, “Statistical parametric speech synthesis,” in Proc. ICASSP 2007, Apr. 2007, pp. 1229–1232.
- (2007) Proc. ICASSP 2007 , pp. 1229-1232
- Black, A.¹ Zen, H.² Tokuda, K.³

4
- 79952258981
- [Online]. Available: http://www.hts.sp.nitech.ac.jp/
- K. Tokuda, H. Zen, J. Yamagishi, T. Masuko, S. Sako, A. Black, and T. Nose, The HMM-Based Speech Synthesis System (HTS) Version 2.0.1 [Online]. Available: http://www.hts.sp.nitech.ac.jp/
- The HMM-Based Speech Synthesis System (HTS) Version 2.0.1
- Tokuda, K.¹ Zen, H.² Yamagishi, J.³ Masuko, T.⁴ Sako, S.⁵ Black, A.⁶ Nose, T.⁷

5
- 0028996993
- Speech parameter generation from HMM using dynamic features
- May
- K. Tokuda, T. Kobayashi, and S. Imai, “Speech parameter generation from HMM using dynamic features,” in Proc. ICASSP-95, May 1995, pp. 660–663.
- (1995) Proc. ICASSP-95 , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

6
- 0038582234
- An algorithm for speech parameter generation from HMM using dynamic features
- in Japanese, Mar.
- K. Tokuda, T. Masuko, T. Kobayashi, and S. Imai, “An algorithm for speech parameter generation from HMM using dynamic features,” (in Japanese) J. Acoust. Soc. Japan, vol. 53, no. 3, pp. 192–200, Mar. 1997.
- (1997) J. Acoust. Soc. Japan , vol.53 , Issue.3 , pp. 192-200
- Tokuda, K.¹ Masuko, T.² Kobayashi, T.³ Imai, S.⁴

7
- 0029725605
- Speech synthesis using HMMs with dynamic features
- May
- T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, “Speech synthesis using HMMs with dynamic features,” in Proc. ICASSP-96, May 1996, pp. 389–392.
- (1996) Proc. ICASSP-96 , pp. 389-392
- Masuko, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

8
- 0002025578
- HMM-based speech synthesis using dynamic features
- in Japanese, Dec.
- T. Masuko, K. Tokuda, T. Kobayashi, and S. Imai, “HMM-based speech synthesis using dynamic features,” (in Japanese) IEICE Trans., vol. J79-D-II, no. 12, pp. 2184–2190, Dec. 1996.
- (1996) IEICE Trans. , vol.J79-D-II , Issue.12 , pp. 2184-2190
- Masuko, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

9
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Jun.
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for HMM-based speech synthesis,” in Proc. ICASSP 2000, Jun. 2000, pp. 1315–1318.
- (2000) Proc. ICASSP 2000 , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

10
- 0036522887
- Multi-space probability distribution HMM
- Mar.
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Multi-space probability distribution HMM,” IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455–464, Mar. 2002.
- (2002) IEICE Trans. Inf. Syst. , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

11
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- May
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “A hidden semi-Markov model-based speech synthesis system,” IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825–834, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

12
- 0002585974
- Variable duration models for speech
- J. Ferguson, “Variable duration models for speech,” in Proc. Symp. Applicat. Hidden Markov Models to Text and Speech, 1980, pp. 143–179.
- (1980) Proc. Symp. Applicat. Hidden Markov Models to Text and Speech , pp. 143-179
- Ferguson, J.¹

13
- 0022234383
- Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition
- Mar.
- M. Russell and R. Moore, “Explicit modelling of state occupancy in hidden Markov models for automatic speech recognition,” in Proc. ICASSP-85, Mar. 1985, pp. 5–8.
- (1985) Proc. ICASSP-85 , pp. 5-8
- Russell, M.¹ Moore, R.²

14
- 0022685753
- Continuously variable duration hidden Markov models for automatic speech recognition
- S. Levinson, “Continuously variable duration hidden Markov models for automatic speech recognition,” Comput. Speech Lang., vol. 1, no. 1, pp. 29–45, 1986.
- (1986) Comput. Speech Lang. , vol.1 , Issue.1 , pp. 29-45
- Levinson, S.¹

15
- 0029341719
- A mixed excitation LPC vocoder model for low bit rate speech coding
- Jul.
- A. McCree and T. Barnwell III, “A mixed excitation LPC vocoder model for low bit rate speech coding,” IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. 242–250, Jul. 1995.
- (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.4 , pp. 242-250
- McCree, A.¹ Barnwell, T.²

16
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- Sep.
- H. Kawahara, J. Estill, and O. Fujimura, “Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT,” in Proc. 2nd MAVEBA, Sep. 2001, pp. 13–15.
- (2001) Proc. 2nd MAVEBA , pp. 13-15
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

17
- 0024060644
- Multiband excitation vocoder
- Aug.
- D. W. Griffin and J. S. Lim, “Multiband excitation vocoder,” IEEE Trans. Acoust., Speech, Signal Process., vol. 36, no. 8, pp. 1223–1235, Aug. 1988.
- (1988) IEEE Trans. Acoust., Speech, Signal Process. , vol.36 , Issue.8 , pp. 1223-1235
- Griffin, D.W.¹ Lim, J.S.²

18
- 85009097254
- Mixed excitation for HMM-based speech synthesis
- Sep.
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Mixed excitation for HMM-based speech synthesis,” in Proc. Eurospeech'01, Sep. 2001, pp. 2263–2266.
- (2001) Proc. Eurospeech'01 , pp. 2263-2266
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

19
- 78049361102
- Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
- in Japanese, Aug.
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis,” (in Japanese) IEICE Trans., vol. J87-D-II, no. 8, pp. 1565–1571, Aug. 2004.
- (2004) IEICE Trans. , vol.J87-D-II , Issue.8 , pp. 1565-1571
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

20
- 33846405723
- Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- Jan.
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, “Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005,” IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325–333, Jan. 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

21
- 34547542349
- Improving Arabic HMM based speech synthesis quality
- Sep.
- A.-H. Ossama, A. S. Mahdy, and R. Mohsen, “Improving Arabic HMM based speech synthesis quality,” in Proc. Interspeech 2006, Sep. 2006, pp. 1332–1335.
- (2006) Proc. Interspeech 2006 , pp. 1332-1335
- Ossama, A.-H.¹ Mahdy, A.S.² Mohsen, R.³

22
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- May
- T. Toda and K. Tokuda, “A speech parameter generation algorithm considering global variance for HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816–824, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

23
- 68249104241
- The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006
- Jun.
- H. Zen, T. Toda, and K. Tokuda, “The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006,” IEICE Trans. Inf. Syst., vol. E91-D, no. 6, pp. 1764–1773, Jun. 2008.
- (2008) IEICE Trans. Inf. Syst. , vol.E91-D , Issue.6 , pp. 1764-1773
- Zen, H.¹ Toda, T.² Tokuda, K.³

24
- 67650851754
- USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method
- Sep.
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, “USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method,” in Proc. Blizzard Challenge 2006, Sep. 2006.
- (2006) Proc. Blizzard Challenge 2006
- Ling, Z.-H.¹ Wu, Y.-J.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

25
- 77953693469
- Speaker-independent HMM-based speech synthesis system—HTS-2007 system for the Blizzard Challenge 2007
- Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_008.html, paper 003
- J. Yamagishi, H. Zen, T. Toda, and K. Tokuda, “Speaker-independent HMM-based speech synthesis system—HTS-2007 system for the Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_008.html, paper 003.
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Yamagishi, J.¹ Zen, H.² Toda, T.³ Tokuda, K.⁴

26
- 33745216749
- The Blizzard Challenge—2005: Evaluating corpus-based speech synthesis on common datasets
- Sep.
- A. Black and K. Tokuda, “The Blizzard Challenge—2005: Evaluating corpus-based speech synthesis on common datasets,” in Proc. Eurospeech 2005, Sep. 2005, pp. 77–80.
- (2005) Proc. Eurospeech 2005 , pp. 77-80
- Black, A.¹ Tokuda, K.²

27
- 68249083782
- The Blizzard Challenge 2006
- Sep., [Online]. Available: http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf
- C. Bennett and A. Black, “The Blizzard Challenge 2006,” in Proc. Blizzard Challenge 2006, Sep. 2006 [Online]. Available: http://festvox.org/blizzard/bc2006/eval_blizzard2006.pdf
- (2006) Proc. Blizzard Challenge 2006
- Bennett, C.¹ Black, A.²

28
- 79952269421
- The Blizzard Challenge 2007
- Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_001.html, paper 001
- M. Fraser and S. King, “The Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_001.html, paper 001.
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Fraser, M.¹ King, S.²

29
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. Cheveigne, “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds,” Speech Commun., vol. 27, pp. 187–207, 1999.
- (1999) Speech Commun. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigne, A.³

30
- 0032638856
- Semi-tied covariance matrices for hidden Markov models
- Mar.
- M. Gales, “Semi-tied covariance matrices for hidden Markov models,” IEEE Trans. Speech Audio Process., vol. 7, pp. 272–281, Mar. 1999.
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , pp. 272-281
- Gales, M.¹

31
- 84892187452
- Maximum likelihood modeling with Gaussian distributions for classification
- May
- R. Gopinath, “Maximum likelihood modeling with Gaussian distributions for classification,” in Proc. ICASSP-98, May 1998, pp. 661–664.
- (1998) Proc. ICASSP-98 , pp. 661-664
- Gopinath, R.¹

32
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- Feb.
- J. Yamagishi and T. Kobayashi, “Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training,” IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533–543, Feb. 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

33
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- Jan. 2009
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, “Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm,” IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66–83, Jan. 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

34
- 0007985533
- Speaker adaptation for HMM-based speech synthesis system using MLLR
- Nov.
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Speaker adaptation for HMM-based speech synthesis system using MLLR,” in Proc. 3rd ESCA/COCOSDA Workshop Speech Synth., Nov. 1998, pp. 273–276.
- (1998) Proc. 3rd ESCA/COCOSDA Workshop Speech Synth. , pp. 273-276
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

35
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. Leggetter and P. Woodland, “Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models,” Comput. Speech Lang., vol. 9, no. 2, pp. 171–185, 1995.
- (1995) Comput. Speech Lang. , vol.9 , Issue.2 , pp. 171-185
- Leggetter, C.¹ Woodland, P.²

36
- 0034842740
- Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
- May
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR,” in Proc. ICASSP-01, May 2001, pp. 805–808.
- (2001) Proc. ICASSP-01 , pp. 805-808
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

37
- 85008066911
- Speaker adaptation of pitch and spectrum for HMM-based speech synthesis
- in Japanese, Apr.
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “Speaker adaptation of pitch and spectrum for HMM-based speech synthesis,” (in Japanese) IEICE Trans., vol. J85-D-II, no. 4, pp. 545–553, Apr. 2002.
- (2002) IEICE Trans. , vol.J85-D-II , Issue.4 , pp. 545-553
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

38
- 0030362995
- A compact model for speaker-adaptive training
- Oct.
- T. Anastasakos, J. McDonough, R. Schwartz, and J. Makhoul, “A compact model for speaker-adaptive training,” in Proc. ICSLP-96, Oct. 1996, pp. 1137–1140.
- (1996) Proc. ICSLP-96 , pp. 1137-1140
- Anastasakos, T.¹ McDonough, J.² Schwartz, R.³ Makhoul, J.⁴

39
- 0142007308
- A training method of average voice model for HMM-based speech synthesis
- Aug.
- J. Yamagishi, M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, “A training method of average voice model for HMM-based speech synthesis,” IEICE Trans. Fundamentals, vol. E86-A, no. 8, pp. 1956–1963, Aug. 2003.
- (2003) IEICE Trans. Fundamentals , vol.E86-A , Issue.8 , pp. 1956-1963
- Yamagishi, J.¹ Tamura, M.² Masuko, T.³ Tokuda, K.⁴ Kobayashi, T.⁵

40
- 33645768204
- A style adaptation technique for speech synthesis using HSMM and suprasegmental features
- Mar.
- M. Tachibana, J. Yamagishi, T. Masuko, and T. Kobayashi, “A style adaptation technique for speech synthesis using HSMM and suprasegmental features,” IEICE Trans. Inf. Syst., vol. E89-D, no. 3, pp. 1092–1099, Mar. 2006.
- (2006) IEICE Trans. Inf. Syst. , vol.E89-D , Issue.3 , pp. 1092-1099
- Tachibana, M.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

41
- 70350485779
- HMM-based emotional speech synthesis using average emotion model
- Dec.
- L. Qin, Z. Ling, Y. Wu, B. Zhang, and R. Wang, “HMM-based emotional speech synthesis using average emotion model,” in Proc. ISCSLP-06 (Springer LNAI Book), Dec. 2006, pp. 233–240.
- (2006) Proc. ISCSLP-06 (Springer LNAI Book) , pp. 233-240
- Qin, L.¹ Ling, Z.² Wu, Y.³ Zhang, B.⁴ Wang, R.⁵

42
- 33748468338
- New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer
- J. Latorre, K. Iwano, and S. Furui, “New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer,” Speech Commun., vol. 48, no. 10, pp. 1227–1242, 2006.
- (2006) Speech Commun. , vol.48 , Issue.10 , pp. 1227-1242
- Latorre, J.¹ Iwano, K.² Furui, S.³

43
- 0030189744
- Speaker adaptation using combined transformation and Bayesian methods
- Jul.
- V. Digalakis and L. Neumeyer, “Speaker adaptation using combined transformation and Bayesian methods,” IEEE Trans. Speech Audio Process., vol. 4, pp. 294–300, Jul. 1996.
- (1996) IEEE Trans. Speech Audio Process. , vol.4 , pp. 294-300
- Digalakis, V.¹ Neumeyer, L.²

44
- 0035279111
- A structural Bayes approach to speaker adaptation
- Mar.
- K. Shinoda and C. Lee, “A structural Bayes approach to speaker adaptation,” IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 276–287, Mar. 2001.
- (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.3 , pp. 276-287
- Shinoda, K.¹ Lee, C.²

45
- 11144317887
- Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency
- Dec.
- D. Arifianto, T. Tanaka, T. Masuko, and T. Kobayashi, “Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency,” IEICE Trans. Inf. Syst., vol. E87-D, no. 12, pp. 2812–2820, Dec. 2004.
- (2004) IEICE Trans. Inf. Syst. , vol.E87-D , Issue.12 , pp. 2812-2820
- Arifianto, D.¹ Tanaka, T.² Masuko, T.³ Kobayashi, T.⁴

46
- 84928118106
- Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity
- Sep.
- H. Kawahara, H. Katayose, A. Cheveigne, and R. Patterson, “Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity,” in Proc. Eurospeech 1999, Sep. 1999, pp. 2781–2784.
- (1999) Proc. Eurospeech 1999 , pp. 2781-2784
- Kawahara, H.¹ Katayose, H.² Cheveigne, A.³ Patterson, R.⁴

47
- 0001455934
- A robust algorithm for pitch tracking (RAPT)
- W. Kleijn and K. Paliwal, Eds. New York: Elsevier
- D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Eds. New York: Elsevier, 1995, pp. 495–518.
- (1995) Speech Coding and Synthesis , pp. 495-518
- Talkin, D.¹

48
- 0003887830
- ESPS Programs Version 5.0. Entropic Research Laboratory Inc., 1993.
- (1993) ESPS Programs Version 5.0

49
- 84966348891
- An HMM-based speech synthesis system applied to English
- Sep.
- K. Tokuda, H. Zen, and A. Black, “An HMM-based speech synthesis system applied to English,” in Proc. IEEE Speech Synth. Workshop, Sep. 2002, pp. 227–230.
- (2002) Proc. IEEE Speech Synth. Workshop , pp. 227-230
- Tokuda, K.¹ Zen, H.² Black, A.³

50
- 0002985991
- Mora and syllable
- N. Tsujimura, Ed. Chichester, U.K.: Blackwell
- H. Kubozono, “Mora and syllable,” in The Handbook of Japanese Linguistics, N. Tsujimura, Ed. Chichester, U.K.: Blackwell, 1995, pp. 31–61.
- (1995) The Handbook of Japanese Linguistics , pp. 31-61
- Kubozono, H.¹

51
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. Gales, “Maximum likelihood linear transformations for HMM-based speech recognition,” Comput. Speech Lang., vol. 12, no. 2, pp. 75–98, 1998.
- (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
- Gales, M.¹

52
- 0029375590
- Speaker adaptation using constrained reestimation of Gaussian mixtures
- Sep.
- V. Digalakis, D. Rtischev, and L. Neumeyer, “Speaker adaptation using constrained reestimation of Gaussian mixtures,” IEEE Trans. Speech Audio Process., vol. 3, no. 5, pp. 357–366, Sep. 1995.
- (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.5 , pp. 357-366
- Digalakis, V.¹ Rtischev, D.² Neumeyer, L.³

53
- 85008042245
- Maximum likelihood from incomplete data via the EM algorithm
- A. Dempster, N. Laird, and D. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Statist. Soc., Series B, vol. 39, no. 1, pp. 1–38, 1977.
- (1977) J. R. Statist. Soc., Series B , vol.39 , Issue.1 , pp. 1-38
- Dempster, A.¹ Laird, N.² Rubin, D.³

54
- 24144497811
- Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis
- Mar.
- J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, “Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E88-D, no. 3, pp. 503–509, Mar. 2005.
- (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.3 , pp. 503-509
- Yamagishi, J.¹ Onishi, K.² Masuko, T.³ Kobayashi, T.⁴

55
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- Mar.
- K. Shinoda and T. Watanabe, “MDL-based context-dependent subword modeling for speech recognition,” J. Acoust. Soc. Japan (E), vol. 21, pp. 79–86, Mar. 2000.
- (2000) J. Acoust. Soc. Japan (E) , vol.21 , pp. 79-86
- Shinoda, K.¹ Watanabe, T.²

56
- 0025543906
- Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
- E. Moulines and F. Charpentier, “Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones,” Speech Commun., vol. 9, no. 5–6, pp. 453–468, 1990.
- (1990) Speech Commun. , vol.9 , Issue.5-6 , pp. 453-468
- Moulines, E.¹ Charpentier, F.²

57
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Sep.
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, “Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation,” in Proc. Interspeech 2006, Sep. 2006, pp. 2266–2269.
- (2006) Proc. Interspeech 2006 , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

58
- 84959174906
- HMM-based synthesis of child speech
- Oct.
- O. Watts, J. Yamagishi, K. Berkling, and S. King, “HMM-based synthesis of child speech,” in Proc. 1st Workshop Child, Comput., Interaction (ICMI'08 Post-Conf. Workshop), Oct. 2008.
- (2008) Proc. 1st Workshop Child, Comput., Interaction (ICMI'08 Post-Conf. Workshop)
- Watts, O.¹ Yamagishi, J.² Berkling, K.³ King, S.⁴

59
- 34547529978
- Model adaptation approach to speech synthesis with diverse voices and styles
- Apr.
- J. Yamagishi, T. Kobayashi, M. Tachibana, K. Ogata, and Y. Nakano, “Model adaptation approach to speech synthesis with diverse voices and styles,” in Proc. ICASSP-07, Apr. 2007, pp. 1233–1236.
- (2007) Proc. ICASSP-07 , pp. 1233-1236
- Yamagishi, J.¹ Kobayashi, T.² Tachibana, M.³ Ogata, K.⁴ Nakano, Y.⁵

60
- 85008037473
- ATRECSS—ATR English speech corpus for speech synthesis
- Aug.
- J. Ni, T. Hirai, H. Kawai, T. Toda, K. Tokuda, M. Tsuzaki, S. Sakai, R. Maia, and S. Nakamura, “ATRECSS—ATR English speech corpus for speech synthesis,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007.
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Ni, J.¹ Hirai, T.² Kawai, H.³ Toda, T.⁴ Tokuda, K.⁵ Tsuzaki, M.⁶ Sakai, S.⁷ Maia, R.⁸ Nakamura, S.⁹

61
- 0003571407
- Edinburgh, U.K.: Univ. of Edinburgh
- A. Black, P. Taylor, and R. Caley, The Festival Speech Synthesis System. Edinburgh, U.K.: Univ. of Edinburgh, 1999.
- (1999) The Festival Speech Synthesis System
- Black, A.¹ Taylor, P.² Caley, R.³

62
- 0037278070
- An efficient forward-backward algorithm for an explicit-duration hidden Markov model
- Jan.
- S.-Z. Yu and H. Kobayashi, “An efficient forward-backward algorithm for an explicit-duration hidden Markov model,” IEEE Signal Process. Lett., vol. 10, no. 1, pp. 11–14, Jan. 2003.
- (2003) IEEE Signal Process. Lett. , vol.10 , Issue.1 , pp. 11-14
- Yu, S.-Z.¹ Kobayashi, H.²

63
- 0000176621
- On the complexity of explicit duration HMM's
- May
- C. Mitchell, M. Harper, and L. Jamieson, “On the complexity of explicit duration HMM's,” IEEE Trans. Speech Audio Process., vol. 3, no. 3, pp. 213–217, May 1995.
- (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.3 , pp. 213-217
- Mitchell, C.¹ Harper, M.² Jamieson, L.³

64
- 33947110905
- State duration modeling for HMM-based speech synthesis
- Mar.
- H. Zen, K. Tokuda, T. Masuko, T. Yoshimura, T. Kobayashi, and T. Kitamura, “State duration modeling for HMM-based speech synthesis,” IEICE Trans. Inf. Syst., vol. E90-D, no. 3, pp. 692–693, Mar. 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.3 , pp. 692-693
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Yoshimura, T.⁴ Kobayashi, T.⁵ Kitamura, T.⁶

65
- 69849091128
- Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm
- TR-SP-0001, Dec., Tech. Rep.
- H. Zen, “Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm,” Nagoya Inst. of Technol., Tech. Rep. TR-SP-0001, Dec. 2007.
- (2007) Nagoya Inst. of Technol., Tech. Rep. TR-SP-0001
- Zen, H.¹

66
- 67650832556
- Statistical analysis of the Blizzard Challenge 2007 listening test results
- Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_003.html, paper 003
- R. Clark, M. Podsiadlo, M. Fraser, C. Mayo, and S. King, “Statistical analysis of the Blizzard Challenge 2007 listening test results,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_003.html, paper 003.
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Clark, R.¹ Podsiadlo, M.² Fraser, M.³ Mayo, C.⁴ King, S.⁵

67
- 85008031526
- The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007
- Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_017.html, paper 017
- Z.-H. Ling, L. Qin, H. Lu, Y. Gao, L.-R. Dai, R.-H. Wang, Y. Jiang, Z.-W. Zhao, J.-H.Y.J. Chen, and G.-P. Hu, “The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_017.html, paper 017.
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Ling, Z.-H.¹ Qin, L.² Lu, H.³ Gao, Y.⁴ Dai, L.-R.⁵ Wang, R.-H.⁶ Jiang, Y.⁷ Zhao, Z.-W.⁸ Chen, J.-H.Y.J.⁹ Hu, G.-P.¹⁰

68
- 51449101140
- Festival Multisyn voices for the 2007 Blizzard Challenge
- Aug., [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_006.html, paper 006
- K. Richmond, V. Strom, R. Clark, J. Yamagishi, and S. Fitt, “Festival Multisyn voices for the 2007 Blizzard Challenge,” in Proc. BLZ3-2007 (in Proc. SSW6), Aug. 2007 [Online]. Available: http://festvox.org/blizzard/bc2007/blizzard_2007/blz3_006.html, paper 006.
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Richmond, K.¹ Strom, V.² Clark, R.³ Yamagishi, J.⁴ Fitt, S.⁵

69
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- May
- A. Hunt and A. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in Proc. ICASSP-96, May 1996, pp. 373–376.
- (1996) Proc. ICASSP-96 , pp. 373-376
- Hunt, A.¹ Black, A.²

70
- 34547503417
- HMM-based unit selection using frame sized speech segments
- Sep.
- Z.-H. Ling and R.-H. Wang, “HMM-based unit selection using frame sized speech segments,” in Proc. Interspeech 2006, Sep. 2006, pp. 2034–2037.
- (2006) Proc. Interspeech 2006 , pp. 2034-2037
- Ling, Z.-H.¹ Wang, R.-H.²

71
- 34547612590
- HMM-based hierarchical unit selection combining Kullback-Leibler divergence with likelihood criterion
- Apr.
- Z.-H. Ling and R.-H. Wang, “HMM-based hierarchical unit selection combining Kullback-Leibler divergence with likelihood criterion,” in Proc. ICASSP-07, Apr. 2007, pp. 1245–1248.
- (2007) Proc. ICASSP-07 , pp. 1245-1248
- Ling, Z.-H.¹ Wang, R.-H.²

72
- 34047123652
- Multisyn: Open-domain unit selection for the Festival speech synthesis system
- R. A. J. Clark, K. Richmond, and S. King, “Multisyn: Open-domain unit selection for the Festival speech synthesis system,” Speech Commun., vol. 49, no. 4, pp. 317–330, 2007.
- (2007) Speech Commun. , vol.49 , Issue.4 , pp. 317-330
- Clark, R.A.J.¹ Richmond, K.² King, S.³

73
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- May, [Online]. Available: http://festvox.org/blizzard/bc2008/hts_Blizzard2008.pdf
- Y. Wu and R.-H. Wang, “Minimum generation error training for HMM-based speech synthesis,” in Proc. ICASSP-06, May 2006, pp. 89–92 [Online]. Available: http://festvox.org/blizzard/bc2008/hts_Blizzard2008.pdf
- (2006) Proc. ICASSP-06 , pp. 89-92
- Wu, Y.¹ Wang, R.-H.²

74
- 0030166343
- The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences
- C. Benoit, M. Grice, and V. Hazan, “The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences,” Speech Commun., vol. 18, no. 4, pp. 381–392, 1996.
- (1996) Speech Commun. , vol.18 , Issue.4 , pp. 381-392
- Benoit, C.¹ Grice, M.² Hazan, V.³

75
- 85030493378
- Synthesis of regional English using a keyword lexicon
- Sep.
- S. Fitt and S. Isard, “Synthesis of regional English using a keyword lexicon,” in Proc. Eurospeech 1999, Sep. 1999, vol. 2, pp. 823–826.
- (1999) Proc. Eurospeech 1999 , vol.2 , pp. 823-826
- Fitt, S.¹ Isard, S.²

76
- 70449126171
- Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
- Sep.
- J. Yamagishi, H. Zen, Y.-J. Wu, T. Toda, and K. Tokuda, “The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge,” in Proc. Blizzard Challenge 2008, Sep. 2008.
- (2008) Proc. Blizzard Challenge 2008
- Yamagishi, J.¹ Zen, H.² Wu, Y.-J.³ Toda, T.⁴ Tokuda, K.⁵

77
- 67650803663
- Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning
- Feb., [Online]. Available: http://www.langtech.it/en/poster/03_AYLETT.pdf
- M. Aylett and J. Yamagishi, “Combining statistical parametric speech synthesis and unit-selection for automatic voice cloning,” in Proc. LangTech 2008, Feb. 2008 [Online]. Available: http://www.langtech.it/en/poster/03_AYLETT.pdf
- (2008) Proc. LangTech 2008
- Aylett, M.¹ Yamagishi, J.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.