SCOPUS Information Retrieval Platform

Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010

2010, Pages 418-421

Roles of the average voice in speaker-adaptive HMM-based speech synthesis

(4) Yamagishi, Junichi a; Watts, Oliver a; King, Simon a; Usabaev, Bela b

a UNIVERSITY OF EDINBURGH (United Kingdom)

b UNIVERSITY OF TÜBINGEN (Germany)

Author keywords

Average voice; HMM; Speaker adaptation; Speech synthesis

Indexed keywords

SPEECH SYNTHESIS; SPEECH COMMUNICATION; SPEECH RECOGNITION;

ADAPTATION STRATEGIES; ADAPTIVE HMM; AVERAGE-VOICE; CEPSTRAL; NEGATIVE CORRELATION; SPEAKER ADAPTATION; SYNTHETIC SPEECH; HMM; HMM-BASED SPEECH SYNTHESIS; SPEECH SOUNDS; TRAINING STRATEGY;

EID: 79959858171; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited: 16

References (19)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, Nov. 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 85008006694
- A robust speaker-adaptive HMM-based text-to-speech synthesis
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "A robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Speech, Audio & Language Process., vol. 17, no. 6, pp. 1208-1230, Aug. 2009.
- (2009) IEEE Trans. Speech, Audio & Language Process. , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

3
- 70450161300
- Thousands of voices for HMM-based speech synthesis
- J. Yamagishi et al., "Thousands of voices for HMM-based speech synthesis," in Proc. Interspeech 2009, Brighton, U.K., Sep. 2009, pp. 420-423.
- (2009) Proc. Interspeech 2009 , pp. 420-423
- Yamagishi, J.¹

4
- 77953708096
- Thousands of voices for HMM-based speech synthesis - Analysis and application of TTS systems built on various ASR corpora
- J. Yamagishi, B. Usabaev, S. King, O. Watts, J. Dines, J. Tian, R. Hu, Y. Guan, K. Oura, K. Tokuda, R. Karhila, and M. Kurimo, "Thousands of voices for HMM-based speech synthesis - Analysis and application of TTS systems built on various ASR corpora," IEEE Trans. Speech, Audio & Language Process., 2010, (in press).
- (2010) IEEE Trans. Speech, Audio & Language Process
- Yamagishi, J.¹ Usabaev, B.² King, S.³ Watts, O.⁴ Dines, J.⁵ Tian, J.⁶ Hu, R.⁷ Guan, Y.⁸ Oura, K.⁹ Tokuda, K.¹⁰ Karhila, R.¹¹ Kurimo, M.¹²

5
- 77953693885
- Building personalised synthesised voices for individuals with dysarthria using the HTS toolkit
- S. Creer, P. Green, S. Cunningham, and J. Yamagishi, "Building personalised synthesised voices for individuals with dysarthria using the HTS toolkit," in Computer Synthesized Speech Technologies: Tools for Aiding Impairment, J. W. Mullennix and S. E. Stern, Eds. IGI Global, Jan. 2010.
- (2010) Computer Synthesized Speech Technologies: Tools for Aiding Impairment
- Creer, S.¹ Green, P.² Cunningham, S.³ Yamagishi, J.⁴

6
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- DOI 10.1093/ietisy/e90-d.2.533
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training," IEICE Trans. Inf. & Syst., vol. E90-D, no. 2, pp. 533-543, Feb. 2007. (Pubitemid 46279829)
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

7
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Speech, Audio & Language Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
- (2009) IEEE Trans. Speech, Audio & Language Process. , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

8
- 0141760645
- 1993 benchmark tests for the ARPA spoken language program
- D. S. Pallett, J. G. Fiscus, W. M. Fisher, J. S. Garofolo, B. A. Lund, and M. A. Przybocki, "1993 benchmark tests for the ARPA spoken language program," in HLT '94: Proceedings of the workshop on Human Language Technology. Morristown, NJ, USA: Association for Computational Linguistics, 1994, pp. 49-74.
- (1994) HLT '94: Proceedings of the Workshop on Human Language Technology , pp. 49-74
- Pallett, D.S.¹ Fiscus, J.G.² Fisher, W.M.³ Garofolo, J.S.⁴ Lund, B.A.⁵ Przybocki, M.A.⁶

9
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds," Speech Communication, vol. 27, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigné, A.³

10
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Computer Speech and Language, vol. 12, no. 2, pp. 75-98, 1998. (Pubitemid 128383747)
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

11
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. & Syst., vol. E90-D, no. 5, pp. 816-824, May 2007.
- (2007) IEICE Trans. Inf. & Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

12
- 0012330750
- The design for the Wall Street Journal-based CSR corpus
- D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proceedings of the workshop on Speech and Natural Language, Harriman, New York, 1992, pp. 357-362.
- (1992) Proceedings of the Workshop on Speech and Natural Language , pp. 357-362
- Paul, D.B.¹ Baker, J.M.²

13
- 33646800617
- Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models
- M. Shozakai and G. Nagino, "Analysis of speaking styles by two-dimensional visualization of aggregate of acoustic models," in Proc. ICSLP 2004, Jeju Island, Korea, Oct. 2004, pp. 717-720.
- (2004) Proc. ICSLP 2004 , pp. 717-720
- Shozakai, M.¹ Nagino, G.²

14
- 70449388052
- QMOS - A robust visualization method for speaker dependencies with different microphones
- A. Maier, M. Schuster, U. Eysholdt, T. Haderlein, T. Cincarek, S. Steidl, A. Batliner, S. Wenhardt, and E. Nöth, "QMOS - a robust visualization method for speaker dependencies with different microphones," Journal of Pattern Recognition Research, vol. 1, pp. 32-51, 2009.
- (2009) Journal of Pattern Recognition Research , vol.1 , pp. 32-51
- Maier, A.¹ Schuster, M.² Eysholdt, U.³ Haderlein, T.⁴ Cincarek, T.⁵ Steidl, S.⁶ Batliner, A.⁷ Wenhardt, S.⁸ Nöth, E.⁹

15
- 0003825410
- T. Cox and M. Cox, Multidimensional Scaling. Chapman and Hall, 2001.
- (2001) Multidimensional Scaling
- Cox, T.¹ Cox, M.²

16
- 77953723444
- Reformulating the HMM as a trajectory model
- K. Tokuda, H. Zen, and T. Kitamura, "Reformulating the HMM as a trajectory model," IEICE technical report. Natural language understanding and models of communication, vol. 104, no. 538, pp. 43-48, Dec. 2004.
- (2004) IEICE Technical Report. Natural Language Understanding and Models of Communication , vol.104 , Issue.538 , pp. 43-48
- Tokuda, K.¹ Zen, H.² Kitamura, T.³

17
- 77953697940
- D. Erro, "Intra-lingual and cross-lingual voice conversion using harmonic plus stochastic models," Ph.D. dissertation, Universitat Politecnica de Catalunya, 2008.
- (2008) Intra-lingual and Cross-lingual Voice Conversion Using Harmonic Plus Stochastic Models
- Erro, D.¹

18
- 84970205467
- Attractive faces are only average
- J. H. Langlois and L. A. Roggman, "Attractive faces are only average," Psychological Science, vol. 1, no. 2, pp. 115-121, 1990.
- (1990) Psychological Science , vol.1 , Issue.2 , pp. 115-121
- Langlois, J.H.¹ Roggman, L.A.²

19
- 74549192033
- Vocal attractiveness increases by averaging
- L. Bruckert, P. Bestelmeyer, M. Latinus, J. Rouger, I. Charest, G. A. Rousselet, H. Kawahara, and P. Belin, "Vocal attractiveness increases by averaging," Current Biology, vol. 20, no. 2, pp. 116-120, 2010.
- (2010) Current Biology , vol.20 , Issue.2 , pp. 116-120
- Bruckert, L.¹ Bestelmeyer, P.² Latinus, M.³ Rouger, J.⁴ Charest, I.⁵ Rousselet, G.A.⁶ Kawahara, H.⁷ Belin, P.⁸

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.