SCOPUS Information Retrieval Platform

Speech Communication

Volume 52, Issue 5, 2010, Pages 394-404

Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech

(5) Barra Chicote, Roberto (b); Yamagishi, Junichi (a); King, Simon (a); Montero, Juan Manuel (b); Macias Guarasa, Javier (c)

a UNIVERSITY OF EDINBURGH (United Kingdom)

b UNIVERSIDAD POLITÉCNICA DE MADRID (Spain)

c UNIVERSITY OF ALCALÁ (Spain)

Author keywords

Emotional speech synthesis; HMM based synthesis; Unit selection

Indexed keywords

CONTEXT DEPENDENT; EMOTIONAL SPEECH; EMOTIONAL SPEECH SYNTHESIS; IDENTIFICATION RATES; PERCEPTUAL TEST; PROSODIC MODELING; SPECTRAL MODELING; SPEECH QUALITY; SYNTHETIC SPEECH; TWO-STATE; UNIT SELECTION; UNIT-SELECTION SPEECH SYNTHESIS;

SPEECH ANALYSIS; SPEECH SYNTHESIS;

QUALITY CONTROL;

EID: 77949913458 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2009.12.007 Document Type: Article

Times cited : (63)

References (44)

1
- 77949915957
- Generacion de una voz sintetica en Castellano basada en HSMM para la Evaluacion Albayzin 2008: conversion texto a voz
- (in Spanish)
- Barra-Chicote R., Yamagishi J., Montero J., King S., Lutfi S., and Macias-Guarasa J. Generacion de una voz sintetica en Castellano basada en HSMM para la Evaluacion Albayzin 2008: conversion texto a voz. In V J. Tecnol. Habla (2008) 115-118 (in Spanish)
- (2008) In V J. Tecnol. Habla , pp. 115-118
- Barra-Chicote, R.¹ Yamagishi, J.² Montero, J.³ King, S.⁴ Lutfi, S.⁵ Macias-Guarasa, J.⁶

2
- 77956781171
- Spanish expressive voices: Corpus for emotion research in Spanish
- Barra-Chicote, R., Montero, J., Macias-Guarasa, J., Lutfi, S., Lucas, J.M., Fernandez, F., D'haro, L., San-Segundo, R., Ferreiros, J., Cordoba, R., Pardo, J., 2008b. Spanish expressive voices: corpus for emotion research in Spanish. In: Proc. of 6th international conference on Language Resources and Evaluation.
- (2008) Proc. of 6th international conference on Language Resources and Evaluation
- Barra-Chicote, R.¹ Montero, J.² Macias-Guarasa, J.³ Lutfi, S.⁴ Lucas, J.M.⁵ Fernandez, F.⁶ D'haro, L.⁷ San-Segundo, R.⁸ Ferreiros, J.⁹ Cordoba, R.¹⁰ Pardo, J.¹¹

3
- 33947617827
- Prosodic and segmental rubrics in emotion identification
- Barra, R., Montero, J., Macias-Guarasa, J., D'Haro, L., San-Segundo, R., Cordoba, R., 2006. Prosodic and segmental rubrics in emotion identification. In: ICASSP 2006, pp. 1085-1088.
- (2006) ICASSP 2006 , pp. 1085-1088
- Barra, R.¹ Montero, J.² Macias-Guarasa, J.³ D'Haro, L.⁴ San-Segundo, R.⁵ Cordoba, R.⁶

4
- 77949917061
- On the limitations of voice conversion techniques in emotion identification tasks
- Barra, R., Montero, J., Macias-Guarasa, J., Gutierrez-Arriola, J., Ferreiros, J., Pardo, J., 2007. On the limitations of voice conversion techniques in emotion identification tasks. In: Proc. Interspeech 2007, pp. 2233-2236.
- (2007) Proc. Interspeech , pp. 2233-2236
- Barra, R.¹ Montero, J.² Macias-Guarasa, J.³ Gutierrez-Arriola, J.⁴ Ferreiros, J.⁵ Pardo, J.⁶

5
- 68249083782
- The Blizzard Challenge 2006
- Bennett, C., Black, A.W., 2006. The Blizzard Challenge 2006. In: Proc. Blizzard Challenge 2006.
- (2006) Proc. Blizzard Challenge
- Bennett, C.¹ Black, A.W.²

6
- 85006631929
- Unit selection and emotional speech
- Black, A.W., 2003. Unit selection and emotional speech. In: Proc. EUROSPEECH 2003, pp. 1649-1652.
- (2003) Proc. EUROSPEECH 2003 , pp. 1649-1652
- Black, A.W.¹

7
- 84966398940
- Optimising selection of units from speech database for concatenative synthesis
- Black, A.W., Campbell, N., 1995. Optimising selection of units from speech database for concatenative synthesis. In: Proc. EUROSPEECH-95, pp. 581-584.
- (1995) Proc. EUROSPEECH-95 , pp. 581-584
- Black, A.W.¹ Campbell, N.²

8
- 33745216749
- The Blizzard Challenge - 2005: Evaluating corpus-based speech synthesis on common datasets
- Black, A.W., Tokuda, K., 2005. The Blizzard Challenge - 2005: evaluating corpus-based speech synthesis on common datasets. In: Proc. EUROSPEECH 2005, pp.77-80.
- (2005) Proc. EUROSPEECH 2005 , pp. 77-80
- Black, A.W.¹ Tokuda, K.²

9
- 85009247888
- Expressive speech synthesis using a concatenative synthesizer
- Bulut, M., Narayanan, S., Syrdal, A., 2002. Expressive speech synthesis using a concatenative synthesizer. In: Proc. ICSLP 2002, pp. 1265-1268.
- (2002) Proc. ICSLP 2002 , pp. 1265-1268
- Bulut, M.¹ Narayanan, S.² Syrdal, A.³

10
- 33745202280
- A database of German emotional speech
- Burkhardt, F., Paeschke, A., Rolfes, M., Sendlmeier, W., Weiss, B., 2005. A database of German emotional speech. In: Proc. Interspeech 2005, pp. 1517-1520.
- (2005) Proc. Interspeech , pp. 1517-1520
- Burkhardt, F.¹ Paeschke, A.² Rolfes, M.³ Sendlmeier, W.⁴ Weiss, B.⁵

11
- 70450147157
- Automatic phone segmentation of expressive speech
- Charonnat, L., Vidal, G., Boeffard, O., 2008. Automatic phone segmentation of expressive speech. In: Proc. Language Resources and Evaluation Conference, pp. 2376-2379.
- (2008) Proc. Language Resources and Evaluation Conference , pp. 2376-2379
- Charonnat, L.¹ Vidal, G.² Boeffard, O.³

12
- 77949917246
- Multisyn voice for the Blizzard Challenge 2006
- Clark, R., Richmond, K., Strom, V., King, S., 2006. Multisyn voice for the Blizzard Challenge 2006. In: Proc. Blizzard Challenge Workshop 2006.
- (2006) Proc. Blizzard Challenge Workshop
- Clark, R.¹ Richmond, K.² Strom, V.³ King, S.⁴

13
- 77949915923
- Statistical analysis of the Blizzard Challenge 2007 listening test results
- Clark, R., Podsiadlo, M., Fraser, M., Mayo, C., King, S., 2007a. Statistical analysis of the Blizzard Challenge 2007 listening test results. In: Proc. BLZ3-2007 (in Proc. SSW6).
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Clark, R.¹ Podsiadlo, M.² Fraser, M.³ Mayo, C.⁴ King, S.⁵

14
- 34047123652
- Multisyn: open-domain unit selection for the festival speech synthesis system
- Clark R.A., Richmond K., and King S. Multisyn: open-domain unit selection for the festival speech synthesis system. Speech Comm. 49 4 (2007) 317-330
- (2007) Speech Comm. , vol.49 , Issue.4 , pp. 317-330
- Clark, R.A.¹ Richmond, K.² King, S.³

15
- 0032651722
- A hidden Markov-model-based trainable speech synthesizer
- Donovan R., and Woodland P. A hidden Markov-model-based trainable speech synthesizer. Comput. Speech Lang. 13 3 (1999) 223-241
- (1999) Comput. Speech Lang. , vol.13 , Issue.3 , pp. 223-241
- Donovan, R.¹ Woodland, P.²

16
- 84966356293
- Preservation, identification, and use of emotion in a text-to-speech system
- Eide, E., 2002. Preservation, identification, and use of emotion in a text-to-speech system. In: Proc. IEEE Workshop on Speech Synthesis, pp. 127-130.
- (2002) Proc. IEEE Workshop on Speech Synthesis , pp. 127-130
- Eide, E.¹

17
- 77949915011
- The Blizzard Challenge 2007
- Fraser, M., King, S., 2007. The Blizzard Challenge 2007. In: Proc. BLZ3-2007 (in Proc. SSW6).
- (2007) Proc. BLZ3-2007 (in Proc. SSW6)
- Fraser, M.¹ King, S.²

18
- 56149117453
- Automatic phonetic segmentation of Spanish emotional speech
- Gallardo-Antolin, A., Barra, R., Schröder, M., Krstulovic, S., Montero, J., 2007. Automatic phonetic segmentation of Spanish emotional speech. In: Proc. InterSpeech 2007.
- (2007) Proc. InterSpeech
- Gallardo-Antolin, A.¹ Barra, R.² Schröder, M.³ Krstulovic, S.⁴ Montero, J.⁵

19
- 35048829796
- The challenge of generating spatially balanced scientific experiment designs
- Gomes, C., Sellmann, M., Es, C.V., Es, H.V., 2004. The challenge of generating spatially balanced scientific experiment designs. In: CP-AI-OR'04, pp. 387-394.

20
- 56149096472
- The IBM expressive speech synthesis system
- Hamza, W., Bakis, R., Eide, E., Picheny, M., Pitrelli, J., 2004. The IBM expressive speech synthesis system. In Proc. ICSLP 2004.
- (2004) Proc. ICSLP
- Hamza, W.¹ Bakis, R.² Eide, E.³ Picheny, M.⁴ Pitrelli, J.⁵

21
- 33745202619
- Informed blending of databases for emotional speech synthesis
- Hofer, G., Richmond, K., Clark, R., 2005. Informed blending of databases for emotional speech synthesis. In: Proc. Interspeech 2005, pp. 501-504.
- (2005) Proc. Interspeech , pp. 501-504
- Hofer, G.¹ Richmond, K.² Clark, R.³

22
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- Hunt, A., Black, A.W., 1996. Unit selection in a concatenative speech synthesis system using a large speech database. In: Proc. ICASSP-96, pp. 373-376.
- (1996) Proc. ICASSP-96 , pp. 373-376
- Hunt, A.¹ Black, A.W.²

23
- 67650790758
- The Blizzard Challenge 2008
- Brisbane, Australia
- Karaiskos, V., King, S., Clark, R.A.J., Mayo, C., 2008. The Blizzard Challenge 2008. In: Proc. Blizzard Challenge Workshop 2008, Brisbane, Australia.
- (2008) Proc. Blizzard Challenge Workshop
- Karaiskos, V.¹ King, S.² Clark, R.A.J.³ Mayo, C.⁴

24
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds
- Kawahara H., Masuda-Katsuse I., and Cheveigné A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Comm. 27 (1999) 187-207
- (1999) Speech Comm. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigné, A.³

25
- 85086867515
- Emotional speech synthesis: From speech database to TTS
- Montero, J.M., Gutierrez-Arriola, J.M., Palazuelos, S., Enriquez, E., Aguilera, S., Pardo, J.M., 1998. Emotional speech synthesis: from speech database to TTS. In: Proc. ICSLP-98, pp. 923-926.
- (1998) Proc. ICSLP-98 , pp. 923-926
- Montero, J.M.¹ Gutierrez-Arriola, J.M.² Palazuelos, S.³ Enriquez, E.⁴ Aguilera, S.⁵ Pardo, J.M.⁶

26
- 0025543906
- Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
- Moulines E., and Charpentier F. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Comm. 9 5-6 (1990) 453-468
- (1990) Speech Comm. , vol.9 , Issue.5-6 , pp. 453-468
- Moulines, E.¹ Charpentier, F.²

27
- 51449114529
- A style control technique for HMM-based expressive speech synthesis
- Nose T., Yamagishi J., and Kobayashi T. A style control technique for HMM-based expressive speech synthesis. IEICE Trans. Inf. Systems E 90-D 9 (2007) 1406-1413
- (2007) IEICE Trans. Inf. Systems E , vol.90 -D , Issue.9 , pp. 1406-1413
- Nose, T.¹ Yamagishi, J.² Kobayashi, T.³

28
- 34047275265
- The IBM expressive text-to-speech synthesis system for American English
- Pitrelli J., Bakis R., Eide E., Fernandez R., Hamza W., and Picheny M. The IBM expressive text-to-speech synthesis system for American English. IEEE Trans. Speech Audio Process. 14 4 (2006) 1099-1108
- (2006) IEEE Trans. Speech Audio Process. , vol.14 , Issue.4 , pp. 1099-1108
- Pitrelli, J.¹ Bakis, R.² Eide, E.³ Fernandez, R.⁴ Hamza, W.⁵ Picheny, M.⁶

29
- 0037384712
- Vocal communication of emotion: a review of research paradigms
- Scherer K.R. Vocal communication of emotion: a review of research paradigms. Speech Comm. 40 1-2 (2003) 227-256
- (2003) Speech Comm. , vol.40 , Issue.1-2 , pp. 227-256
- Scherer, K.R.¹

30
- 84971539709
- Emotional speech synthesis: A review
- Schröder, M., 2001. Emotional speech synthesis: a review. In: Proc. EUROSPEECH 2001, pp. 561-564.
- (2001) Proc. EUROSPEECH 2001 , pp. 561-564
- Schröder, M.¹

31
- 9444257562
- Ph.D. Thesis, Saarland University, Saarland
- Schröder, M., 2004. Speech and emotion research: an overview of research frameworks and a dimensional approach to emotional speech synthesis. Ph.D. Thesis, Saarland University, Saarland.
- (2004) Speech and emotion research: An overview of research frameworks and a dimensional approach to emotional speech synthesis
- Schröder, M.¹

32
- 84867199052
- Investigating Festival's target cost function using perceptual experiments
- Strom, V., King, S., 2008. Investigating Festival's target cost function using perceptual experiments. In: Proc. Interspeech 2008, pp. 1873-1876.
- (2008) Proc. Interspeech , pp. 1873-1876
- Strom, V.¹ King, S.²

33
- 85001632375
- Corpus-based techniques in the AT&T NEXTGEN synthesis system
- Syrdal, A., Wightman, C., Conkie, A., Stylianou, Y., Beutnagel, M., Schroeter, J., Strom, V., Lee, K., Makashay, M., 2000. Corpus-based techniques in the AT&T NEXTGEN synthesis system. In: Proc. ICSLP 2000, pp. 411-416.
- (2000) Proc. ICSLP 2000 , pp. 411-416
- Syrdal, A.¹ Wightman, C.² Conkie, A.³ Stylianou, Y.⁴ Beutnagel, M.⁵ Schroeter, J.⁶ Strom, V.⁷ Lee, K.⁸ Makashay, M.⁹

34
- 29144475179
- Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing
- Tachibana M., Yamagishi J., Masuko T., and Kobayashi T. Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing. IEICE Trans. Inf. Systems E 88-D 11 (2005) 2484-2491
- (2005) IEICE Trans. Inf. Systems E , vol.88 -D , Issue.11 , pp. 2484-2491
- Tachibana, M.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

35
- 33645768204
- A style adaptation technique for speech synthesis using HSMM and suprasegmental features
- Tachibana M., Yamagishi J., Masuko T., and Kobayashi T. A style adaptation technique for speech synthesis using HSMM and suprasegmental features. IEICE Trans. Inf. Systems E 89-D 3 (2006) 1092-1099
- (2006) IEICE Trans. Inf. Systems E , vol.89 -D , Issue.3 , pp. 1092-1099
- Tachibana, M.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

36
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- Toda T., and Tokuda K. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Trans. Inf. Systems E 90-D 5 (2007) 816-824
- (2007) IEICE Trans. Inf. Systems E , vol.90 -D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

37
- 77949917834
- The HMM-based speech synthesis system (HTS) Version 2.1
- Tokuda, K., Zen, H., Yamagishi, J., Masuko, T., Sako, S., Black, A., Nose, T., 2008. The HMM-based speech synthesis system (HTS) Version 2.1.

38
- 70449126171
- The HTS-2008 system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
- Yamagishi, J., Zen, H., Wu, Y.-J., Toda, T., Tokuda, K., 2008. The HTS-2008 system: yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge. In: Proc. Blizzard Challenge 2008.
- (2008) Proc. Blizzard Challenge
- Yamagishi, J.¹ Zen, H.² Wu, Y.-J.³ Toda, T.⁴ Tokuda, K.⁵

39
- 85008006694
- A robust speaker-adaptive HMM-based text-to-speech synthesis
- Yamagishi J., Nose T., Zen H., Ling Z.-H., Toda T., Tokuda K., King S., and Renals S. A robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Trans. Speech Audio Lang. Process. 17 6 (2009) 1208-1230
- (2009) IEEE Trans. Speech Audio Lang. Process. , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

40
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 1999. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proc. EUROSPEECH-99, pp. 2347-2350.
- (1999) Proc. EUROSPEECH-99 , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

41
- 7044242284
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- (in Japanese)
- Yoshimura T., Tokuda K., Masuko T., Kobayashi T., and Kitamura T. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. IEICE Trans. J. 83-D-II 11 (2000) 2099-2107 (in Japanese)
- (2000) IEICE Trans. J. , vol.83 -D-II , Issue.11 , pp. 2099-2107
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

42
- 33846405723
- Details of Nitech HMM based speech synthesis system for the Blizzard Challenge 2005
- Zen H., Toda T., Nakamura M., and Tokuda K. Details of Nitech HMM based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans. Inf. Systems E 90-D 1 (2007) 325-333
- (2007) IEICE Trans. Inf. Systems E , vol.90 -D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

43
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- Zen H., Tokuda K., Masuko T., Kobayashi T., and Kitamura T. A hidden semi-Markov model-based speech synthesis system. IEICE Trans. Inf. Systems E 90-D 5 (2007) 825-834
- (2007) IEICE Trans. Inf. Systems E , vol.90 -D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

44
- 67651002140
- Statistical parametric speech synthesis
- Zen H., Tokuda K., and Black A.W. Statistical parametric speech synthesis. Speech Comm. 51 11 (2009) 1039-1064
- (2009) Speech Comm. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.