SCOPUS Information Search Platform

Speech Communication

Volume 48, Issue 9, 2006, Pages 1057-1078

Tone-Group F0 selection for modeling focus prominence in small-footprint speech synthesis

(2) Xydas, Gerasimos (a); Kouroupetroglou, Georgios (a)

(a) UNIVERSITY OF ATHENS (Greece)

Author keywords

Intonation and emphasis in speech synthesis; Text to speech synthesis; Tone Group unit selection

Indexed keywords

DATABASE SYSTEMS; HUMAN ENGINEERING; MATHEMATICAL MODELS; OPTIMIZATION; REGRESSION ANALYSIS; SPEECH ANALYSIS;

HUMAN SPEECH; INTONATION AND EMPHASIS IN SPEECH SYNTHESIS; TONE GROUP UNIT SELECTION; TONE GROUP UNITS;

SPEECH SYNTHESIS;

EID: 33746431355 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2006.02.002 Document Type: Article

Times cited : (5)

References (62)

1
- 0029342671
- Automatic pitch contour stylization using a model of tonal perception
- d'Alessandro C., and Mertens P. Automatic pitch contour stylization using a model of tonal perception. Comput. Speech Language 9 (1995) 257-288
- (1995) Comput. Speech Language , vol.9 , pp. 257-288
- d'Alessandro, C.¹ Mertens, P.²

2
- 33646268734
- Intonational analysis and prosodic annotation of Greek spoken corpora
- Sun-Ah Jun (Ed), Oxford University Press
- Arvaniti A., and Baltazani M. Intonational analysis and prosodic annotation of Greek spoken corpora. In: Sun-Ah Jun (Ed). Prosodic Typology: The Phonology of Intonation and Phrasing (2005), Oxford University Press 84-117
- (2005) Prosodic Typology: The Phonology of Intonation and Phrasing , pp. 84-117
- Arvaniti, A.¹ Baltazani, M.²

3
- 0031627266
- Stability of tonal alignment: the case of Greek prenuclear accents
- Arvaniti A., Ladd D.R., and Mennen I. Stability of tonal alignment: the case of Greek prenuclear accents. J. Phonetics 26 (1998) 3-25
- (1998) J. Phonetics , vol.26 , pp. 3-25
- Arvaniti, A.¹ Ladd, D.R.² Mennen, I.³

4
- 33746471050
- Aulanko, R., 1985. Microprosodic features in speech: experiments on Finnish. In: Aaltonen, O., Hulkko, T. (Eds.), Fonetiikan Paivat Turku 1985, Publications of the Department of Finnish and General Linguistics of the University of Turku, pp. 33-54.

5
- 21844440585
- SFC: A trainable prosodic model
- Bailly G., and Holm B. SFC: A trainable prosodic model. Speech Comm. 46 (2005) 364-384
- (2005) Speech Comm. , vol.46 , pp. 364-384
- Bailly, G.¹ Holm, B.²

6
- 33746447512
- Beutnagel, M., Conkie, A., Schroeter, J., Stylianou, Y., Syrdal, A., 1999. The AT&T Next-Gen TTS system. In: Proc. Joint Meeting of ASA, EAA and DAGA, Berlin, Germany, pp. 18-24.

7
- 85006631929
- Black, A.W., 2003. Unit Selection and Emotional Speech. In: Proc. EUROSPEECH-2003, Geneva, Switzerland, pp. 1649-1652.

8
- 0030355540
- F0 contours from the ToBI labels using linear regression. In: Proc. ICSLP-96, Philadelphia, USA, Vol. 3, pp. 1385-1388.

9
- 84966301419
- Black, A.W., Lenzo, K.A., 2000a. Limited domain synthesis. In: Proc. ICSLP-2000, Beijing, China, Vol. 2, pp. 411-414.

10
- 33746443185
- Black, A.W., Lenzo, K.A., 2000b. Building voices in the Festival speech synthesis System. Available from: .

11
- 33746415062
- Black, A.W., Lenzo, K.A., 2001. Flite: a small fast run-time synthesis engine. In: Proc. SSW4 - 4th ISCA Workshop on Speech Synthesis, pp. 204-207.

12
- 0142153902
- Optimal utterance selection for unit selection speech synthesis databases
- Black A.W., and Lenzo K. Optimal utterance selection for unit selection speech synthesis databases. Internat. J. Speech Technol. 6 4 (2003) 357-363
- (2003) Internat. J. Speech Technol. , vol.6 , Issue.4 , pp. 357-363
- Black, A.W.¹ Lenzo, K.²

13
- 33746444147
- Black, A.W., Taylor, P., Caley, R., 1998. The FESTIVAL speech synthesis system. Available from: .

14
- 85143191594
- Bulyko, I., Ostendorf, M., 2001. Joint prosody prediction and unit selection for concatenative speech synthesis. In: Proc. ICASSP-2001, Vol. 2, pp. 781-784.

15
- 33745192035
- Multilingual personalised information objects
- Multimodal Intelligent Information Presentation. Stock O., and Zancanaro M. (Eds), Springer
- Calder J., Melengoglou A.C., Callaway C., Not E., Pianesi F., Androutsopoulos I., Spyropoulos C., Xydas G., Kouroupetroglou G., and Roussou M. Multilingual personalised information objects. In: Stock O., and Zancanaro M. (Eds). Multimodal Intelligent Information Presentation. Text, Speech and Language Technology Vol. 27 (2005), Springer 177-201
- (2005) Text, Speech and Language Technology , vol.27 , pp. 177-201
- Calder, J.¹ Melengoglou, A.C.² Callaway, C.³ Not, E.⁴ Pianesi, F.⁵ Androutsopoulos, I.⁶ Spyropoulos, C.⁷ Xydas, G.⁸ Kouroupetroglou, G.⁹ Roussou, M.¹⁰

16
- 33746459629
- Campbell, N., 1994. Prosody and the selection of units for concatenation synthesis. In: Proc. SSW2 - 2nd ESCA/IEEE Workshop on Speech Synthesis, NY, USA, pp. 61-64.

17
- 24144437793
- Developments in corpus-based speech synthesis: approaching natural conversational speech
- Campbell N. Developments in corpus-based speech synthesis: approaching natural conversational speech. IEICE Trans. Inf. Syst. E88-D 3 (2005) 376-383
- (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.3 , pp. 376-383
- Campbell, N.¹

18
- 33746458126
- Clark, R., 2003. Generating synthetic pitch contours using prosodic structure. Ph.D. Dissertation, University of Edinburgh.

19
- 33746390000
- Conkie, A., Isard, I., 1994. Optimal coupling of diphones. In: Proc. SSW2 - 2nd ESCA/IEEE Workshop on Speech Synthesis, NY, USA, pp. 119-122.

20
- 33746423764
- Donovan, R., Woodland, P., 1995. Improvements in a HMM-based speech synthesizer. In: Proc. EUROSPEECH-95, Madrid, Spain, Vol. 1, pp. 573-576.

21
- 33746408436
- F0 contours for speech synthesis using the tilt intonation theory. In: Botinis, A., Kouroupetroglou, G., Carayiannis, G. (Eds.), Intonation: Theory, Models and Applications. Proc. ESCA Workshop, Athens, pp. 107-110.

22
- 0003834176
- Kluwer Academic Publishers, Dordrecht
- Dutoit T. An Introduction to Text-to-Speech Synthesis (1997), Kluwer Academic Publishers, Dordrecht
- (1997) An Introduction to Text-to-Speech Synthesis
- Dutoit, T.¹

23
- 0030355972
- Dutoit, T., Pagel, V., Pierret, N., Bataille, F., Van Der Vreken, O., 1996. The MBROLA Project: towards a set of high-quality speech synthesizers free of use for non-commercial purposes. In: Proc. ICSLP-96, Philadelphia, Vol. 3, pp. 1393-1396.

24
- 33746445740
- Eide, E., Aaron, A., Bakis, R., Hamza, W., Picheny, M., Pitrelli, J., 2003. A corpus-based approach to expressive speech synthesis. In: Proc. SSW5 - 5th ISCA ITRW on Speech Synthesis, Pittsburgh, PA, USA, pp. 79-84.

25
- 0032618167
- Acoustic characteristics of Greek vowels
- Fourakis M., Botinis A., and Katsaiti M. Acoustic characteristics of Greek vowels. Phonetica 56 1-2 (1999) 28-43
- (1999) Phonetica , vol.56 , Issue.1-2 , pp. 28-43
- Fourakis, M.¹ Botinis, A.² Katsaiti, M.³

26
- 0037380186
- The role of voice quality in communicating emotion, mood and attitude
- Gobl C., and Chasaide A.N. The role of voice quality in communicating emotion, mood and attitude. Speech Comm. 40 1-2 (2003) 189-212
- (2003) Speech Comm. , vol.40 , Issue.1-2 , pp. 189-212
- Gobl, C.¹ Chasaide, A.N.²

27
- 0003788784
- Cambridge University Press, Cambridge
- 't Hart J., Collier R., and Cohen A. A Perceptual Study of Intonation - An Experimental-Phonetic Approach to Speech Melody (1990), Cambridge University Press, Cambridge
- (1990) A Perceptual Study of Intonation - An Experimental-Phonetic Approach to Speech Melody
- 't Hart, J.¹ Collier, R.² Cohen, A.³

28
- 33746410399
- Hitzeman, J., Black, A.W., Mellish, C., Oberlander, J., Poesio, M., Taylor, P., 1999. An annotation scheme for concept-to-speech synthesis. In: Proc. 7th European Workshop on Natural Language Generation, Toulouse, France, pp. 59-66.

29
- 33746448144
- Huang, X., Acero, A., Adcock, J., Hon, H., Goldsmith, J., Liu, J., Plumpe, M., 1996. Whistler: a trainable text-to-speech system. In: Proc. ICSLP-96, Philadelphia, PA, pp. 659-662.

30
- 0029765811
- Hunt, A., Black, A.W., 1996. Unit selection in a concatenative speech synthesis system using a large speech database. In: Proc. ICASSP-96, Vol. 1, pp. 373-376.

31
- 85050787615
- How much prosody can you learn from twenty utterances?
- Keller E., and Keller B.Z. How much prosody can you learn from twenty utterances?. Linguistik Online 17 5 (2003) 57-79
- (2003) Linguistik Online , vol.17 , Issue.5 , pp. 57-79
- Keller, E.¹ Keller, B.Z.²

32
- 85009179208
- Kishore, S.P., Black, A.W., 2003. Unit size in unit selection speech synthesis. In: Proc. EUROSPEECH-2003, Geneva, Switzerland, pp. 1317-1320.

33
- 84928453855
- Intonational phrasing: the case for recursive prosodic structure
- Ladd D.R. Intonational phrasing: the case for recursive prosodic structure. Phonology 3 (1986) 311-340
- (1986) Phonology , vol.3 , pp. 311-340
- Ladd, D.R.¹

34
- 0004251776
- MIT Press, Cambridge, MA
- Lieberman P. Intonation, Perception and Language (1967), MIT Press, Cambridge, MA
- (1967) Intonation, Perception and Language
- Lieberman, P.¹

35
- 33746429364
- Malfrere, F., Dutoit, T., Mertens, P., 1998. Automatic prosody generation using supra-segmental unit selection. In: SSW3 - 3rd ESCA/COCOSDA Workshop on Speech Synthesis, Blue Mountains, Australia, pp. 323-328.

36
- 33746392201
- Meron, J., 2001. Prosodic unit selection using an imitation speech database. In: Proc. SSW4 - 4th ISCA ITRW on Speech Synthesis, Perthshire, Scotland, 113.

37
- 33746385244
- Monaghan, A.I.C., 1992. Extracting microprosodic information from diphones - a simple way to model segmental effects on prosody for synthetic speech. In: ICSLP-1992, Banff, Canada, pp. 1159-1162.

38
- 0025543906
- Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones
- Moulines E., and Charpentier F. Pitch synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Comm. 9 5/6 (1990) 453-467
- (1990) Speech Comm. , vol.9 , Issue.5-6 , pp. 453-467
- Moulines, E.¹ Charpentier, F.²

39
- 33746462569
- Mozziconacci, S.J., 2000. The expression of emotion considered in the framework of an intonation model. In: Proc. ISCA/ITRW on Speech and Emotion, Belfast, Northern Ireland, pp. 45-52.

40
- 33746437840
- Mozziconacci, S., Hermes, D.J., 1999. Role of Intonation Patterns in Conveying Emotion in Speech. In: Proc. Internat. Conf. of Phonetic Sciences, pp. 2001-2004.

41
- 0004276363
- Kluwer Academic Publishers, Dordrecht
- Nespor M., and Vogel I. Prosodic Phonology (1986), Kluwer Academic Publishers, Dordrecht
- (1986) Prosodic Phonology
- Nespor, M.¹ Vogel, I.²

42
- 33746411331
- Pierrehumbert, J.B., 1980. The Phonology and Phonetics of English Intonation. Ph.D. Dissertation, MIT.

43
- 44849128406
- Pitrelli, J.F., Eide, E.M., 2003. Expressive speech synthesis using American English ToBI: questions and contrastive emphasis. In: Proc. IEEE ASRU-2003, pp. 694-699.

44
- 33746449039
- Quazza, S., Donetti, L., Moisa, L., Salza, P.L., 2001. ACTOR: A multilingual unit-selection speech synthesis system. In: Proc. SSW4 - 4th ISCA ITRW on Speech Synthesis, Perthshire, Scotland, paper 209.

45
- 84946736935
- F0 modeling and its application to emphasis. In: Proc. IEEE ASRU-2003, pp. 700-705.

46
- 84971539709
- Schroeder, M., 2001. Emotional speech synthesis: a review. In: Proc. EUROSPEECH-2001, Aalborg, Denmark, Vol. 1, pp. 561-564.

47
- 33745203492
- Schweitzer, A., Braunschweiler, N., Klankert, T., Mobius, B., Sauberlich, B., 2003. Restricted unlimited domain synthesis. In: Proc. EUROSPEECH-2003, Geneva, Switzerland, pp. 1321-1324.

48
- 0010992336
- On prosodic structure and its relation to syntactic structure
- Fretheim T. (Ed), TAPIR, Trondheim
- Selkirk E. On prosodic structure and its relation to syntactic structure. In: Fretheim T. (Ed). Nordic Prosody 2 (1978), TAPIR, Trondheim
- (1978) Nordic Prosody 2
- Selkirk, E.¹

49
- 33746418764
- Selkirk, E., 1986. On derived domains in sentence phonology. In: Phonology Yearbook, Vol. 3, pp. 371-405.

50
- 33746410398
- Selkirk, E., 1995. The prosodic structure of function words. University of Massachusetts Occasional Papers 18: Papers in Optimality Theory, pp. 439-469.

51
- 33746405937
- Silverman, K., Beckman, M., Pitrelli, J., Ostendorf, M., Wightman, C., Price, P., Pierrehumbert, J., Hirschberg, J., 1992. ToBI: a standard for labeling English prosody. In: Proc. ICSLP-92, pp. 867-870.

52
- 0004161686
- Sproat R. (Ed), Kluwer Academic Publishers, Dordrecht
- In: Sproat R. (Ed). Multilingual Text-to-Speech Synthesis - The Bell Labs Approach (1998), Kluwer Academic Publishers, Dordrecht
- (1998) Multilingual Text-to-Speech Synthesis - The Bell Labs Approach

53
- 0034008810
- Analysis and synthesis of intonation using the Tilt model
- Taylor P. Analysis and synthesis of intonation using the Tilt model. J. Acoust. Soc. Am. 107 3 (2000) 1697-1714
- (2000) J. Acoust. Soc. Am. , vol.107 , Issue.3 , pp. 1697-1714
- Taylor, P.¹

54
- 0035155093
- Heterogeneous Relation Graphs as a mechanism for representing linguistic information
- Taylor P., Black A.W., and Caley R. Heterogeneous Relation Graphs as a mechanism for representing linguistic information. Speech Comm. 33 (2001) 153-174
- (2001) Speech Comm. , vol.33 , pp. 153-174
- Taylor, P.¹ Black, A.W.² Caley, R.³

55
- 33746408880
- Vainio, M., 2001. Artificial neural network based prosody models for Finnish Text-to-Speech synthesis. Ph.D. Thesis, University of Helsinki, Department of Phonetics.

56
- 0000083157
- The universality of intrinsic F0 of vowels
- Whalen D.H., and Levitt A.G. The universality of intrinsic F0 of vowels. J. Phonetics 23 (1995) 349-366
- (1995) J. Phonetics , vol.23 , pp. 349-366
- Whalen, D.H.¹ Levitt, A.G.²

57
- 33746410879
- Wightman, C., Syrdal, A., Stemmer, G., Conkie, A., Beutnagel, M., 2000. Perceptually based automatic prosody labeling and prosodically enriched unit selection improve concatenative speech synthesis. In: Proc. ICSLP-2000, Vol. 2, pp. 71-74.

58
- 0036193374
- Maximum speed of pitch change and how it may relate to speech
- Xu Y., and Sun X. Maximum speed of pitch change and how it may relate to speech. J. Acoust. Soc. Am. 111 3 (2002) 1388-1413
- (2002) J. Acoust. Soc. Am. , vol.111 , Issue.3 , pp. 1388-1413
- Xu, Y.¹ Sun, X.²

59
- 33746398750
- Xydas, G., Kouroupetroglou, G., 2001. The DEMOSTHeNES Speech Composer. In: Proc. SSW4 - 4th ISCA ITRW on Speech Synthesis, Perthshire, Scotland, paper 206, pp. 167-172.

60
- 85009113574
- F0 samples. In: Proc. ICSLP-2004, Vol. 1, pp. 801-804.

61
- 24144446449
- Modeling improved prosody generation from high-level linguistically annotated corpora
- Xydas G., Spiliotopoulos D., and Kouroupetroglou G. Modeling improved prosody generation from high-level linguistically annotated corpora. IEICE Trans. Inf. Syst. E88-D 3 (2005) 510-518
- (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.3 , pp. 510-518
- Xydas, G.¹ Spiliotopoulos, D.² Kouroupetroglou, G.³

62
- 33746466438
- Zervas, P., Fakotakis, N., Kokkinakis, G., 2005. Development of a prosodic database for Greek speech synthesis. In: Proc. SPECOM 2005 - 10th International Conference on Speech and Computer, Patras, Greece, Vol. 2, pp. 603-606.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.