SCOPUS Information Search Platform

Springer Handbooks

Volume , Issue , 2008, Pages 413-428

Basic Principles of Speech Synthesis

(1) Schroeter, Juergen a

a AT&T Labs Research (United States)

Author keywords

Dynamic Time Warping; Spectral Envelope; Speech Synthesis; Synthetic Speech; Vocal Tract

EID: 85075916050 PISSN: 25228692 EISSN: 25228706 Source Type: Book Series
DOI: 10.1007/978-3-540-49127-9_19 Document Type: Chapter

Times cited : (12)

References (44)

1
- 0003757962
- Springer, Berlin, Heidelberg
- J.L. Flanagan: Speech Analysis, Synthesis and Perception (Springer, Berlin, Heidelberg 1972) pp. 204–210, http://www.haskins.yale.edu/featured/heads/SIMULACRA/kempelen.html
- (1972) Speech Analysis, Synthesis and Perception , pp. 204-210
- Flanagan, J.L.¹

2
- 0042675518
- A synthetic speaker
- H. Dudley, R.R. Riesz, S.A. Watkins: A synthetic speaker, J. Franklin Inst. 227, 739–764 (1939), http://www.bell-labs.com/org/1133/Heritage/Vocoder/
- (1939) J. Franklin Inst. , vol.227 , pp. 739-764
- Dudley, H.¹ Riesz, R.R.² Watkins, S.A.³

3
- 85075941746
- W3C Standard Generalized Markup Language: http://www.w3.org/MarkUp/SGML/

4
- 85075926801
- W3C Speech Synthesis Markup Language Version 1.0: http://www.w3.org/TR/2003/CR-speech-synthesis-20031218/ (http://www.xml.com/pub/a/2004/10/20/ssml.html)

5
- 3543008813
- Timing
- ed. by R. Sproat (Springer, New York)
- J.P. van Santen: Timing. In: Multilingual Text-to-Speech Synthesis – The Bell Labs Approach, ed. by R. Sproat (Springer, New York 1998) pp. 115–139
- (1998) Multilingual Text-To-Speech Synthesis – the Bell Labs Approach , pp. 115-139
- van Santen, J.P.¹

6
- 0016952322
- Linguistic use of segmental duration in English: Acoustic and perceptual evidence
- D.H. Klatt: Linguistic use of segmental duration in English: Acoustic and perceptual evidence, J. Acoust. Soc. Am. 59, 1208–1221 (1976)
- (1976) J. Acoust. Soc. Am. , vol.59 , pp. 1208-1221
- Klatt, D.H.¹

7
- 33745196452
- Exemplar-based production of prosody: Evidence from segment and syllable durations
- ed. by B. Bel, I. Marlien (ISCA, Grenoble)
- A. Schweitzer, B. Moebius: Exemplar-based production of prosody: Evidence from segment and syllable durations, Proc. Speech Prosody 2004 (Nara), ed. by B. Bel, I. Marlien (ISCA, Grenoble 2004)
- (2004) Proc. Speech Prosody 2004 (Nara)
- Schweitzer, A.¹ Moebius, B.²

8
- 85119213703
- TOBI: A standard for labeling English prosody
- K. Silverman, M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert, J. Hirschberg: TOBI: A standard for labeling English prosody, Proc. ICSLP’92 Banff (1992) pp. 867–870
- (1992) ICSLP’92 Banff , pp. 867-870
- Silverman, K.¹ Beckman, M.² Pitrelli, J.³ Ostendorf, M.⁴ Wightman, C.⁵ Price, P.⁶ Pierrehumbert, J.⁷ Hirschberg, J.⁸

9
- 0742283611
- The role of quantitative modeling in the study of intonation
- H. Fujisaki: The role of quantitative modeling in the study of intonation, Proc. Int. Symp. Japanese Prosody (1992) pp. 163–174
- (1992) Proc. Int. Symp. Japanese Prosody , pp. 163-174
- Fujisaki, H.¹

10
- 0347437427
- Numerical simulations of fluid flow in the vocal tract
- G. Richard, M. Liu, D. Sinder, H. Duncan, Q. Lin, J. Flanagan, S. Levinson, D. Davis, S. Slimon: Numerical simulations of fluid flow in the vocal tract, Proc. of Eurospeech Madrid (1995) pp. 18–21
- (1995) Proc. of Eurospeech Madrid , pp. 18-21
- Richard, G.¹ Liu, M.² Sinder, D.³ Duncan, H.⁴ Lin, Q.⁵ Flanagan, J.⁶ Levinson, S.⁷ Davis, D.⁸ Slimon, S.⁹

11
- 0025316435
- A three-dimensional model of tongue movement based on ultrasound and x-ray microbeam data
- M. Stone: A three-dimensional model of tongue movement based on ultrasound and x-ray microbeam data, J. Acoust. Soc. Am. 87, 2207–2217 (1990)
- (1990) J. Acoust. Soc. Am. , vol.87 , pp. 2207-2217
- Stone, M.¹

12
- 0025739174
- Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels
- T. Baer, J.C. Gore, L.C. Gracco, P.W. Nye: Analysis of vocal tract shape and dimensions using magnetic resonance imaging: Vowels, J. Acoust. Soc. Am. 90, 799–828 (1991)
- (1991) J. Acoust. Soc. Am. , vol.90 , pp. 799-828
- Baer, T.¹ Gore, J.C.² Gracco, L.C.³ Nye, P.W.⁴

13
- 0016940126
- A model of articulatory dynamics and control
- C.H. Coker: A model of articulatory dynamics and control, Proc. IEEE 64, 452–459 (1976)
- (1976) Proc. IEEE , vol.64 , pp. 452-459
- Coker, C.H.¹

14
- 77956779481
- A dynamical approach to gestural patterning in speech production
- E.L. Saltzman, K.G. Munhall: A dynamical approach to gestural patterning in speech production, Ecol. Psychol. 1(4), 333–382 (1989)
- (1989) Ecol. Psychol. , vol.1 , Issue.4 , pp. 333-382
- Saltzman, E.L.¹ Munhall, K.G.²

15
- 0003515694
- Speech coding based on physiological models of speech production
- ed. by S. Furui, M.M. Sondhi (Marcel Dekker, New York)
- J. Schroeter, M.M. Sondhi: Speech coding based on physiological models of speech production. In: Advances in Speech Signal Processing, ed. by S. Furui, M.M. Sondhi (Marcel Dekker, New York 1991) pp. 231–268
- (1991) Advances in Speech Signal Processing , pp. 231-268
- Schroeter, J.¹ Sondhi, M.M.²

16
- 0003874959
- Springer, New York
- J.D. Markel, A.H. Gray: Linear Prediction of Speech (Springer, New York 1976)
- (1976) Linear Prediction of Speech
- Markel, J.D.¹ Gray, A.H.²

17
- 0018986665
- Software for a cascade/parallel formant synthesizer
- D.H. Klatt: Software for a cascade/parallel formant synthesizer, J. Acoust. Soc. Am. 67, 971–995 (1980)
- (1980) J. Acoust. Soc. Am. , vol.67 , pp. 971-995
- Klatt, D.H.¹

18
- 0036711819
- A quasiarticulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLsyn
- H.M. Hanson, K.N. Stevens: A quasiarticulatory approach to controlling acoustic source parameters in a Klatt-type formant synthesizer using HLsyn, J. Acoust. Soc. Am. 112, 1158–1182 (2002)
- (2002) J. Acoust. Soc. Am. , vol.112 , pp. 1158-1182
- Hanson, H.M.¹ Stevens, K.N.²

19
- 21844464776
- Combinatorial issues in text-to-speech synthesis
- J.P.H. van Santen: Combinatorial issues in text-to-speech synthesis, EuroSpeech ’97 5th European Conference on Speech Communication and Technology 5, 2511–2514 (1997)
- (1997) EuroSpeech ’97 5th European Conference on Speech Communication and Technology , vol.5 , pp. 2511-2514
- van Santen, J.P.H.¹

20
- 85068112784
- Rule synthesis of speech from diadic units
- J.P. Olive: Rule synthesis of speech from diadic units, Proc. ICASSP 77, 568–570 (1977)
- (1977) Proc. ICASSP 77 , pp. 568-570
- Olive, J.P.¹

21
- 0000813409
- Syllables as concatenative phonetic elements
- ed. by A. Bell, J.B. Hooper (North-Holland, New York)
- O. Fujimura, J. Lovins: Syllables as concatenative phonetic elements. In: Syllables and Segments, ed. by A. Bell, J.B. Hooper (North-Holland, New York 1978) pp. 107–120
- (1978) Syllables and Segments , pp. 107-120
- Fujimura, O.¹ Lovins, J.²

22
- 0004161686
- Synthesis
- ed. by R. Sproat (Kluwer Academic, Dordrecht), Chap. 7
- J. Olive, J. van Santen, B. Möbius, C. Shih: Synthesis. In: Multilingual Text-to-Speech Synthesis – The Bell Labs Approach, ed. by R. Sproat (Kluwer Academic, Dordrecht 1998), Chap. 7
- (1998) Multilingual Text-To-Speech Synthesis – the Bell Labs Approach
- Olive, J.¹ van Santen, J.² Möbius, B.³ Shih, C.⁴

23
- 0023756465
- Speech synthesis by rule using an optimal selection of non-uniform synthesis units
- Y. Sagisaka: Speech synthesis by rule using an optimal selection of non-uniform synthesis units, Proc. ICASSP 88, 679–682 (1988)
- (1988) Proc. ICASSP 88 , pp. 679-682
- Sagisaka, Y.¹

24
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- A. Hunt, A.W. Black: Unit selection in a concatenative speech synthesis system using a large speech database, Proc. ICASSP 96, 373–376 (1996)
- (1996) Proc. ICASSP 96 , pp. 373-376
- Hunt, A.¹ Black, A.W.²

25
- 84966398940
- Optimising selection of units from speech databases for concatenative synthesis
- A.W. Black, N. Campbell: Optimising selection of units from speech databases for concatenative synthesis, ESCA Eurospeech 95, 581–584 (1995)
- (1995) ESCA Eurospeech 95 , pp. 581-584
- Black, A.W.¹ Campbell, N.²

26
- 0004244302
- Prentice-Hall, Englewood Cliffs
- L. Rabiner, B.H. Juang: Fundamentals of Speech Recognition (Prentice-Hall, Englewood Cliffs 1993) pp. 339–341
- (1993) Fundamentals of Speech Recognition , pp. 339-341
- Rabiner, L.¹ Juang, B.H.²

27
- 70349848071
- Join cost for unit selection speech synthesis
- ed. by S. Narayanan, A. Alwan (Prentice-Hall, Upper Saddle River), Chap. 3
- J. Vepa, S. King: Join cost for unit selection speech synthesis. In: Text-to-Speech Synthesis – New Paradigms and Advances, Professional Technical Reference, ed. by S. Narayanan, A. Alwan (Prentice-Hall, Upper Saddle River 2004) pp. 35–62, Chap. 3
- (2004) Text-To-Speech Synthesis – New Paradigms and Advances, Professional Technical Reference , pp. 35-62
- Vepa, J.¹ King, S.²

28
- 84924415275
- Toward expressive synthetic speech
- ed. by S. Narayanan, A. Alwan (Prentice-Hall, Upper Saddle River), Chap. 11
- E. Eide, R. Bakis, W. Hamza, J.F. Petrelli: Toward expressive synthetic speech. In: Text-to-Speech Synthesis – New Paradigms and Advances, Professional Technical Reference, ed. by S. Narayanan, A. Alwan (Prentice-Hall, Upper Saddle River 2004) pp. 219–248, Chap. 11
- (2004) Text-To-Speech Synthesis – New Paradigms and Advances, Professional Technical Reference , pp. 219-248
- Eide, E.¹ Bakis, R.² Hamza, W.³ Petrelli, J.F.⁴

29
- 85133526552
- Automatically clustering similar units for unit selection in speech synthesis
- A.W. Black, P. Taylor: Automatically clustering similar units for unit selection in speech synthesis, Proc. Eurospeech 97, 601–604 (1997)
- (1997) Proc. Eurospeech 97 , pp. 601-604
- Black, A.W.¹ Taylor, P.²

30
- 33645758767
- An HMM-based approach to multilingual speech synthesis
- ed. by S. Narayanan, A. Alwan (Prentice-Hall, Upper Saddle River), Chap. 7
- K. Tokuda, H. Zen, A.W. Black: An HMM-based approach to multilingual speech synthesis. In: Text-to-Speech Synthesis – New Paradigms and Advances, Professional Technical Reference, ed. by S. Narayanan, A. Alwan (Prentice-Hall, Upper Saddle River 2004) pp. 135–153, Chap. 7
- (2004) Text-To-Speech Synthesis – New Paradigms and Advances, Professional Technical Reference , pp. 135-153
- Tokuda, K.¹ Zen, H.² Black, A.W.³

31
- 0003834176
- Kluwer Academic, Dordrecht
- T. Dutoit: An Introduction to Text-to-Speech Synthesis (Kluwer Academic, Dordrecht 1997)
- (1997) An Introduction to Text-To-Speech Synthesis
- Dutoit, T.¹

32
- 0025543906
- Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
- E. Moulines, F. Charpentier: Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones, Speech Commun. 9(5-6), 453–467 (1990)
- (1990) Speech Commun , vol.9 , Issue.5-6 , pp. 453-467
- Moulines, E.¹ Charpentier, F.²

33
- 3643119290
- Intelligibility as a function of speech coding method for template-based speech synthesis
- M. Macchi, M.J. Altom, D. Kahn, S. Singhal, M. Spiegel: Intelligibility as a function of speech coding method for template-based speech synthesis, Proc. Eurospeech 93, 893–896 (1993)
- (1993) Proc. Eurospeech 93 , pp. 893-896
- Macchi, M.¹ Altom, M.J.² Kahn, D.³ Singhal, S.⁴ Spiegel, M.⁵

34
- 0003637864
- Elsevier, Amsterdam
- W. Kleijn, K. Paliwal (Eds.): Speech Coding and Synthesis (Elsevier, Amsterdam 1995)
- (1995) Speech Coding and Synthesis
- Kleijn, W.¹ Paliwal, K.²

35
- 0026830163
- Shape invariant time-scale and pitch modification of speech
- T.F. Quartieri, R.J. McAulay: Shape invariant time-scale and pitch modification of speech, IEEE Trans. Signal Process. 40(3), 497–510 (1992)
- (1992) IEEE Trans. Signal Process. , vol.40 , Issue.3 , pp. 497-510
- Quartieri, T.F.¹ McAulay, R.J.²

36
- 0035127703
- Applying the harmonic plus noise model in concatenative speech synthesis
- Y. Stylianou: Applying the harmonic plus noise model in concatenative speech synthesis, IEEE Trans. Speech Audio Process. 9(1), 21–29 (2001)
- (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.1 , pp. 21-29
- Stylianou, Y.¹

37
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, A. de Cheveigne: Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Commun. 27(3-4), 187–207 (1999)
- (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² de Cheveigne, A.³

38
- 0023739214
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, H. Kuwahara: Voice conversion through vector quantization, Proc. IEEE ICASSP 88, 655–658 (1990), S14.1
- (1990) Proc. IEEE ICASSP 88 , pp. 655-658 , S14.1
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwahara, H.⁴

39
- 77951059248
- Ecole Nationale Supérieure des Télécommunications, Paris, in French
- I. Stylianou: Modèles Harmoniques plus Bruit combinés avec des Méthodes Statistiques, pour la Modification de la Parole et du Locuteur, Doctoral Thesis (Ecole Nationale Supérieure des Télécommunications, Paris 1996), in French
- (1996) Modèles Harmoniques plus Bruit Combinés Avec Des Méthodes Statistiques, Pour La Modification De La Parole Et Du Locuteur, Doctoral Thesis
- Stylianou, I.¹

40
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain, M. Macon: Spectral voice conversion for text-to-speech synthesis, Proc. IEEE ICASSP 98, 285–288 (1998)
- (1998) Proc. IEEE ICASSP 98 , pp. 285-288
- Kain, A.¹ Macon, M.²

41
- 0025475690
- Comprehensive assessment of the telephone intelligibility of synthesized and natural speech
- M.F. Spiegel, M.J. Altom, M.J. Macchi: Comprehensive assessment of the telephone intelligibility of synthesized and natural speech, Speech Commun. 9, 279–291 (1990)
- (1990) Speech Commun , vol.9 , pp. 279-291
- Spiegel, M.F.¹ Altom, M.J.² Macchi, M.J.³

42
- 85075924369
- A. Syrdal: Development of a standard for the evaluation of intelligibility of text-to-speech synthesis systems by ANSI Accredited Standards Committee S3, Bioacoustics, working group S3/WG 91, Text-to-Speech Synthesis Systems, Personal communication (2007)
- (2007) Development of a Standard for the Evaluation of Intelligibility of Text-To-Speech Synthesis Systems by ANSI Accredited Standards Committee S3, Bioacoustics, Working Group S3/WG 91, Text-To-Speech Synthesis Systems, Personal Communication
- Syrdal, A.¹

43
- 0012392720
- International Telecommunications Union, Geneva, Recommendation P.85
- ITU-T: A Method for Subjective Performance Assessment of the Quality of Speech Output Devices (International Telecommunications Union, Geneva 1994), Recommendation P.85
- (1994) A Method for Subjective Performance Assessment of the Quality of Speech Output Devices , Recommendation P.85

44
- 85009279016
- The reliability of the ITU-T P.85 standard for the evaluation of text-to-speech systems
- Y.V. Alvarez, M. Huckvale: The reliability of the ITU-T P.85 standard for the evaluation of text-to-speech systems, Proc. ICSLP 2002, 329–332 (2002)
- (2002) Proc. ICSLP 2002 , pp. 329-332
- Alvarez, Y.V.¹ Huckvale, M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.