SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 14, Issue 4, 2006, Pages 1099-1108

The IBM expressive text-to-speech synthesis system for American English

(6) Pitrelli, John F a,b; Bakis, Raimo b; Eide, Ellen M b; Fernandez, Raul b; Hamza, Wael b; Picheny, Michael A a,b

b IBM T J WATSON RESEARCH CENTER (United States)

Author keywords

Corpus driven text to speech (TTS); Expressive speech synthesis; Prosodic phonology; Text to speech (TTS); Tones and break indices (ToBI)

Indexed keywords

CORPUS DRIVEN TEXT TO SPEECH (TTS); EXPRESSIVE SPEECH SYNTHESIS; PROSODIC PHONOLOGY; TEXT TO SPEECH (TTS); TONES AND BREAK INDICES (TOBI);

DATA RECORDING; DECISION THEORY; HUMAN COMPUTER INTERACTION; MARKUP LANGUAGES; MATHEMATICAL MODELS;

SPEECH SYNTHESIS;

EID: 34047275265 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2006.876123 Document Type: Article

Times cited : (111)

References (28)

1
- 0012925721
- Perceptual bandwidth
- Mar
- B. Reeves and C. Nass, "Perceptual bandwidth," Commun. ACM, vol. 43, no. 3, pp. 65-70, Mar. 2000.
- (2000) Commun. ACM , vol.43 , Issue.3 , pp. 65-70
- Reeves, B.¹ Nass, C.²

2
- 34047248476
- Multi-layered extensions to the speech synthesis markup language for describing expressiveness
- Geneva, Switzerland
- E. Eide, R. Bakis, W. Hamza, and J. Pitrelli, "Multi-layered extensions to the speech synthesis markup language for describing expressiveness," in Proc. Eurospeech, Geneva, Switzerland, 2003, pp. 1645-1648.
- (2003) Proc. Eurospeech , pp. 1645-1648
- Eide, E.¹ Bakis, R.² Hamza, W.³ Pitrelli, J.⁴

3
- 34047256173
- Online, Available
- World Wide Web Consortium (W3C) Speech Synthesis Markup Language (SSML) Specification, [Online]. Available: http://www.w3.org/TR/speech-synthesis.
- World Wide Web Consortium (W3C) Speech Synthesis Markup Language (SSML) Specification

4
- 85001632375
- Corpus-based techniques in the AT&T NextGen synthesis system
- Beijing, China
- A. K. Syrdal, C. W. Wightman, A. Conkie, Y. Stylianou, M. Beutnagel, J. Schroeter, V. Strom, K. Lee, and M. J. Makashay, "Corpus-based techniques in the AT&T NextGen synthesis system," in Proc. ICSLP, Beijing, China, 2000, pp. 431-434.
- (2000) Proc. ICSLP , pp. 431-434
- Syrdal, A.K.¹ Wightman, C.W.² Conkie, A.³ Stylianou, Y.⁴ Beutnagel, M.⁵ Schroeter, J.⁶ Strom, V.⁷ Lee, K.⁸ Makashay, M.J.⁹

5
- 84985926077
- Segment selection in the L&H realspeak laboratory TTS system
- Beijing, China
- G. Coorman, J. Fackrell, P. Rutten, and B. Van Coile, "Segment selection in the L&H realspeak laboratory TTS system," in Proc. ICSLP, Beijing, China, 2000, pp. 395-398.
- (2000) Proc. ICSLP , pp. 395-398
- Coorman, G.¹ Fackrell, J.² Rutten, P.³ Van Coile, B.⁴

6
- 0004131347
- Trainable speech synthesis,
- Ph.D. dissertation, Cambridge Univ. Eng. Dept, Cambridge, U.K
- R. E. Donovan, "Trainable speech synthesis," Ph.D. dissertation, Cambridge Univ. Eng. Dept., Cambridge, U.K., 1996.
- (1996)
- Donovan, R.E.¹

7
- 0003802343
- Monterey, CA: Wadsworth and Brooks/Cole
- L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Classification and Regression Trees. Monterey, CA: Wadsworth and Brooks/Cole, 1984.
- (1984) Classification and Regression Trees
- Breiman, L.¹ Friedman, J.H.² Olshen, R.A.³ Stone, C.J.⁴

8
- 0039885315
- Context dependent vector quantization for continuous speech recognition
- Minneapolis, MN
- L. R. Bahl, P. V. deSouza, P. S. Gopalakrishnan, and M. A. Picheny, "Context dependent vector quantization for continuous speech recognition," in Proc. ICASSP, Minneapolis, MN, 1993, pp. 632-635.
- (1993) Proc. ICASSP , pp. 632-635
- Bahl, L.R.¹ deSouza, P.V.² Gopalakrishnan, P.S.³ Picheny, M.A.⁴

9
- 85135181226
- Improvements in an HMM-based speech synthesiser
- R. Donovan and P. Woodland, "Improvements in an HMM-based speech synthesiser," in Proc. Eurospeech, 1995, pp. 573-576.
- (1995) Proc. Eurospeech , pp. 573-576
- Donovan, R.¹ Woodland, P.²

10
- 85133526552
- Automatically Clustering Similar Units for Unit Selection in Speech Synthesis
- A. W. Black and P. Taylor, "Automatically Clustering Similar Units for Unit Selection in Speech Synthesis," in Proc. Eurospeech, 1997, pp. 601-604.
- (1997) Proc. Eurospeech , pp. 601-604
- Black, A.W.¹ Taylor, P.²

11
- 80051612889
- A new distance measure for costing spectral discontinuities in concatenative speech synthesisers
- Perthshire, U.K
- R. E. Donovan, "A new distance measure for costing spectral discontinuities in concatenative speech synthesisers," in Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire, U.K., 2001, pp. 59-62.
- (2001) Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis , pp. 59-62
- Donovan, R.E.¹

12
- 34047249700
- Intrinsic phone durations are speaker-specific
- Denver, CO
- H. R. Pfitzinger, "Intrinsic phone durations are speaker-specific," in Proc. ICSLP, vol. 2, Denver, CO, 2002, pp. 1113-1116.
- (2002) Proc. ICSLP , vol.2 , pp. 1113-1116
- Pfitzinger, H.R.¹

13
- 33745197584
- Reconciling pronunciation differences between the front-end and back-end in the IBM speech synthesis system
- Jeju, South Korea, Oct
- W. Hamza, R. Bakis, and E. Eide, "Reconciling pronunciation differences between the front-end and back-end in the IBM speech synthesis system," in Proc. ICSLP, Jeju, South Korea, Oct. 2004, pp. 2561-2564.
- (2004) Proc. ICSLP , pp. 2561-2564
- Hamza, W.¹ Bakis, R.² Eide, E.³

14
- 85009274956
- Data-driven segment preselection in the IBM trainable speech synthesis system
- Denver, CO
- W. Hamza and R. Donovan, "Data-driven segment preselection in the IBM trainable speech synthesis system," in Proc. ICSLP, Denver, CO, 2002, pp. 2609-2612.
- (2002) Proc. ICSLP , pp. 2609-2612
- Hamza, W.¹ Donovan, R.²

15
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- Atlanta, GA
- A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, Atlanta, GA, 1996, pp. 373-376.
- (1996) Proc. ICASSP , pp. 373-376
- Hunt, A.¹ Black, A.²

16
- 34047274359
- Re-defining intonation from selected units for nonuniform units based speech synthesis
- Leuven, Belgium
- B. Bozkurt, T. Dutoit, and V. Pagel, "Re-defining intonation from selected units for nonuniform units based speech synthesis," in Proc. SPS-IEEE Benelux Signal Process. Symp., Leuven, Belgium, 2002, pp. 141-144.
- (2002) Proc. SPS-IEEE Benelux Signal Process. Symp , pp. 141-144
- Bozkurt, B.¹ Dutoit, T.² Pagel, V.³

17
- 84889623475
- New York: Wiley
- U. Zölzer, Digital Audio Signal Processing. New York: Wiley, 1997.
- (1997) Digital Audio Signal Processing
- Zölzer, U.¹

18
- 85009247888
- Expressive speech synthesis using a concatenative synthesizer
- Denver, CO
- M. Bulut, S. Narayanan, and A. Syrdal, "Expressive speech synthesis using a concatenative synthesizer," in Proc. ICSLP, Denver, CO, 2002, pp. 1265-1268.
- (2002) Proc. ICSLP , pp. 1265-1268
- Bulut, M.¹ Narayanan, S.² Syrdal, A.³

19
- 84966356293
- Preservation, identification, and use of emotion in a text-to-speech system
- Santa Monica, CA, Sep
- E. Eide, "Preservation, identification, and use of emotion in a text-to-speech system," in Proc. IEEE Workshop on Speech Synthesis, Santa Monica, CA, Sep. 2002.
- (2002) Proc. IEEE Workshop on Speech Synthesis
- Eide, E.¹

20
- 0010051546
- Thousand Oaks, CA: Sage
- J. R. Turner and J. F. Thayer, Introduction to Analysis of Variance. Thousand Oaks, CA: Sage, 2001.
- (2001) Introduction to Analysis of Variance
- Turner, J.R.¹ Thayer, J.F.²

21
- 0030355540
- Generating F0 contours from ToBI labels using linear regression
- Philadelphia, PA, pp
- A. W. Black and A. J. Hunt, "Generating F0 contours from ToBI labels using linear regression," in Proc. ICSLP, Philadelphia, PA, pp. 1385-1388.
- Proc. ICSLP , pp. 1385-1388
- Black, A.W.¹ Hunt, A.J.²

22
- 85030872484
- Evaluation of prosodic transcription labeling reliability in the ToBI framework
- Yokohama, Japan, Sep
- J. F. Pitrelli, M. E. Beckman, and J. Hirschberg, "Evaluation of prosodic transcription labeling reliability in the ToBI framework," in Proc. ICSLP, vol. I, Yokohama, Japan, Sep. 1994, pp. 123-126.
- (1994) Proc. ICSLP , vol.1 , pp. 123-126
- Pitrelli, J.F.¹ Beckman, M.E.² Hirschberg, J.³

23
- 85009080611
- Inter-transcriber reliability of ToBI prosodic labeling
- Beijing, China
- A. K. Syrdal and J. McGory, "Inter-transcriber reliability of ToBI prosodic labeling," in Proc. ICSLP, Beijing, China, 2000, pp. 235-238.
- (2000) Proc. ICSLP , pp. 235-238
- Syrdal, A.K.¹ McGory, J.²

24
- 85119213703
- ToBI: A standard for labeling English prosody
- Banff, AB, Canada, Oct
- K. Silverman, M. Beckman, J. Pitrelli, M. Ostendorf, C. Wightman, P. Price, J. Pierrehumbert, and J. Hirschberg, "ToBI: A standard for labeling English prosody," in Proc. ICSLP, vol. 2, Banff, AB, Canada, Oct. 1992, pp. 867-870.
- (1992) Proc. ICSLP , vol.2 , pp. 867-870
- Silverman, K.¹ Beckman, M.² Pitrelli, J.³ Ostendorf, M.⁴ Wightman, C.⁵ Price, P.⁶ Pierrehumbert, J.⁷ Hirschberg, J.⁸

25
- 0343353984
- Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events
- A. Conkie, G. Riccardi, and R. C. Rose, "Prosody recognition from speech utterances using acoustic and linguistic based models of prosodic events," in Proc. Eurospeech, 1999, pp. 523-526.
- (1999) Proc. Eurospeech , pp. 523-526
- Conkie, A.¹ Riccardi, G.² Rose, R.C.³

26
- 21844471192
- ToBI or Not ToBI
- Aix-en-Provence, France, pp
- C. W. Wightman, "ToBI or Not ToBI," in Proc. Speech Prosody, Aix-en-Provence, France, pp. 25-29.
- Proc. Speech Prosody , pp. 25-29
- Wightman, C.W.¹

27
- 0035156005
- Automatic ToBI prediction and alignment to speed manual labeling of prosody
- A. K. Syrdal, J. Hirschberg, J. McGory, and M. Beckman, "Automatic ToBI prediction and alignment to speed manual labeling of prosody," Speech Commun., vol. 33, no. 1-2, pp. 135-151, 2001.
- (2001) Speech Commun , vol.33 , Issue.1-2 , pp. 135-151
- Syrdal, A.K.¹ Hirschberg, J.² McGory, J.³ Beckman, M.⁴

28
- 44849128406
- Expressive speech synthesis using American English ToBI: Questions and contrastive emphasis
- St. Thomas, Dec
- J. F. Pitrelli and E. M. Eide, "Expressive speech synthesis using American English ToBI: Questions and contrastive emphasis," in Proc. IEEE ASRU: Automatic Speech Recognition and Understanding Workshop, St. Thomas, Dec. 1-4, 2003, pp. 694-699.
- (2003) Proc. IEEE ASRU: Automatic Speech Recognition and Understanding Workshop
- Pitrelli, J.F.¹ Eide, E.M.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.