SCOPUS Information Search Platform

Speech Communication

Volume 48, Issue 6, 2006, Pages 716-726

Modeling stylized invariance and local variability of prosody in text-to-speech synthesis

(3) Chu, Min (a); Zhao, Yong (a); Chang, Eric (a)

(a) MICROSOFT RESEARCH ASIA (China)

Author keywords

Local variability; Prosody; Soft prediction; Stylized invariance; Text to speech; Unit selection

Indexed keywords

ALGORITHMS; COMPUTER SIMULATION; DATABASE SYSTEMS; MATHEMATICAL MODELS; STATISTICAL METHODS;

LOCAL VARIABILITY; PROSODY; SOFT PREDICTION; STYLIZED INVARIANCE; TEXT-TO-SPEECH; UNIT SELECTION;

SPEECH SYNTHESIS;

EID: 33646469069 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2005.10.003 Document Type: Article

Times cited: (13)

References (33)

1
- 33646473238
- Beckman, M., Ayers Elam, G., 1997. Guidelines for ToBI Labeling, Version 3, March 1997.

2
- 0007956471
- Perception of segmental duration
- Cohen A., and Nooteboom S. (Eds), Springer
- Carlson R., and Granström B. Perception of segmental duration. In: Cohen A., and Nooteboom S. (Eds). Structure and Process in Speech Perception (1975), Springer 90-106
- (1975) Structure and Process in Speech Perception , pp. 90-106
- Carlson, R.¹ Granström, B.²

3
- 0032073761
- An RNN-based prosodic information synthesizer for Mandarin text-to-speech
- Chen S., Hwang S., and Wang Y. An RNN-based prosodic information synthesizer for Mandarin text-to-speech. IEEE Trans. Speech Audio Process. 6 3 (1998) 226-239
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.3 , pp. 226-239
- Chen, S.¹ Hwang, S.² Wang, Y.³

4
- 21444431930
- Locating boundaries for prosodic constituents in unrestricted Mandarin texts
- Chu M., and Qian Y. Locating boundaries for prosodic constituents in unrestricted Mandarin texts. Comput. Linguistics Chinese Lang. Process. 6 1 (2001) 61-82
- (2001) Comput. Linguistics Chinese Lang. Process. , vol.6 , Issue.1 , pp. 61-82
- Chu, M.¹ Qian, Y.²

5
- 0034840906
- Chu, M., Peng, H., Yang, H.Y., Chang, E., 2001a. Selecting non-uniform units from a very large corpus for concatenative speech synthesizer. In: Proc. ICASSP'01, Salt Lake City.

6
- 33646475428
- Chu, M., Peng, H., Chang, E., 2001b. A concatenative Mandarin TTS system without prosody model and prosody modification. In: Proc. 4th ISCA Workshop on Speech Synthesis, Scotland.

7
- 0141480034
- Chu, M., Peng, H., Zhao, Y., Niu, Z.Y., Chang, E., 2003. Microsoft Mulan - a bilingual TTS system. In: Proc. ICASSP'03, Hong Kong.

8
- 33745222311
- Dogil, G., Möbius, B., 2001. Towards a model of target oriented production of prosody. In: Proc. Eurospeech 2001, Copenhagen.

9
- 33646491173
- Donovan, R.E., Eide, E.M., 1998. The IBM trainable speech synthesis system. In: Proc. ICSLP'98, Sydney.

10
- 0030373836
- Fant, G., Kruckenberg, A., 1996. On the quantal nature of speech timing. In: Proc. ICSLP'96, Philadelphia.

11
- 0022896756
- Fujisaki, H., Hirose, K., Takahashi, N., Morikawa, H., 1986. Acoustic characteristics and the underlying rules of intonation of the common Japanese used by radio and TV announcers. In: Proc. ICASSP'86, pp. 2039-2042.

12
- 33646484382
- Guenther, F.H., 1995. A modeling framework for speech motor development and kinematic articulator control. In: Proc. 13th International Congress of Phonetic Sciences, Stockholm 2, pp. 92-99.

13
- 0030145829
- Training intonational phrasing rules automatically for English and Spanish text-to-speech
- Hirschberg J., and Prieto P. Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Commun. 18 (1996) 281-290
- (1996) Speech Commun. , vol.18 , pp. 281-290
- Hirschberg, J.¹ Prieto, P.²

14
- 0030374958
- Huang, X.D., Acero, A., Adcock, J., et al., 1996. Whistler: a trainable text-to-speech system. In: Proc. ICSLP'96, Philadelphia.

15
- 0015319333
- Just noticeable differences for segment duration in natural speech
- Huggins A.W.F. Just noticeable differences for segment duration in natural speech. J. Acoust. Soc. Am. 51 4 (1972) 1270-1278
- (1972) J. Acoust. Soc. Am. , vol.51 , Issue.4 , pp. 1270-1278
- Huggins, A.W.F.¹

16
- 33646476156
- Kato, H., Tsuzaki, M., Sagisaka, Y., 1998. Effects of phonetic quality and duration on perceptual acceptability of temporal changes in speech. In: Proc. ICSLP'98, Sydney.

17
- 0141588508
- Klatt, D.H., 1982. The Klattalk text-to-speech conversion system. In: Proc. ICASSP'82, pp. 1589-1592.

18
- 0037290439
- Prosody modeling with soft templates
- Kochanski G., and Shih C. Prosody modeling with soft templates. Speech Commun. 39 (2003) 311-352
- (2003) Speech Commun. , vol.39 , pp. 311-352
- Kochanski, G.¹ Shih, C.²

19
- 0345734471
- Linear prediction of speech
- Springer
- Markel J., and Gray J. Linear prediction of speech. Communication and Cybernetics vol. 12 (1976), Springer
- (1976) Communication and Cybernetics , vol.12
- Markel, J.¹ Gray, J.²

20
- 33646490297
- Mayer, J., Wildgruber, D., Riecker, A., Dogil, G., Ackermann, H., Grodd, W., 2002. Prosody production and perception: converging evidence from fMRI studies. In: Proc. Speech Prosody 2002, Aix-en-Provence, pp. 487-490.

21
- 0025543906
- Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
- Moulines E., and Charpentier F. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 9 (1990) 453-467
- (1990) Speech Commun. , vol.9 , pp. 453-467
- Moulines, E.¹ Charpentier, F.²

22
- 33646472448
- Möbius, B., Dogil, G., 2002. Phonemic and postural effects on the production of prosody. In: Proc. Speech Prosody 2002, Aix-en-Provence.

23
- 0003008756
- A hierarchical stochastic model for automatic prediction of prosodic boundary location
- Ostendorf M., and Veilleux N. A hierarchical stochastic model for automatic prediction of prosodic boundary location. Comput. Linguistics 20 1 (1994) 27-54
- (1994) Comput. Linguistics , vol.20 , Issue.1 , pp. 27-54
- Ostendorf, M.¹ Veilleux, N.²

24
- 0000678652
- A theory of speech motor control and supporting data from speakers with normal hearing and with profound hearing loss
- Perkell J.S., Guenther F.H., Lane H., Matthies M.L., Perrier P., Vick J., Wilhelms-Tricarico R., and Zandipour M. A theory of speech motor control and supporting data from speakers with normal hearing and with profound hearing loss. J. Phonetics 28 3 (2000) 233-272
- (2000) J. Phonetics , vol.28 , Issue.3 , pp. 233-272
- Perkell, J.S.¹ Guenther, F.H.² Lane, H.³ Matthies, M.L.⁴ Perrier, P.⁵ Vick, J.⁶ Wilhelms-Tricarico, R.⁷ Zandipour, M.⁸

25
- 0032665603
- A dynamical system model for generating fundamental frequency for speech synthesis
- Ross K.N., and Ostendorf M. A dynamical system model for generating fundamental frequency for speech synthesis. IEEE Trans. Speech Audio Process. 7 3 (1999) 295-309
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.3 , pp. 295-309
- Ross, K.N.¹ Ostendorf, M.²

26
- 33646474731
- Shadle, C.H., Damper, R.I., 2001. Prospects for articulatory synthesis: a position paper. In: Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire.

27
- 84936526529
- On the quantal nature of speech
- Stevens K.N. On the quantal nature of speech. J. Phonetics 17 (1989) 3-45
- (1989) J. Phonetics , vol.17 , pp. 3-45
- Stevens, K.N.¹

28
- 33646488651
- Stylianou, Y., Dutoit, T., Schroeter, J., 1997. Diphone concatenation using a harmonic plus noise model of speech. In: Proc. Eurospeech, Rhodes, pp. 613-616.

29
- 0032047079
- Assigning phrase breaks from part-of-speech sequences
- Taylor P., and Black A.W. Assigning phrase breaks from part-of-speech sequences. Computer Speech Lang. 12 (1998) 99-117
- (1998) Computer Speech Lang. , vol.12 , pp. 99-117
- Taylor, P.¹ Black, A.W.²

30
- 0034008810
- Analysis and synthesis of intonation using the Tilt model
- Taylor P.A. Analysis and synthesis of intonation using the Tilt model. J. Acoust. Soc. Am. 107 3 (2000) 1697-1714
- (2000) J. Acoust. Soc. Am. , vol.107 , Issue.3 , pp. 1697-1714
- Taylor, P.A.¹

31
- 33646476859
- Wang, M.Q., Hirschberg, J., 1991. Predicting intonational phrasing from text. In: Proc. Association for Computational Linguistics 29th annual meeting, pp. 285-292.

32
- 0028518062
- Automatic labeling of prosodic patterns
- Wightman C.W., and Ostendorf M. Automatic labeling of prosodic patterns. IEEE Trans. Speech Audio Process. 2 4 (1994) 469-481
- (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 469-481
- Wightman, C.W.¹ Ostendorf, M.²

33
- 33646493382
- Young, S., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., Woodland, P., 2000. The HTK Book (for HTK version 3.0).

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.