Volume 48, Issue 6, 2006, Pages 716-726

Modeling stylized invariance and local variability of prosody in text-to-speech synthesis

Author keywords

Local variability; Prosody; Soft prediction; Stylized invariance; Text to speech; Unit selection

Indexed keywords

ALGORITHMS; COMPUTER SIMULATION; DATABASE SYSTEMS; MATHEMATICAL MODELS; STATISTICAL METHODS;

EID: 33646469069     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2005.10.003     Document Type: Article
Times cited: 13

References (33)
  • 1
    • 33646473238
    • Beckman, M., Ayers Elam, G., 1997. Guidelines for ToBI Labeling, Version 3, March 1997.
  • 2
    • 0007956471
    • Perception of segmental duration
    • Cohen A., and Nooteboom S. (Eds), Springer
    • Carlson R., and Granström B. Perception of segmental duration. In: Cohen A., and Nooteboom S. (Eds). Structure and Process in Speech Perception (1975), Springer 90-106
    • (1975) Structure and Process in Speech Perception , pp. 90-106
    • Carlson, R.1    Granström, B.2
  • 3
    • 0032073761
    • An RNN-based prosodic information synthesizer for Mandarin text-to-speech
    • Chen S., Hwang S., and Wang Y. An RNN-based prosodic information synthesizer for Mandarin text-to-speech. IEEE Trans. Speech Audio Process. 6 3 (1998) 226-239
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.3 , pp. 226-239
    • Chen, S.1    Hwang, S.2    Wang, Y.3
  • 4
    • 21444431930
    • Locating boundaries for prosodic constituents in unrestricted Mandarin texts
    • Chu M., and Qian Y. Locating boundaries for prosodic constituents in unrestricted Mandarin texts. Comput. Linguistics Chinese Lang. Process. 6 1 (2001) 61-82
    • (2001) Comput. Linguistics Chinese Lang. Process. , vol.6 , Issue.1 , pp. 61-82
    • Chu, M.1    Qian, Y.2
  • 5
    • 0034840906
    • Chu, M., Peng, H., Yang, H.Y., Chang, E., 2001a. Selecting non-uniform units from a very large corpus for concatenative speech synthesizer. In: Proc. ICASSP'01, Salt Lake City.
  • 6
    • 33646475428
    • Chu, M., Peng, H., Chang, E., 2001b. A concatenative Mandarin TTS system without prosody model and prosody modification. In: Proc. 4th ISCA Workshop on Speech Synthesis, Scotland.
  • 7
    • 0141480034
    • Chu, M., Peng, H., Zhao, Y., Niu, Z.Y., Chang, E., 2003. Microsoft Mulan-a bilingual TTS system. In: Proc. ICASSP'03, Hong Kong.
  • 8
    • 33745222311
    • Dogil, G., Möbius, B., 2001. Towards a model of target oriented production of prosody. In: Proc. Eurospeech 2001, Copenhagen.
  • 9
    • 33646491173
    • Donovan, R.E., Eide, E.M., 1998. The IBM trainable speech synthesis system. In: Proc. ICSLP'98, Sydney.
  • 10
    • 0030373836
    • Fant, G., Kruckenberg, A., 1996. On the quantal nature of speech timing. In: Proc. ICSLP'96, Philadelphia.
  • 11
    • 0022896756
    • Fujisaki, H., Hirose, K., Takahashi, N., Morikawa, H., 1986. Acoustic characteristics and the underlying rules of intonation of the common Japanese used by radio and TV announcers. In: Proc. ICASSP'86, pp. 2039-2042.
  • 12
    • 33646484382
    • Guenther, F.H., 1995. A modeling framework for speech motor development and kinematic articulator control. In: Proc. 13th International Congress of Phonetic Sciences, Stockholm 2, pp. 92-99.
  • 13
    • 0030145829
    • Training intonational phrasing rules automatically for English and Spanish text-to-speech
    • Hirschberg J., and Prieto P. Training intonational phrasing rules automatically for English and Spanish text-to-speech. Speech Commun. 18 (1996) 281-290
    • (1996) Speech Commun. , vol.18 , pp. 281-290
    • Hirschberg, J.1    Prieto, P.2
  • 14
    • 0030374958
    • Huang, X.D., Acero, A., Adcock, J., et al., 1996. Whistler: a trainable text-to-speech system. In: Proc. ICSLP'96, Philadelphia.
  • 15
    • 0015319333
    • Just noticeable differences for segment duration in natural speech
    • Huggins A.W.F. Just noticeable differences for segment duration in natural speech. J. Acoust. Soc. Am. 51 4 (1972) 1270-1278
    • (1972) J. Acoust. Soc. Am. , vol.51 , Issue.4 , pp. 1270-1278
    • Huggins, A.W.F.1
  • 16
    • 33646476156
    • Kato, H., Tsuzaki, M., Sagisaka, Y., 1998. Effects of phonetic quality and duration on perceptual acceptability of temporal changes in speech. In: Proc. ICSLP'98, Sydney.
  • 17
    • 0141588508
    • Klatt, D.H., 1982. The Klattalk text-to-speech conversion system. In: Proc. ICASSP'82, pp. 1589-1592.
  • 18
    • 0037290439
    • Prosody modeling with soft templates
    • Kochanski G., and Shih C. Prosody modeling with soft templates. Speech Commun. 39 (2003) 311-352
    • (2003) Speech Commun. , vol.39 , pp. 311-352
    • Kochanski, G.1    Shih, C.2
  • 20
    • 33646490297
    • Mayer, J., Wildgruber, D., Riecker, A., Dogil, G., Ackermann, H., Grodd, W., 2002. Prosody production and perception: converging evidence from fMRI studies. In: Proc. Speech Prosody 2002, Aix-en-Provence, pp. 487-490.
  • 21
    • 0025543906
    • Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • Moulines E., and Charpentier F. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Commun. 9 (1990) 453-467
    • (1990) Speech Commun. , vol.9 , pp. 453-467
    • Moulines, E.1    Charpentier, F.2
  • 22
    • 33646472448
    • Möbius, B., Dogil, G., 2002. Phonemic and postural effects on the production of prosody. In: Proc. Speech Prosody 2002, Aix-en-Provence.
  • 23
    • 0003008756
    • A hierarchical stochastic model for automatic prediction of prosodic boundary location
    • Ostendorf M., and Veilleux N. A hierarchical stochastic model for automatic prediction of prosodic boundary location. Comput. Linguistics 20 1 (1994) 27-54
    • (1994) Comput. Linguistics , vol.20 , Issue.1 , pp. 27-54
    • Ostendorf, M.1    Veilleux, N.2
  • 25
    • 0032665603
    • A dynamical system model for generating fundamental frequency for speech synthesis
    • Ross K.N., and Ostendorf M. A dynamical system model for generating fundamental frequency for speech synthesis. IEEE Trans. Speech Audio Process. 7 3 (1999) 295-309
    • (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.3 , pp. 295-309
    • Ross, K.N.1    Ostendorf, M.2
  • 26
    • 33646474731
    • Shadle, C.H., Damper, R.I., 2001. Prospects for articulatory synthesis: a position paper. In: Proc. 4th ISCA Tutorial and Research Workshop on Speech Synthesis, Perthshire.
  • 27
    • 84936526529
    • On the quantal nature of speech
    • Stevens K.N. On the quantal nature of speech. J. Phonetics 17 (1989) 3-45
    • (1989) J. Phonetics , vol.17 , pp. 3-45
    • Stevens, K.N.1
  • 28
    • 33646488651
    • Stylianou, Y., Dutoit, T., Schroeter, J., 1997. Diphone concatenation using a harmonic plus noise model of speech. In: Proc. Eurospeech, Rhodes, pp. 613-616.
  • 29
    • 0032047079
    • Assigning phrase breaks from part-of-speech sequences
    • Taylor P., and Black A.W. Assigning phrase breaks from part-of-speech sequences. Computer Speech Lang. 12 (1998) 99-117
    • (1998) Computer Speech Lang. , vol.12 , pp. 99-117
    • Taylor, P.1    Black, A.W.2
  • 30
    • 0034008810
    • Analysis and synthesis of intonation using the Tilt model
    • Taylor P.A. Analysis and synthesis of intonation using the Tilt model. J. Acoust. Soc. Am. 107 3 (2000) 1697-1714
    • (2000) J. Acoust. Soc. Am. , vol.107 , Issue.3 , pp. 1697-1714
    • Taylor, P.A.1
  • 31
    • 33646476859
    • Wang, M.Q., Hirschberg, J., 1991. Predicting intonational phrasing from text. In: Proc. Association for Computational Linguistics 29th annual meeting, pp. 285-292.
  • 33
    • 33646493382
    • Young, S., Kershaw, D., Odell, J., Ollason, D., Valtchev, V., Woodland, P., 2000. The HTK Book (for HTK version 3.0).


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.