SCOPUS Information Search Platform

Speech Communication

Volume 47, Issues 1-2, 2005, Pages 182-193

Data-driven multimodal synthesis

(2) Carlson, Rolf (a); Granström, Björn (a)

(a) Royal Institute of Technology (Sweden)

Author keywords

Data driven synthesis; Multimodal synthesis; Speech synthesis

Indexed keywords

DATA ACQUISITION; DATA REDUCTION; KNOWLEDGE BASED SYSTEMS; MATHEMATICAL MODELS; ACOUSTIC MODELING; DATA-DRIVEN SYNTHESIS; FORMANT-SYNTHESIS SYSTEM; MULTIMODAL SYNTHESIS; SPEECH SYNTHESIS

EID: 24144469759 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2005.02.015 Document Type: Article

Times cited: 5

References (57)

1
- 85135264071
- Acero, A., 1999. Formant analysis and synthesis using hidden Markov models. In: Proc. Eurospeech'99, pp. 1047-1050.

2
- 0003724033
- Allen, J., Hunnicutt, M.S., Klatt, D., 1987. From Text to Speech: The MITalk System. Cambridge University Press, Cambridge, MA.

3
- 85009255585
- Bailly, G., Badin, P., 2002. Seeing tongue movements from outside. In: Proc. ICSLP 2002, pp. 1913-1916.

4
- 84883424118
- Beskow, J., 1995. Rule-based visual speech synthesis. In: Proc. 4th European Conf. on Speech Communication and Technology (Eurospeech'95), Madrid, Spain, pp. 299-302.

5
- 84886905348
- Beskow, J., 1997. Animation of talking agents. In: Proc. Internat. Conf. on Auditory-Visual Speech Processing (AVSP'97), Rhodos, Greece, pp. 149-152.

6
- 9444220246
- Beskow, J., 2003. Talking heads - models and applications for multimodal speech synthesis. Doctoral thesis, Department of Speech, Music and Hearing, KTH, Stockholm, Sweden.

7
- 4143072802
- Beskow, J., 2004. Trainable articulatory control models for visual speech synthesis. J. Speech Technol. 7 (4), 335-349.

8
- 21844452845
- Beskow, J., Engwall, O., Granström, B., 2003. Resynthesis of facial and intraoral articulation from simultaneous measurements. In: Proc. ICPhS 2003, Barcelona, Spain.

9
- 35048862963
- Beskow, J., Karlsson, I., Kewley, J., Salvi, G., 2004. SYNFACE - a talking head telephone for the hearing-impaired. In: Miesenberger, K., Klaus, J., Zagler, W., Burger, D. (Eds.), Computers Helping People with Special Needs, pp. 1178-1186.

10
- 0038533317
- Branderud, P., 1985. Movetrack - a movement tracking system. In: Proc. French-Swedish Symposium on Speech, Grenoble, France, pp. 113-122.

11
- 0030677313
- Bregler, C., Covell, M., Slaney, M., 1997. Video rewrite: Driving visual speech with audio. In: Proc. ACM SIGGRAPH'97, pp. 353-360.

12
- 84925596359
- Brooke, N.M., Scott, D.S., 1998. Two- and three-dimensional audio-visual speech synthesis. In: Proc. Internat. Conf. on Auditory-Visual Speech Processing (AVSP'98), Terrigal, Australia, pp. 213-218.

13
- 85067593976
- Carlson, R., Granström, B., 1976. A text-to-speech system based entirely on rules. In: Proc. ICASSP-76.

14
- 85009069019
- Carlson, R., Granström, B., Hunnicutt, S., 1982. A multi-language text-to-speech module. In: Proc. 7th Internat. Conf. on Acoustics, Speech, and Signal Processing (ICASSP'82), Paris, France, Vol. 3, pp. 1604-1607.

15
- 0026372714
- Carlson, R., Granström, B., Karlsson, I., 1991. Experiments with voice modelling in speech synthesis. Speech Comm. 10, 481-489.

16
- 0001395349
- Carlson, R., Granström, B., Nord, L., 1992. Experiments with emotive speech - acted utterances and synthesized replicas. In: Internat. Conf. on Spoken Language Processing, Banff, Canada, pp. 671-674.

17
- 33645586479
- Carlson, R., Sigvardson, T., Sjölander, A., 2002. Data-driven formant synthesis. In: Fonetik 2002.

18
- 0025543906
- Charpentier, F., Moulines, E., 1990. Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones. Speech Comm. 9 (5-6), 435-467.

19
- 0022896754
- Charpentier, F., Stella, M., 1986. Diphone synthesis using an overlap-add technique for speech waveforms concatenation. In: Proc. ICASSP 86, Vol. 3, pp. 2015-2018.

20
- 0001514782
- Cohen, M.M., Massaro, D.W., 1993. Modelling coarticulation in synthetic visual speech. In: Magnenat Thalmann, N., Thalmann, D. (Eds.), Models and Techniques in Computer Animation. Springer-Verlag, Tokyo, pp. 139-156.

21
- 0345093720
- Dixon, N.R., Maxey, H.D., 1968. Terminal analog synthesis of continuous speech using the diphone method of segment assembly. IEEE Trans. Audio Electroacoust. AU-16, 40-50.

22
- 0038194605
- Engwall, O., 2002a. Tongue talking - studies in intraoral speech synthesis. PhD thesis, KTH, Sweden.

23
- 0038194614
- Engwall, O., 2002b. Evaluation of a system for concatenative articulatory visual speech synthesis. In: Proc. ICSLP 2002.

24
- 85009083853
- Engwall, O., 2004. From real-time MRI to 3D tongue movements. In: Proc. ICSLP 2004.

25
- 77953828868
- Ezzat, T., Geiger, G., Poggio, T., 2002. Trainable videorealistic speech animation. In: Proc. ACM SIGGRAPH 2002, San Antonio, TX, pp. 388-398.

26
- 85031438802
- Hällgren, Å., Lyberg, B., 1998. Visual speech synthesis with concatenative speech. In: Proc. Internat. Conf. on Auditory-Visual Speech Processing (AVSP'98), Terrigal, Australia, pp. 181-183.

27
- 84966440972
- Hertz, S., 2002. Integration of rule-based formant synthesis and waveform concatenation: A hybrid approach to text-to-speech synthesis. In: Proc. IEEE 2002 Workshop on Speech Synthesis, Santa Monica, USA, 11-13 September 2002.

28
- 33645587401
- Högberg, J., 1997. Data driven formant synthesis. In: Proc. Eurospeech 97.

29
- 85009141770
- Jiang, J., Alwan, A., Bernstein, L., Keating, P., Auer, E., 2000. On the correlation between facial movements, tongue movements and speech acoustics. In: Proc. ICSLP 2000, Vol. 1, pp. 42-45.

30
- 0141588508
- Klatt, D., 1982. The Klattalk text-to-speech conversion system. In: Proc. ICASSP 82, pp. 1589-1592.

31
- 0023407575
- Klatt, D., 1987. Review of text-to-speech conversion for English. J. Acoust. Soc. Amer. 82 (3), 737-793.

32
- 85133504159
- Le Goff, B., 1997. Automatic modeling of coarticulation in text-to-visual speech synthesis. In: Proc. 5th European Conf. on Speech Communication and Technology (EUROSPEECH'97), Rhodos, Greece, pp. 1667-1670.

33
- 33645604824
- Lee, M., van Santen, J., Möbius, B., Olive, J., 1999. Formant tracking using segmental phonemic information. In: Proc. Eurospeech'99, Vol. 6, pp. 2789-2792.

34
- 0003116759
- Löfqvist, A., 1990. Speech as audible gestures. In: Hardcastle, W.J., Marchal, A. (Eds.), Speech Production and Speech Modelling. Kluwer Academic Publishers, Dordrecht, pp. 289-322.

35
- 0025325827
- MacLeod, A., Summerfield, Q., 1990. A procedure for measuring auditory and audio-visual speech-reception thresholds for sentences in noise: Rationale, evaluation, and recommendations for use. Br. J. Audiol. 24, 29-43.

36
- 4544357742
- Mannell, R.H., 1998. Formant diphone parameter extraction utilising a labeled single speaker database. In: Proc. ICSLP 98.

37
- 33645595381
- Massaro, D.W., Cohen, M.M., Tabain, M., Beskow, J., Clark, R., 2005. Animated speech: Research progress and applications. In: Vatikiotis-Bateson, E., Bailly, G., Perrier, P. (Eds.), Audiovisual Speech Processing. MIT Press.

38
- 85009156064
- Mori, H., Ohtsuka, T., Kasuya, H., 2002. A data-driven approach to source-formant type text-to-speech system. In: ICSLP-2002, pp. 2365-2368.

39
- 0034224125
- Ogden, R., Hawkins, S., House, J., Huckvale, M., Local, J., Carter, P., Dankovicova, J., Heid, S., 2000. Prosynth: An integrated prosodic approach to device-independent, natural-sounding speech synthesis. Comput. Speech Language 14, 177-210.

40
- 33645589461
- Öhlin, D., 2004. Formant extraction for data-driven formant synthesis. Master thesis, TMH, KTH, Stockholm (in Swedish).

41
- 33645586478
- Öhlin, D., Carlson, R., 2004. Data-driven formant synthesis. In: Proc. Fonetik, pp. 160-163.

42
- 84937184260
- Öhman, T., 1998. An audio-visual speech database and automatic measurements of visual speech. In: KTH TMH QPSR, Vols. 1-2, pp. 61-76.

43
- 0017632039
- Olive, J.P., 1977. Rule synthesis of speech from diadic units. In: Proc. ICASSP-77, pp. 568-570.

44
- 0020202671
- Parke, F.I., 1982. Parameterized models for facial animation. IEEE Comput. Graphics 2 (9), 61-68.

45
- 4143098969
- Pelachaud, C., 2002. Visual text-to-speech. In: Pandzic, I., Forchheimer, R. (Eds.), MPEG-4 Facial Animation - The Standard, Implementation and Applications. John Wiley & Sons, pp. 125-140.

46
- 0002473893
- Pelachaud, C., Badler, N.I., Steedman, M., 1996. Generating facial expressions for speech. Cognitive Sci. 20 (1), 1-46.

47
- 85075933509
- Peterson, G., Wang, W., Sivertsen, E., 1958. Segmentation techniques in speech synthesis. J. Acoust. Soc. Amer. 32, 639-703.

48
- 84870292720
- Reveret, L., Bailly, G., Badin, P., 2000. Mother: A new generation of talking heads providing a flexible articulatory control for video-realistic speech animation. In: Proc. 6th Internat. Conf. on Spoken Language Processing (ICSLP'2000), Beijing, China, pp. 755-758.

49
- 0028823541
- Shannon, R.V., Zeng, F.-G., Kamath, V., Wygonski, J., Ekelid, M., 1995. Speech recognition with primarily temporal cues. Science 270, 303-304.

50
- 4143153672
- Siciliano, C., Williams, G., Beskow, J., Faulkner, A., 2003. Evaluation of a multilingual synthetic talking face as a communication aid for the hearing impaired. To appear in: Proc. 15th Internat. Congress of Phonetic Sciences, Barcelona, Spain.

51
- 33645606980
- Sigvardson, T., 2002. Data-driven methods for parameter synthesis - description of a system and experiments with CART-analysis. Master thesis, TMH, KTH, Stockholm, Sweden (in Swedish).

52
- 33645599091
- Sjölander, A., 2001. Data-driven formant synthesis. Master thesis, TMH, KTH, Stockholm (in Swedish).

53
- 10444263998
- Sjölander, K., 2003. An HMM-based system for automatic segmentation and alignment of speech. In: Proc. Fonetik 2003, Umeå Universitet, Umeå, Sweden, pp. 93-96.

54
- 84912906590
- Stevens, K.N., Bickley, C.A., 1991. Constraints among parameters simplify control of Klatt formant synthesizer. J. Phonetics 19, 161-174.

55
- 33645585669
- Talkin, D., 1989. Looking at speech. Speech Technol. 4, 74-77.

56
- 33645593094
- Vinet, R., 2004. Enhancing rule-based synthesizer using concatenative synthesis. Master thesis, TMH, KTH, Stockholm, Sweden.

57
- 0032178592
- Yehia, H., Rubin, P., Vatikiotis-Bateson, E., 1998. Quantitative association of vocal-tract and facial behaviour. Speech Comm. 26, 23-43.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.