Volume 20, Issue 8, 2012, Pages 2378-2387

Relating objective and subjective performance measures for AAM-based visual speech synthesis

Author keywords

Active appearance models (AAMs); canonical correlation analysis; visual speech evaluation; visual speech synthesis

Indexed keywords

ACOUSTIC FEATURES; ACTIVE APPEARANCE MODELS; CANONICAL CORRELATION ANALYSIS; DYNAMIC TIME; OBJECTIVE MEASURE; PHONETIC TRANSCRIPTIONS; SMALL REGION; SUBJECTIVE PERFORMANCE; SUBJECTIVE QUALITY; VISUAL SPEECH; VISUAL SPEECH SYNTHESIS;

EID: 84865358428     PISSN: 1558-7916     EISSN: None     Source Type: Journal
DOI: 10.1109/TASL.2012.2202651     Document Type: Article
Times cited: 17

References (82)
  • 2. J. Beskow, I. Karlsson, J. Kewley, and G. Salvi, "SYNFACE - A talking head telephone for the hearing-impaired," in Computers Helping People with Special Needs: 9th Int. Conf., ICCHP 2004, Paris, France, Jul. 2004, Lecture Notes in Computer Science, no. 3118, pp. 1178-1186.
  • 4. F. Parke, "Parametric models for facial animation," Comput. Graphics Applicat., vol. 2, no. 9, pp. 61-68, 1982.
  • 8. K. Kähler, J. Haber, and H. Seidel, "Geometry-based muscle modelling for facial animation," Graphics Interface, pp. 27-36, 2001.
  • 9. Y. Lee, D. Terzopoulos, and K. Waters, "Realistic modeling for facial animation," in Proc. SIGGRAPH, 1995, pp. 55-62.
  • 10. L. Nedel and D. Thalmann, "Real time muscle deformations using mass-spring systems," Comput. Graphics Int., pp. 156-165, 1998.
  • 11. M. Brand, "Voice puppetry," in Proc. SIGGRAPH, Los Angeles, CA, 1999, pp. 21-28.
  • 12. Y. Du and X. Lin, "Realistic mouth synthesis based on shape appearance dependence mapping," Pattern Recogn. Lett., vol. 23, no. 14, pp. 1875-1885, 2002.
  • 15. G. Feldhoffer, A. Tihanyi, and O. Balázs, "A comparative study of direct and ASR-based modular audio to visual speech systems," The Phonetician, vol. 97/98, pp. 15-24, 2008.
  • 16. P. Hong, Z. Wen, and T. Huang, "Real-time speech-driven expressive synthetic talking faces using neural networks," IEEE Trans. Neural Netw., vol. 13, no. 4, pp. 916-927, Jul. 2002.
  • 17. C. Hsieh and Y. Chen, "Partial linear regression for speech-driven talking head application," Signal Process.: Image Commun., vol. 21, no. 1, pp. 1-12, 2006, doi: 10.1016/j.image.2005.04.002.
  • 20. L. Wang, W. Han, X. Qian, and F. Soong, "Synthesizing photo-real talking head via trajectory-guided sample selection," in Proc. Interspeech, 2010.
  • 21. Z. Wen, P. Hong, and T. Huang, "Real time speech driven facial animation using formant analysis," in Proc. Int. Conf. Multimedia Expo, 2001, pp. 817-820.
  • 22. C. Bregler, M. Covell, and M. Slaney, "Video rewrite: Driving visual speech with audio," in Proc. SIGGRAPH, 1997, pp. 353-360.
  • 23. S. Deena, S. Hou, and A. Galata, "Visual speech synthesis by modelling coarticulation dynamics using a non-parametric switching state-space model," in Proc. Int. Conf. Multimodal Interfaces, 2010, pp. 1-8.
  • 25. T. Ezzat and T. Poggio, "MikeTalk: A talking facial display based on morphing visemes," in Proc. Comput. Animat. Conf., 1998, pp. 96-103.
  • 26. T. Ezzat, G. Geiger, and T. Poggio, "Trainable videorealistic speech animation," in Proc. SIGGRAPH, 2002, pp. 388-398.
  • 27. O. Govokhina, G. Bailly, G. Breton, and P. Bagshaw, "TDA: A new trainable trajectory formation system for facial animation," in Proc. Interspeech, 2006, pp. 2474-2477.
  • 29. W. Mattheyses, L. Latacz, and W. Verhelst, "Active appearance models for photorealistic visual speech synthesis," in Proc. Interspeech, 2010.
  • 30. B. Theobald, J. Bangham, I. Matthews, and G. Cawley, "Near-videorealistic synthetic talking faces: Implementation and evaluation," Speech Commun., vol. 44, pp. 127-140, 2004.
  • 32. E. Cosatto and H. Graf, "Photo-realistic talking-heads from image samples," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 152-163, Jun. 2000.
  • 37. G. Bailly, N. Campbell, and M. Möbius, "ISCA special session: Hot topics in speech synthesis," in Proc. Eurospeech, 2003, pp. 37-40.
  • 40. H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Commun., vol. 26, no. 1-2, pp. 23-43, 1998.
  • 44. O. Engwall, "Evaluation of a system for concatenative articulatory visual speech synthesis," in Proc. Int. Conf. Spoken Lang. Process., 2002, pp. 665-668.
  • 45. K. Liu and J. Ostermann, "Realistic facial animation system for interactive services," in Proc. Interspeech, 2008, pp. 2330-2333.
  • 46. J. Ma, R. Cole, B. Pellom, W. Ward, and B. Wise, "Accurate visible speech synthesis based on concatenating variable length motion capture data," IEEE Trans. Vis. Comput. Graphics, vol. 12, no. 2, pp. 266-276, Mar.-Apr. 2006.
  • 47. E. Bevacqua and C. Pelachaud, "Expressive audio-visual speech," Comput. Animation and Virtual Worlds, vol. 15, no. 3-4, pp. 297-304, 2004, doi: 10.1002/cav.32.
  • 50. B. Theobald and N. Wilkinson, "A probabilistic trajectory synthesis system for synthesising visual speech," in Proc. Interspeech, 2008, pp. 2310-2313.
  • 51. J. Tao, L. Xin, and Y. Panrong, "Realistic visual speech synthesis based on hybrid concatenation method," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 3, pp. 469-477, 2009.
  • 54. J. Beskow, "Trainable articulatory control models for visual speech synthesis," Int. J. Speech Technol., vol. 7, no. 4, pp. 335-349, 2004.
  • 55. R. Carlson and B. Granström, "Data-driven multimodal synthesis," Speech Commun., vol. 47, no. 1-2, pp. 182-193, 2005, doi: 10.1016/j.specom.2005.02.015.
  • 56. X. Zhuang, L. Wang, F. Soong, and M. Hasegawa-Johnson, "A minimum converted trajectory error (MCTE) approach to high quality speech-to-lips conversion," in Proc. Interspeech, 2010.
  • 57. D. Cosker, D. Marshall, P. Rosin, and Y. Hicks, "Speech driven facial animation using a hidden Markov coarticulation model," in Proc. Int. Conf. Pattern Recogn., 2004, pp. 128-131.
  • 58. L. Arslan and D. Talkin, "3D face point trajectory synthesis using an automatically derived visual phoneme similarity matrix," in Proc. Int. Conf. Auditory-Vis. Speech Process., 1998, pp. 175-180.
  • 59. G. Bailly, G. Gibert, and M. Odisio, "Evaluation of movement generation systems using the point-light technique," in Proc. IEEE Workshop Speech Synth., 2002, pp. 27-30.
  • 60. T. Kuratate, K. Munhall, P. Rubin, E. Vatikiotis-Bateson, and H. Yehia, "Audio-visual synthesis of talking faces from speech production correlates," in Proc. Eurospeech, 1999, vol. 3, pp. 1279-1282.
  • 63. I. Pandzic, J. Ostermann, and D. Millen, "User evaluation: Synthetic talking faces for interactive services," Vis. Comput., vol. 15, no. 7, pp. 330-340, 1999, doi: 10.1007/s003710050182.
  • 65. C. Benoît and B. Le Goff, "Audio-visual speech synthesis from French text: Eight years of models, designs and evaluation at the ICP," Speech Commun., vol. 26, no. 1-2, pp. 117-129, 1998.
  • 66. M. Železný, K. Zdeněk, P. Císař, and M. Jindřich, "Design, implementation and evaluation of the Czech realistic audio-visual speech synthesis," Signal Process., vol. 86, pp. 3657-3673, 2006.
  • 67. C. Benoît, M. Grice, and V. Hazan, "The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences," Speech Commun., vol. 18, no. 4, pp. 381-392, 1996, doi: 10.1016/0167-6393(96)00026-X.
  • 68. H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, pp. 746-748, 1976.
  • 69. D. Cosker, D. Marshall, P. Rosin, S. Paddock, and S. Rushton, "Towards perceptually realistic talking heads: Models, metrics and McGurk," ACM Trans. Appl. Percept., vol. 2, no. 3, pp. 270-285, 2005.
  • 71. S. Fagel and C. Clemens, "An articulation model for audiovisual speech synthesis - Determination, adjustment, evaluation," Speech Commun., vol. 44, pp. 141-154, 2004.
  • 72. S. Fagel, "MASSY speaks English: Adaptation and evaluation of a talking head," in Proc. Interspeech, 2008.
  • 75. K. Choi and J. Hwang, "Automatic creation of a talking head from a video sequence," IEEE Trans. Multimedia, vol. 7, no. 4, pp. 628-637, Aug. 2005, doi: 10.1109/TMM.2005.850964.
  • 76. W. Mattheyses, L. Latacz, and W. Verhelst, "On the importance of audiovisual coherence for the perceived quality of synthesized visual speech," EURASIP J. Audio, Speech, Music Process., vol. 2009, pp. 1-12, 2009.
  • 77. G. Takacs, "Direct, modular and hybrid audio to visual speech conversion methods - A comparative study," in Proc. Interspeech, 2009.
  • 79. H. Bredin and G. Chollet, "Audio-visual speech synchrony measure for talking-face identity verification," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Apr. 2007, vol. 2, pp. 233-236.
  • 80. C. de Boor, "Calculation of the smoothing spline with weighted roughness measure," Math. Models Methods Appl. Sci., vol. 11, no. 1, pp. 33-41, 2001, doi: 10.1142/S0218202501000726.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.