Volume 19, Issue 3, 2011, Pages 570-582

Emotional audio-visual speech synthesis based on PAD

Author keywords

Pleasure-Displeasure, Arousal-Nonarousal, and Dominance-Submissiveness (PAD); audio-visual speech; boosting Gaussian mixture model (GMM); emotion; facial expression

Indexed keywords

PLEASURE-DISPLEASURE, AROUSAL-NONAROUSAL, AND DOMINANCE-SUBMISSIVENESS (PAD); AUDIO-VISUAL SPEECH; EMOTION; FACIAL EXPRESSIONS; GAUSSIAN MIXTURE MODEL

EID: 78650033338     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2010.2052246     Document Type: Article
Times cited : (54)

References (41)
  • 2
    • 34547549756
    • Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar
    • Z. Y. Wu, S. Zhang, L. H. Cai, and H. M. Meng, "Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar," in Proc. Int. Conf. Spoken Lang. Process., 2006, pp. 1802-1805.
    • (2006) Proc. Int. Conf. Spoken Lang. Process , pp. 1802-1805
    • Wu, Z.Y.1    Zhang, S.2    Cai, L.H.3    Meng, H.M.4
  • 3
    • 0003959340
    • Affective Computing
    • R. W. Picard, "Affective Computing," MIT Media Lab., Perceptual Comput. Section, Mass. Inst. Technol., Cambridge, MA, Tech. Rep., 1995.
    • (1995) Affective Computing
    • Picard, R.W.1
  • 5
    • 0027588084
    • Facial expression and emotion
    • P. Ekman, "Facial expression and emotion," Amer. Psychol., vol. 48, no. 4, pp. 384-392, Apr. 1993.
    • (1993) Amer. Psychol. , vol.48 , Issue.4 , pp. 384-392
    • Ekman, P.1
  • 7
    • 84971539709
    • Emotional speech synthesis: A review
    • M. Schröder, "Emotional speech synthesis: A review," in Proc. Eurospeech, Aalborg, Denmark, 2001, vol. 1, pp. 561-564.
    • (2001) Proc. Eurospeech , vol.1 , pp. 561-564
    • Schröder, M.1
  • 10
    • 54949115779
    • Humanoid audiovisual avatar with emotive text-to-speech synthesis
    • H. Tang, Y. Fu, J. Tu, M. Hasegawa-Johnson, and T. S. Huang, "Humanoid audiovisual avatar with emotive text-to-speech synthesis," IEEE Trans. Multimedia, vol. 10, pp. 969-981, Oct. 2008.
    • (2008) IEEE Trans. Multimedia , vol.10 , pp. 969-981
    • Tang, H.1    Fu, Y.2    Tu, J.3    Hasegawa-Johnson, M.4    Huang, T.S.5
  • 11
    • 85089091930
    • Expressive speech synthesis: Evaluation of a voice quality centered coder on the different acoustic dimensions
    • N. Audibert, D. Vincent, V. Aubergé, and O. Rosec, "Expressive speech synthesis: Evaluation of a voice quality centered coder on the different acoustic dimensions," in Proc. Speech Prosody, 2006.
    • (2006) Proc. Speech Prosody
    • Audibert, N.1    Vincent, D.2    Aubergé, V.3    Rosec, O.4
  • 12
    • 34547519038
    • A statistical approach for modeling prosody features using pos tags for emotional speech synthesis
    • M. Bulut, S. Lee, and S. Narayanan, "A statistical approach for modeling prosody features using pos tags for emotional speech synthesis," in Proc. Int. Conf. Acoust. Speech, Signal, Process., Honolulu, HI, Apr. 2007, pp. 1237-1240.
    • (2007) Proc. Int. Conf. Acoust. Speech, Signal, Process. , pp. 1237-1240
    • Bulut, M.1    Lee, S.2    Narayanan, S.3
  • 13
    • 41049115081
    • Emovoice: A system to generate emotion in speech
    • J. Cabral and L. Oliveira, "Emovoice: A system to generate emotion in speech," in Proc. Interspeech, Pittsburgh, PA, 2006.
    • (2006) Proc. Interspeech
    • Cabral, J.1    Oliveira, L.2
  • 14
    • 23144458178
    • Synthesis units for conversational speech using phrasal segments
    • N. Campbell, "Synthesis units for conversational speech using phrasal segments," in Proc. Autumn Meet. Acoust. Soc. Jpn., 2004.
    • (2004) Proc. Autumn Meet. Acoust. Soc. Jpn
    • Campbell, N.1
  • 15
    • 34047263010
    • Prosody conversion from neutral speech to emotional speech
    • J. Tao, Y. Kang, and A. Li, "Prosody conversion from neutral speech to emotional speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1145-1154, Jul. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.4 , pp. 1145-1154
    • Tao, J.1    Kang, Y.2    Li, A.3
  • 17
    • 0142057164
    • Emotional facial expression model building
    • Y. Du and X. Lin, "Emotional facial expression model building," Pattern Recognition Lett., vol. 24, no. 16, pp. 2923-2934, 2003.
    • (2003) Pattern Recognition Lett. , vol.24 , Issue.16 , pp. 2923-2934
    • Du, Y.1    Lin, X.2
  • 19
    • 0037355922
    • Emotion disc and emotion squares: Tools to explore the facial expression space
    • Z. Ruttkay, H. Noot, and P. Hagen, "Emotion disc and emotion squares: Tools to explore the facial expression space," Comput. Graphics Forum, vol. 22, no. 1, pp. 49-53, 2003.
    • (2003) Comput. Graphics Forum , vol.22 , Issue.1 , pp. 49-53
    • Ruttkay, Z.1    Noot, H.2    Hagen, P.3
  • 20
    • 25844521034
    • Mixed feelings: Expression of non-basic emotions in a muscle-based talking head
    • I. Albrecht, M. Schröder, J. Haber, and H. P. Seidel, "Mixed feelings: expression of non-basic emotions in a muscle-based talking head," Virtual Reality, vol. 8, no. 4, pp. 201-212, 2005.
    • (2005) Virtual Reality , vol.8 , Issue.4 , pp. 201-212
    • Albrecht, I.1    Schröder, M.2    Haber, J.3    Seidel, H.P.4
  • 21
    • 23044466525
    • Computational model of believable conversational agents
    • C. Pelachaud and M. Bilvi, "Computational model of believable conversational agents," in Communication in Multiagent Systems, M. P. Huget, Ed. New York: Springer-Verlag, 2003, vol. 2650, Lecture Notes in Computer Science, pp. 300-317.
    • (2003) Communication in Multiagent Systems , vol.2650 , pp. 300-317
    • Pelachaud, C.1    Bilvi, M.2
  • 22
    • 42949107237
    • Interrelation between speech and facial gestures in emotional utterances: A single subject study
    • C. Busso and S. Narayanan, "Interrelation between speech and facial gestures in emotional utterances: A single subject study," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2331-2347, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2331-2347
    • Busso, C.1    Narayanan, S.2
  • 23
    • 57149144228
    • A survey of affect recognition methods: Audio, visual and spontaneous expressions
    • Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, "A survey of affect recognition methods: Audio, visual and spontaneous expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39-58, Jan. 2009.
    • (2009) IEEE Trans. Pattern Anal. Mach. Intell. , vol.31 , Issue.1 , pp. 39-58
    • Zeng, Z.1    Pantic, M.2    Roisman, G.I.3    Huang, T.S.4
  • 24
    • 21344454051
    • Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament
    • A. Mehrabian, "Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament," Current Psychol.: Development., Learn., Personal., Soc., vol. 14, pp. 261-292, 1996.
    • (1996) Current Psychol.: Development., Learn., Personal., Soc. , vol.14 , pp. 261-292
    • Mehrabian, A.1
  • 25
    • 78649990737
    • Analysis and conversion of emotional speech based on the prosodic features
    • W. Xiong, D. Cui, F. Meng, and L. Cai, "Analysis and conversion of emotional speech based on the prosodic features," in Proc. 8th Phon. Conf. China (PCC), 2008.
    • (2008) Proc. 8th Phon. Conf. China (PCC)
    • Xiong, W.1    Cui, D.2    Meng, F.3    Cai, L.4
  • 26
    • 0025543906
    • Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
    • E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun., vol. 9, no. 5-6, pp. 453-467, Dec. 1990.
    • (1990) Speech Commun. , vol.9 , Issue.5-6 , pp. 453-467
    • Moulines, E.1    Charpentier, F.2
  • 27
    • 51849090664
    • Prosodic boundary prediction based on maximum entropy model with error-driven modification
    • X. Zhang, J. Xu, and L. Cai, "Prosodic boundary prediction based on maximum entropy model with error-driven modification," in Proc. ISCSLP, Singapore, 2006.
    • (2006) Proc. ISCSLP
    • Zhang, X.1    Xu, J.2    Cai, L.3
  • 29
    • 0031211090
    • A decision-theoretic generalization of on-line learning and an application to boosting
    • Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, pp. 119-139, 1997.
    • (1997) J. Comput. Syst. Sci. , vol.55 , pp. 119-139
    • Freund, Y.1    Schapire, R.E.2
  • 31
    • 78649999251
    • International Standard, Information Technology - Coding of Audio-Visual Objects. Part 2: Visual; Amendment 1: Visual Extensions
    • International Standard, Information Technology - Coding of Audio-Visual Objects. Part 2: Visual; Amendment 1: Visual Extensions, International Organization for Standardization, Std. ISO/IEC 14496-2:1999/Amd.1:2000(E).
  • 36
    • 0035472468
    • An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame
    • F. Lavagetto and R. Pockaj, "An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 1085-1097, Nov. 2001.
    • (2001) IEEE Trans. Circuits Syst. Video Technol. , vol.11 , pp. 1085-1097
    • Lavagetto, F.1    Pockaj, R.2
  • 37
    • 84949225455
    • A dynamic viseme model for personalizing a talking head
    • Z. Wang, L. Cai, and H. Ai, "A dynamic viseme model for personalizing a talking head," in Proc. 6th Int. Conf. Signal Process., Beijing, China, Aug. 2002.
    • (2002) Proc. 6th Int. Conf. Signal Process.
    • Wang, Z.1    Cai, L.2    Ai, H.3
  • 39
    • 78650031964
    • DBN based audio-visual correlative model for audio-visual speech synthesis
    • Z. Wu, L. Cai, and H. M. Meng, "DBN based audio-visual correlative model for audio-visual speech synthesis," in Proc. NCMMSC2005 (in Chinese), Beijing, China, 2005, pp. 334-337.
    • (2005) Proc. NCMMSC2005 (in Chinese) , pp. 334-337
    • Wu, Z.1    Cai, L.2    Meng, H.M.3
  • 40
    • 78649993603
    • MATLAB Function Reference 1-D Data Interpolation MathWorks
    • "MATLAB Function Reference," 1-D Data Interpolation MathWorks.
  • 41
    • 84876513525
    • Xface: MPEG-4 based open source toolkit for 3d facial animation
    • "Xface: MPEG-4 based open source toolkit for 3d facial animation," in Proc. AVI04, Working Conf. Adv. Visual Interfaces, Gallipoli, Italy, May 25-28, 2004.
    • (2004) Proc. AVI04, Working Conf. Adv. Visual Interfaces


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.