SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 19, Issue 3, 2011, Pages 570-582

Emotional audio-visual speech synthesis based on PAD

(5) Jia, Jia; Zhang, Shen; Meng, Fanbo; Wang, Yongxin; Cai, Lianhong

Author keywords

Audio-visual speech; boosting Gaussian mixture model (GMM); emotion; facial expression; Pleasure-Displeasure, Arousal-Nonarousal, and Dominance-Submissiveness (PAD)

Indexed keywords

AUDIO-VISUAL SPEECH; EMOTION; FACIAL EXPRESSIONS; GAUSSIAN MIXTURE MODEL; PLEASURE-DISPLEASURE, AROUSAL-NONAROUSAL, AND DOMINANCE-SUBMISSIVENESS (PAD);

AUDIO ACOUSTICS; GAUSSIAN DISTRIBUTION; SPEECH COMMUNICATION; SPEECH SYNTHESIS;

SPEECH ANALYSIS;

EID: 78650033338 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2010.2052246 Document Type: Article

Times cited : (54)

References (41)

1
- 10044281988
- Lifelike talking faces for interactive services
- Sep
- E. Cosatto, J. Ostermann, H. P. Graf, and J. Schroeter, "Lifelike talking faces for interactive services," Proc. IEEE, Special Iss. Human-Computer Multimodal Interface, vol. 91, no. 9, pp. 1406-1429, Sep. 2003.
- (2003) Proc.IEEE, Special Iss. Human-Computer Multimodal Interface , vol.91 , Issue.9 , pp. 1406-1429
- Cosatto, E.¹ Ostermann, J.² Graf, H.P.³ Schroeter, J.⁴

2
- 34547549756
- Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar
- Z. Y. Wu, S. Zhang, L. H. Cai, and H. M. Meng, "Real-time synthesis of Chinese visual speech and facial expressions using MPEG-4 FAP features in a three-dimensional avatar," in Proc. Int. Conf. Spoken Lang. Process., 2006, pp. 1802-1805.
- (2006) Proc. Int. Conf. Spoken Lang. Process , pp. 1802-1805
- Wu, Z.Y.¹ Zhang, S.² Cai, L.H.³ Meng, H.M.⁴

3
- 0003959340
- MIT Media Lab., Perceptual Comput. Section, Mass. Inst. Technol., Cambridge, MA, Tech. Rep
- R. W. Picard, "Affective Computing," MIT Media Lab., Perceptual Comput. Section, Mass. Inst. Technol., Cambridge, MA, Tech. Rep., 1995.
- (1995) Affective Computing
- Picard, R.W.¹

4
- 0003774595
- London, U.K.: John Murray
- C. Darwin, The Expression of the Emotions in Man and Animals. London, U.K.: John Murray, 1872.
- (1872) The Expression of the Emotions in Man and Animals
- Darwin, C.¹

5
- 0027588084
- Facial expression and emotion
- Apr
- P. Ekman, "Facial expression and emotion," Amer. Psychol., vol. 48, no. 4, pp. 384-392, Apr. 1993.
- (1993) Amer. Psychol. , vol.48 , Issue.4 , pp. 384-392
- Ekman, P.¹

6
- 0038370976
- Facial and vocal expressions of emotion
- J. A. Russell, J. Bachorowski, and J. Fernández-Dols, "Facial and vocal expressions of emotion," Annu. Rev. Psychol., vol. 2002, pp. 329-349, 2003.
- (2003) Annu. Rev. Psychol. , vol.2002 , pp. 329-349
- Russell, J.A.¹ Bachorowski, J.² Fernández-Dols, J.³

7
- 84971539709
- Emotional speech synthesis: A review
- Aalborg, Denmark
- M. Schröder, "Emotional speech synthesis: A review," in Proc. Eurospeech, Aalborg, Denmark, 2001, vol. 1, pp. 561-564.
- (2001) Proc. Eurospeech , vol.1 , pp. 561-564
- Schröder, M.¹

8
- 22144492019
- Expressive audio-visual speech
- E. Bevacqua and C. Pelachaud, "Expressive audio-visual speech," Comput. Animat. Virtual Worlds, vol. 15, no. 3-4, pp. 297-304, 2004.
- (2004) Comput. Animat. Virtual Worlds , vol.15 , Issue.3-4 , pp. 297-304
- Bevacqua, E.¹ Pelachaud, C.²

9
- 84858952482
- The use of emotionally expressive avatars in collaborative virtual environments
- M. Fabri and D. J. Moore, "The use of emotionally expressive avatars in collaborative virtual environments," in Proc. Symp. Empathic Interact. With Synth. Charact., 2005.
- (2005) Proc. Symp. Empathic Interact. With Synth. Charact
- Fabri, M.¹ Moore, D.J.²

10
- 54949115779
- Humanoid audiovisual avatar with emotive text-to-speech synthesis
- Oct
- H. Tang, Y. Fu, J. Tu, M. Hasegawa-Johnson, and T. S. Huang, "Humanoid audiovisual avatar with emotive text-to-speech synthesis," IEEE Trans. Multimedia, vol. 10, pp. 969-981, Oct. 2008.
- (2008) IEEE Trans. Multimedia , vol.10 , pp. 969-981
- Tang, H.¹ Fu, Y.² Tu, J.³ Hasegawa-Johnson, M.⁴ Huang, T.S.⁵

11
- 85089091930
- Expressive speech synthesis: Evaluation of a voice quality centered coder on the different acoustic dimensions
- N. Audibert, D. Vincent, V. Aubergé, and O. Rosec, "Expressive speech synthesis: Evaluation of a voice quality centered coder on the different acoustic dimensions," in Proc. Speech Prosody, 2006.
- (2006) Proc. Speech Prosody
- Audibert, N.¹ Vincent, D.² Aubergé, V.³ Rosec, O.⁴

12
- 34547519038
- A statistical approach for modeling prosody features using POS tags for emotional speech synthesis
- Honolulu, HI, Apr
- M. Bulut, S. Lee, and S. Narayanan, "A statistical approach for modeling prosody features using POS tags for emotional speech synthesis," in Proc. Int. Conf. Acoust., Speech, Signal Process., Honolulu, HI, Apr. 2007, pp. 1237-1240.
- (2007) Proc. Int. Conf. Acoust. Speech, Signal, Process. , pp. 1237-1240
- Bulut, M.¹ Lee, S.² Narayanan, S.³

13
- 41049115081
- Emovoice: A system to generate emotion in speech
- Pittsburgh, PA
- J. Cabral and L. Oliveira, "Emovoice: A system to generate emotion in speech," in Proc. Interspeech, Pittsburgh, PA, 2006.
- (2006) Proc. Interspeech
- Cabral, J.¹ Oliveira, L.²

14
- 23144458178
- Synthesis units for conversational speech using phrasal segments
- N. Campbell, "Synthesis units for conversational speech using phrasal segments," in Proc. Autumn Meet. Acoust. Soc. Jpn., 2004.
- (2004) Proc. Autumn Meet. Acoust. Soc. Jpn
- Campbell, N.¹

15
- 34047263010
- Prosody conversion from neutral speech to emotional speech
- Jul
- J. Tao, Y. Kang, and A. Li, "Prosody conversion from neutral speech to emotional speech," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1145-1154, Jul. 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.4 , pp. 1145-1154
- Tao, J.¹ Kang, Y.² Li, A.³

16
- 56149126461
- GMM-based voice conversion applied to emotional speech synthesis
- Nov
- H. Kawanami, Y. Iwami, T. Toda, H. Saruwatari, and K. Shikano, "GMM-based voice conversion applied to emotional speech synthesis," IEEE Trans. Speech Audio Process., vol. 7, no. 6, pp. 697-708, Nov. 1999.
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.6 , pp. 697-708
- Kawanami, H.¹ Iwami, Y.² Toda, T.³ Saruwatari, H.⁴ Shikano, K.⁵

17
- 0142057164
- Emotional facial expression model building
- Y. Du and X. Lin, "Emotional facial expression model building," Pattern Recognition Lett., vol. 24, no. 16, pp. 2923-2934, 2003.
- (2003) Pattern Recognition Lett. , vol.24 , Issue.16 , pp. 2923-2934
- Du, Y.¹ Lin, X.²

18
- 0036822899
- Parameterized facial expression synthesis based on MPEG-4
- A. Raouzaiou, N. Tsapatsoulis, K. Karpouzis, and S. Kollias, "Parameterized facial expression synthesis based on MPEG-4," EURASIP J. Appl. Signal Process., vol. 2002, no. 10, pp. 1021-1038, 2002.
- (2002) EURASIP J. Appl. Signal Process. , vol.2002 , Issue.10 , pp. 1021-1038
- Raouzaiou, A.¹ Tsapatsoulis, N.² Karpouzis, K.³ Kollias, S.⁴

19
- 0037355922
- Emotion disc and emotion squares: Tools to explore the facial expression space
- Z. Ruttkay, H. Noot, and P. Hagen, "Emotion disc and emotion squares: Tools to explore the facial expression space," Comput. Graphics Forum, vol. 22, no. 1, pp. 49-53, 2003.
- (2003) Comput. Graphics Forum , vol.22 , Issue.1 , pp. 49-53
- Ruttkay, Z.¹ Noot, H.² Hagen, P.³

20
- 25844521034
- Mixed feelings: Expression of non-basic emotions in a muscle-based talking head
- I. Albrecht, M. Schröder, J. Haber, and H. P. Seidel, "Mixed feelings: expression of non-basic emotions in a muscle-based talking head," Virtual Reality, vol. 8, no. 4, pp. 201-212, 2005.
- (2005) Virtual Reality , vol.8 , Issue.4 , pp. 201-212
- Albrecht, I.¹ Schröder, M.² Haber, J.³ Seidel, H.P.⁴

21
- 23044466525
- Computational model of believable conversational agents
- M. P. Huget, Ed. New York: Springer-Verlag, Lecture Notes in Computer Science
- C. Pelachaud and M. Bilvi, "Computational model of believable conversational agents," in Communication in Multiagent Systems, M. P. Huget, Ed. New York: Springer-Verlag, 2003, vol. 2650, Lecture Notes in Computer Science, pp. 300-317.
- (2003) Communication in Multiagent Systems , vol.2650 , pp. 300-317
- Pelachaud, C.¹ Bilvi, M.²

22
- 42949107237
- Interrelation between speech and facial gestures in emotional utterances: A single subject study
- Nov
- C. Busso and S. Narayanan, "Interrelation between speech and facial gestures in emotional utterances: A single subject study," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2331-2347, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2331-2347
- Busso, C.¹ Narayanan, S.²

23
- 57149144228
- A survey of affect recognition methods: Audio, visual and spontaneous expressions
- Jan
- Z. Zeng, M. Pantic, G. I. Roisman, and T. S. Huang, "A survey of affect recognition methods: Audio, visual and spontaneous expressions," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 1, pp. 39-58, Jan. 2009.
- (2009) IEEE Trans. Pattern Anal. Mach. Intell. , vol.31 , Issue.1 , pp. 39-58
- Zeng, Z.¹ Pantic, M.² Roisman, G.I.³ Huang, T.S.⁴

24
- 21344454051
- Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament
- A. Mehrabian, "Pleasure-arousal-dominance: A general framework for describing and measuring individual differences in temperament," Current Psychol.: Development., Learn., Personal., Soc., vol. 14, pp. 261-292, 1996.
- (1996) Current Psychol.: Development., Learn., Personal., Soc. , vol.14 , pp. 261-292
- Mehrabian, A.¹

25
- 78649990737
- Analysis and conversion of emotional speech based on the prosodic features
- W. Xiong, D. Cui, F. Meng, and L. Cai, "Analysis and conversion of emotional speech based on the prosodic features," in Proc. 8th Phon. Conf. China (PCC), 2008.
- (2008) Proc. 8th Phon. Conf. China (PCC)
- Xiong, W.¹ Cui, D.² Meng, F.³ Cai, L.⁴

26
- 0025543906
- Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
- Dec
- E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones," Speech Commun., vol. 9, no. 5-6, pp. 453-467, Dec. 1990.
- (1990) Speech Commun. , vol.9 , Issue.5-6 , pp. 453-467
- Moulines, E.¹ Charpentier, F.²

27
- 51849090664
- Prosodic boundary prediction based on maximum entropy model with error-driven modification
- Singapore
- X. Zhang, J. Xu, and L. Cai, "Prosodic boundary prediction based on maximum entropy model with error-driven modification," in Proc. ISCSLP, Singapore, 2006.
- (2006) Proc. ISCSLP
- Zhang, X.¹ Xu, J.² Cai, L.³

28
- 0031623661
- Spectral voice conversions of text-to-speech synthesis
- A. Kain and M. W. Macon, "Spectral voice conversions of text-to-speech synthesis," in Proc. Int. Conf. Acoust., Speech, Signal Process., 1998, pp. 285-288.
- (1998) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 285-288
- Kain, A.¹ Macon, M.W.²

29
- 0031211090
- A decision-theoretic generalization of on-line learning and an application to boosting
- Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Comput. Syst. Sci., vol. 55, pp. 119-139, 1997.
- (1997) J. Comput. Syst. Sci. , vol.55 , pp. 119-139
- Freund, Y.¹ Schapire, R.E.²

30
- 0001963082
- A short introduction to boosting
- Y. Freund and R. E. Schapire, "A short introduction to boosting," J. Jpn. Soc. Artif. Intell., vol. 14, pp. 771-780, 1999.
- (1999) J. Jpn. Soc. Artif. Intell. , vol.14 , pp. 771-780
- Freund, Y.¹ Schapire, R.E.²

31
- 78649999251
- International Standard, Information Technology Coding of Audio-Visual Objects. Part 2: Visual; Amendment 1: Visual Extensions. International Organization for Standardization Std. ISO/IEC 14496-2:1999/Amd.1:2000(E)
- International Standard, Information Technology Coding of Audio-Visual Objects. Part 2: Visual; Amendment 1: Visual Extensions, International Organization for Standardization Std. ISO/IEC 14496-2:1999/Amd.1:2000(E).

32
- 0344212675
- New York: Wiley
- I. S. Pandžić and R. Forchheimer, MPEG-4 Facial Animation - The Standard, Implementations and Applications. New York: Wiley, 2002.
- (2002) MPEG-4 Facial Animation - The Standard, Implementations and Applications
- Pandžić, I.S.¹ Forchheimer, R.²

33
- 78649996133
- Region-based facial expression synthesis on a three-dimensional avatar
- S. Zhang, Z. Wu, and L. Cai, "Region-based facial expression synthesis on a three-dimensional avatar," in Proc. China Conf. Human Comput. Interact., 2006.
- (2006) Proc. China Conf. Human Comput. Interact
- Zhang, S.¹ Wu, Z.² Cai, L.³

34
- 84857444525
- Coding facial expressions with Gabor wavelets
- M. Lyons, S. Akamatsu, M. Kamachi, and J. Gyoba, "Coding facial expressions with Gabor wavelets," in Proc. 3rd IEEE Conf. Face Gesture Recogn., 1998, pp. 200-205.
- (1998) Proc. 3rd IEEE Conf. Face Gesture Recogn , pp. 200-205
- Lyons, M.¹ Akamatsu, S.² Kamachi, M.³ Gyoba, J.⁴

35
- 33646805422
- The reliability and validity of the Chinese version of abbreviated PAD emotion scales
- X. Li, H. Zhou, S. Song, T. Ran, and X. Fu, "The reliability and validity of the Chinese version of abbreviated PAD emotion scales," in Proc. Int. Conf. Affective Comput. Intell. Interact., 2005.
- (2005) Proc. Int. Conf. Affective Comput. Intell. Interact
- Li, X.¹ Zhou, H.² Song, S.³ Ran, T.⁴ Fu, X.⁵

36
- 0035472468
- An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame
- Nov
- F. Lavagetto and R. Pockaj, "An efficient use of MPEG-4 FAP interpolation for facial animation at 70 bits/frame," IEEE Trans. Circuits Syst. Video Technol., vol. 11, pp. 1085-1097, Nov. 2001.
- (2001) IEEE Trans. Circuits Syst. Video Technol. , vol.11 , pp. 1085-1097
- Lavagetto, F.¹ Pockaj, R.²

37
- 84949225455
- A dynamic viseme model for personalizing a talking head
- Beijing, China, Aug
- Z. Wang, L. Cai, and H. Ai, "A dynamic viseme model for personalizing a talking head," in Proc. 6th Int. Conf. Signal Process., Beijing, China, Aug. 2002.
- (2002) Proc. 6th Int. Conf. Signal Process.
- Wang, Z.¹ Cai, L.² Ai, H.³

38
- 33744958808
- Multi-level fusion of audio and visual features for speaker identification
- Advances in Biometrics - International Conference, ICB 2006, Proceedings LNCS
- Z. Wu, L. Cai, and H. M. Meng, "Multi-level fusion of audio and visual features for speaker identification," in Proc. Int. Conf. Biometrics (ICB2006), 2006, pp. 493-499. (Pubitemid 43856410)
- (2006) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , vol.3832 , pp. 493-499
- Wu, Z.¹ Cai, L.² Meng, H.³

39
- 78650031964
- DBN based audio-visual correlative model for audio-visual speech synthesis
- Beijing, China
- Z. Wu, L. Cai, and H. M. Meng, "DBN based audio-visual correlative model for audio-visual speech synthesis," in Proc. NCMMSC2005 (in Chinese), Beijing, China, 2005, pp. 334-337.
- (2005) Proc. NCMMSC2005 (in Chinese) , pp. 334-337
- Wu, Z.¹ Cai, L.² Meng, H.M.³

40
- 78649993603
- MATLAB Function Reference 1-D Data Interpolation MathWorks
- "MATLAB Function Reference," 1-D Data Interpolation MathWorks.

41
- 84876513525
- Xface: MPEG-4 based open source toolkit for 3D facial animation
- Gallipoli, Italy, May 25-28
- "Xface: MPEG-4 based open source toolkit for 3D facial animation," in Proc. AVI04, Working Conf. Adv. Visual Interfaces, Gallipoli, Italy, May 25-28, 2004.
- (2004) Proc. AVI04, Working Conf. Adv. Visual Interfaces

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.