Volume 31, Issue 2, 2013, Pages 153-163

LSTM-modeling of continuous emotions in an audiovisual affect recognition framework

Author keywords

Context modeling; Emotion recognition; Facial movement features; Long short term memory

Indexed keywords

BRAIN

EID: 84886418479     PISSN: 02628856     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.imavis.2012.03.001     Document Type: Article
Times cited: 274

References (69)
  • 2
    • 70349337757
    • On the role of emotion in embodied cognitive architectures: From organisms to robots
    • T. Ziemke, R. Lowe, On the role of emotion in embodied cognitive architectures: from organisms to robots, Cogn. Comput. 1 (1) (2009) 104-117.
    • (2009) Cogn. Comput. , vol.1 , Issue.1 , pp. 104-117
    • Ziemke, T.1    Lowe, R.2
  • 3
    • 79959846823
    • Real-life emotion-related states detection in call centers: A cross-corpora study
    • Makuhari, Japan
    • L. Devillers, C. Vaudable, C. Chastagnol, Real-life emotion-related states detection in call centers: A cross-corpora study, Proc. of Interspeech, Makuhari, Japan, 2010, pp. 2350-2353.
    • (2010) Proc. of Interspeech , pp. 2350-2353
    • Devillers, L.1    Vaudable, C.2    Chastagnol, C.3
  • 7
    • 51449104640
    • Brute-forcing hierarchical functionals for paralinguistics: A waste of feature space?
    • Las Vegas, NV
    • B. Schuller, M. Wimmer, L. Mösenlechner, D. Arsic, G. Rigoll, Brute-forcing Hierarchical Functionals for Paralinguistics: A Waste of Feature Space? Proc. of ICASSP, Las Vegas, NV, 2008, pp. 4501-4504.
    • (2008) Proc. of ICASSP , pp. 4501-4504
    • Schuller, B.1    Wimmer, M.2    Mösenlechner, L.3    Arsic, D.4    Rigoll, G.5
  • 8
    • 77949400109
    • The hinterland of emotions: Facing the open-microphone challenge
    • Amsterdam, The Netherlands
    • S. Steidl, B. Schuller, A. Batliner, D. Seppi, The Hinterland of Emotions: Facing the Open-microphone Challenge, Proc. of ACII, Amsterdam, The Netherlands, 2009, pp. 690-697.
    • (2009) Proc. of ACII , pp. 690-697
    • Steidl, S.1    Schuller, B.2    Batliner, A.3    Seppi, D.4
  • 10
    • 34547518166
    • Support vector regression for automatic recognition of spontaneous emotions in speech
    • Honolulu, Hawaii
    • M. Grimm, K. Kroschel, S. Narayanan, Support Vector Regression for Automatic Recognition of Spontaneous Emotions in Speech, Proc. of ICASSP, Honolulu, Hawaii, 2007, pp. 1085-1088.
    • (2007) Proc. of ICASSP , pp. 1085-1088
    • Grimm, M.1    Kroschel, K.2    Narayanan, S.3
  • 12
    • 77949395673
    • Acoustic emotion recognition: A benchmark comparison of performances
    • Merano, Italy
    • B. Schuller, B. Vlasenko, F. Eyben, G. Rigoll, A. Wendemuth, Acoustic Emotion Recognition: A Benchmark Comparison of Performances, Proc. of ASRU, Merano, Italy, 2009, pp. 552-557.
    • (2009) Proc. of ASRU , pp. 552-557
    • Schuller, B.1    Vlasenko, B.2    Eyben, F.3    Rigoll, G.4    Wendemuth, A.5
  • 13
    • 79958734716
    • Context-sensitive multimodal emotion recognition from speech and facial expression using bidirectional LSTM modeling
    • Makuhari, Japan
    • M. Wöllmer, A. Metallinou, F. Eyben, B. Schuller, S. Narayanan, Context-sensitive Multimodal Emotion Recognition from Speech and Facial Expression Using Bidirectional LSTM Modeling, Proc. of Interspeech, Makuhari, Japan, 2010, pp. 2362-2365.
    • (2010) Proc. of Interspeech , pp. 2362-2365
    • Wöllmer, M.1    Metallinou, A.2    Eyben, F.3    Schuller, B.4    Narayanan, S.5
  • 14
    • 79958719285
    • The SEMAINE corpus of emotionally coloured character interactions
    • G. McKeown, M.F. Valstar, M. Pantic, R. Cowie, The SEMAINE Corpus of Emotionally Coloured Character Interactions, Proc. of ICME, 2010, pp. 1-6.
    • (2010) Proc. of ICME , pp. 1-6
    • McKeown, G.1    Valstar, M.F.2    Pantic, M.3    Cowie, R.4
  • 15
    • 0141478857
    • Hidden Markov model-based speech emotion recognition
    • Hong Kong, China
    • B. Schuller, G. Rigoll, M. Lang, Hidden Markov Model-based Speech Emotion Recognition, Proc. of ICASSP, Hong Kong, China, 2003, pp. 1-4.
    • (2003) Proc. of ICASSP , pp. 1-4
    • Schuller, B.1    Rigoll, G.2    Lang, M.3
  • 16
    • 77951250940
    • Context is routinely encoded during emotion perception
    • L.F. Barrett, E.A. Kensinger, Context is routinely encoded during emotion perception, Psychol. Sci. 21 (2010) 595-599.
    • (2010) Psychol. Sci. , vol.21 , pp. 595-599
    • Barrett, L.F.1    Kensinger, E.A.2
  • 17
    • 77956721304
    • Combining long short-term memory and dynamic Bayesian networks for incremental emotion-sensitive artificial listening
    • M. Wöllmer, B. Schuller, F. Eyben, G. Rigoll, Combining Long Short-Term Memory and Dynamic Bayesian Networks for incremental emotion-sensitive artificial listening, IEEE J. Sel. Top. Sign. Proces. 4 (5) (2010) 867-881.
    • (2010) IEEE J. Sel. Top. Sign. Proces. , vol.4 , Issue.5 , pp. 867-881
    • Wöllmer, M.1    Schuller, B.2    Eyben, F.3    Rigoll, G.4
  • 18
    • 0031573117
    • Long Short-Term Memory
    • S. Hochreiter, J. Schmidhuber, Long Short-Term Memory, Neural Comput. 9 (8) (1997) 1735-1780. (Pubitemid 127462305)
    • (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
    • Hochreiter, S.1    Schmidhuber, J.2
  • 19
  • 20
    • 27744588611
    • Framewise phoneme classification with bidirectional LSTM and other neural network architectures
    • DOI 10.1016/j.neunet.2005.06.042, PII S0893608005001206
    • A. Graves, J. Schmidhuber, Framewise phoneme classification with bidirectional LSTM and other neural network architectures, Neural Networks 18 (5-6) (2005) 602-610. (Pubitemid 43186580)
    • (2005) Neural Networks , vol.18 , Issue.5-6 , pp. 602-610
    • Graves, A.1    Schmidhuber, J.2
  • 21
    • 38149014113
    • An application of recurrent neural networks to discriminative keyword spotting
    • Porto, Portugal
    • S. Fernandez, A. Graves, J. Schmidhuber, An Application of Recurrent Neural Networks to Discriminative Keyword Spotting, Proc. of ICANN, Porto, Portugal, 2007, pp. 220-229.
    • (2007) Proc. of ICANN , pp. 220-229
    • Fernandez, S.1    Graves, A.2    Schmidhuber, J.3
  • 22
    • 70349203870
    • Robust discriminative keyword spotting for emotionally colored spontaneous speech using bidirectional LSTM networks
    • Taipei, Taiwan
    • M. Wöllmer, F. Eyben, J. Keshet, A. Graves, B. Schuller, G. Rigoll, Robust Discriminative Keyword Spotting for Emotionally Colored Spontaneous Speech Using Bidirectional LSTM Networks, Proc. of ICASSP, Taipei, Taiwan, 2009, pp. 3949-3952.
    • (2009) Proc. of ICASSP , pp. 3949-3952
    • Wöllmer, M.1    Eyben, F.2    Keshet, J.3    Graves, A.4    Schuller, B.5    Rigoll, G.6
  • 23
    • 78651563436
    • Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework
    • M. Wöllmer, F. Eyben, A. Graves, B. Schuller, G. Rigoll, Bidirectional LSTM networks for context-sensitive keyword detection in a cognitive virtual agent framework, Cogn. Comput. 2 (3) (2010) 180-190.
    • (2010) Cogn. Comput. , vol.2 , Issue.3 , pp. 180-190
    • Wöllmer, M.1    Eyben, F.2    Graves, A.3    Schuller, B.4    Rigoll, G.5
  • 24
    • 80051637579
    • A multi-stream ASR framework for BLSTM modeling of conversational speech
    • Prague, Czech Republic
    • M. Wöllmer, F. Eyben, B. Schuller, G. Rigoll, A Multi-stream ASR Framework for BLSTM Modeling of Conversational Speech, Proc. of ICASSP, Prague, Czech Republic, 2011, pp. 4860-4863.
    • (2011) Proc. of ICASSP , pp. 4860-4863
    • Wöllmer, M.1    Eyben, F.2    Schuller, B.3    Rigoll, G.4
  • 25
    • 84858961864
    • A novel bottleneck-BLSTM front-end for feature-level context modeling in conversational speech recognition
    • Waikoloa, Big Island, Hawaii
    • M. Wöllmer, B. Schuller, G. Rigoll, A Novel Bottleneck-BLSTM Front-end for Feature-level Context Modeling in Conversational Speech Recognition, Proc. of ASRU, Waikoloa, Big Island, Hawaii, 2011, pp. 36-41.
    • (2011) Proc. of ASRU , pp. 36-41
    • Wöllmer, M.1    Schuller, B.2    Rigoll, G.3
  • 26
    • 84862156369
    • Abandoning emotion classes - towards continuous emotion recognition with modelling of long-range dependencies
    • Brisbane, Australia
    • M. Wöllmer, F. Eyben, S. Reiter, B. Schuller, C. Cox, E. Douglas-Cowie, R. Cowie, Abandoning Emotion Classes - Towards Continuous Emotion Recognition with Modelling of Long-range Dependencies, Proc. of Interspeech, Brisbane, Australia, 2008, pp. 597-600.
    • (2008) Proc. of Interspeech , pp. 597-600
    • Wöllmer, M.1    Eyben, F.2    Reiter, S.3    Schuller, B.4    Cox, C.5    Douglas-Cowie, E.6    Cowie, R.7
  • 27
    • 80054842318
    • Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space
    • M.A. Nicolaou, H. Gunes, M. Pantic, Continuous prediction of spontaneous affect from multiple cues and modalities in valence-arousal space, IEEE Trans. Affect. Comput. 2 (2011) 92-105.
    • (2011) IEEE Trans. Affect. Comput. , vol.2 , pp. 92-105
    • Nicolaou, M.A.1    Gunes, H.2    Pantic, M.3
  • 28
    • 78650977476
    • openSMILE: the Munich versatile and fast open-source audio feature extractor
    • Firenze, Italy
    • F. Eyben, M. Wöllmer, B. Schuller, openSMILE - The Munich Versatile and Fast Open-source Audio Feature Extractor, Proc. of ACM Multimedia, Firenze, Italy, 2010, pp. 1459-1462.
    • (2010) Proc. of ACM Multimedia , pp. 1459-1462
    • Eyben, F.1    Wöllmer, M.2    Schuller, B.3
  • 38
    • 77950555854
    • Recent development of open-source speech recognition engine Julius
    • Sapporo, Japan
    • A. Lee, T. Kawahara, Recent Development of Open-source Speech Recognition Engine Julius, Proc. of APSIPA ASC, Sapporo, Japan, 2009, pp. 131-137.
    • (2009) Proc. of APSIPA ASC , pp. 131-137
    • Lee, A.1    Kawahara, T.2
  • 39
    • 79959404069
    • The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments
    • A. Stupakov, E. Hanusa, D. Vijaywargi, D. Fox, J. Bilmes, The design and collection of COSINE, a multi-microphone in situ speech corpus recorded in noisy environments, Comput. Speech Lang. 26 (1) (2011) 52-66.
    • (2011) Comput. Speech Lang. , vol.26 , Issue.1 , pp. 52-66
    • Stupakov, A.1    Hanusa, E.2    Vijaywargi, D.3    Fox, D.4    Bilmes, J.5
  • 41
    • 63449136395
    • Facial expression recognition based on local binary patterns: A comprehensive study
    • C. Shan, S. Gong, P.W. McOwan, Facial expression recognition based on local binary patterns: A comprehensive study, Image Vision Comput. 27 (6) (2009) 803-816.
    • (2009) Image Vision Comput , vol.27 , Issue.6 , pp. 803-816
    • Shan, C.1    Gong, S.2    McOwan, P.W.3
  • 42
    • 33847747430
    • Facial expression recognition in image sequences using geometric deformation features and support vector machines
    • DOI 10.1109/TIP.2006.884954
    • I. Kotsia, I. Pitas, Facial expression recognition in image sequences using geometric deformation features and support vector machines, IEEE Trans. Image Process. 16 (1) (2007) 172-187. (Pubitemid 46437480)
    • (2007) IEEE Transactions on Image Processing , vol.16 , Issue.1 , pp. 172-187
    • Kotsia, I.1    Pitas, I.2
  • 43
    • 56049094605
    • Boosting encoded dynamic features for facial expression recognition
    • P. Yang, Q. Liu, D.N. Metaxas, Boosting encoded dynamic features for facial expression recognition, Pattern Recognit. Lett. 30 (2) (2009) 132-139.
    • (2009) Pattern Recognit. Lett. , vol.30 , Issue.2 , pp. 132-139
    • Yang, P.1    Liu, Q.2    Metaxas, D.N.3
  • 44
    • 42249104358
    • An analysis of facial expression recognition under partial facial image occlusion
    • I. Kotsia, I. Buciu, I. Pitas, An analysis of facial expression recognition under partial facial image occlusion, Image Vision Comput 26 (7) (2008) 1052-1067.
    • (2008) Image Vision Comput , vol.26 , Issue.7 , pp. 1052-1067
    • Kotsia, I.1    Buciu, I.2    Pitas, I.3
  • 45
    • 63049094206
    • Pose-invariant facial expression recognition using variable-intensity templates
    • S. Kumano, K. Otsuka, J. Yamato, E. Maeda, Y. Sato, Pose-invariant facial expression recognition using variable-intensity templates, Int. J. Comput. Vis. 83 (2) (2009) 178-194.
    • (2009) Int. J. Comput. Vis. , vol.83 , Issue.2 , pp. 178-194
    • Kumano, S.1    Otsuka, K.2    Yamato, J.3    Maeda, E.4    Sato, Y.5
  • 47
    • 79958694881
    • String-based audiovisual fusion of behavioural events for the assessment of dimensional affect
    • F. Eyben, M. Wöllmer, M. Valstar, H. Gunes, B. Schuller, M. Pantic, String-based audiovisual fusion of behavioural events for the assessment of dimensional affect, Proc. of FG, 2011, pp. 322-329.
    • (2011) Proc. of FG , pp. 322-329
    • Eyben, F.1    Wöllmer, M.2    Valstar, M.3    Gunes, H.4    Schuller, B.5    Pantic, M.6
  • 48
    • 70449526103
    • A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams
    • M. Wöllmer, M. Al-Hames, F. Eyben, B. Schuller, G. Rigoll, A multidimensional dynamic time warping algorithm for efficient multimodal fusion of asynchronous data streams, Neurocomputing 73 (1-3) (2009) 366-380.
    • (2009) Neurocomputing , vol.73 , Issue.1-3 , pp. 366-380
    • Wöllmer, M.1    Al-Hames, M.2    Eyben, F.3    Schuller, B.4    Rigoll, G.5
  • 49
    • 62949227381
    • Audio-visual emotion recognition using Gaussian mixture models for face and voice
    • Los Alamitos, CA, USA
    • A. Metallinou, S. Lee, S. Narayanan, Audio-Visual Emotion Recognition Using Gaussian Mixture Models for Face and Voice, International Symposium on Multimedia, Los Alamitos, CA, USA, 2008, pp. 250-257.
    • (2008) International Symposium on Multimedia , pp. 250-257
    • Metallinou, A.1    Lee, S.2    Narayanan, S.3
  • 50
    • 44049099067
    • Audio-visual affective expression recognition through multistream fused HMM
    • DOI 10.1109/TMM.2008.921737, 4523967
    • Z. Zeng, J. Tu, B. Pianfetti, T.S. Huang, Audio-visual affective expression recognition through multistream fused HMM, IEEE Trans. Multimedia 10 (4) (2008) 570-577. (Pubitemid 351711233)
    • (2008) IEEE Transactions on Multimedia , vol.10 , Issue.4 , pp. 570-577
    • Zeng, Z.1    Tu, J.2    Pianfetti Jr., B.M.3    Huang, T.S.4
  • 54
    • 2142812371
    • Robust real-time face detection
    • P.A. Viola, M.J. Jones, Robust real-time face detection, Int. J. Comput. Vis. 57 (2) (2004) 137-154.
    • (2004) Int. J. Comput. Vis. , vol.57 , Issue.2 , pp. 137-154
    • Viola, P.A.1    Jones, M.J.2
  • 55
    • 0036647193
    • Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
    • DOI 10.1109/TPAMI.2002.1017623
    • T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (7) (2002) 971-987. (Pubitemid 34835471)
    • (2002) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.24 , Issue.7 , pp. 971-987
    • Ojala, T.1    Pietikäinen, M.2    Mäenpää, T.3
  • 57
    • 0004342235
    • Computer vision face tracking for use in a perceptual user interface
    • G.R. Bradski, Computer vision face tracking for use in a perceptual user interface, Tech. Rep. Q2, Intel Technol. J. (1998) 1-15.
    • (1998) Tech. Rep. Q2, Intel Technol. J. , pp. 1-15
    • Bradski, G.R.1
  • 60
    • 0031268931
    • Bidirectional recurrent neural networks
    • PII S1053587X97080550
    • M. Schuster, K.K. Paliwal, Bidirectional recurrent neural networks, IEEE Trans. Signal Process. 45 (1997) 2673-2681. (Pubitemid 127766336)
    • (1997) IEEE Transactions on Signal Processing , vol.45 , Issue.11 , pp. 2673-2681
    • Schuster, M.1    Paliwal, K.K.2
  • 61
    • 0034293152
    • Learning to forget: Continual prediction with LSTM
    • F. Gers, J. Schmidhuber, F. Cummins, Learning to forget: continual prediction with LSTM, Neural Comput. 12 (10) (2000) 2451-2471.
    • (2000) Neural Comput , vol.12 , Issue.10 , pp. 2451-2471
    • Gers, F.1    Schmidhuber, J.2    Cummins, F.3
  • 65
    • 84867614588
    • Analyzing the memory of BLSTM neural networks for enhanced emotion classification in dyadic spoken interactions
    • Kyoto, Japan
    • M. Wöllmer, A. Metallinou, N. Katsamanis, B. Schuller, S. Narayanan, Analyzing the Memory of BLSTM Neural Networks for Enhanced Emotion Classification in Dyadic Spoken Interactions, Proc. of ICASSP, Kyoto, Japan, 2012.
    • (2012) Proc. of ICASSP
    • Wöllmer, M.1    Metallinou, A.2    Katsamanis, N.3    Schuller, B.4    Narayanan, S.5
  • 68
    • 34547548235
    • Probabilistic and bottle-neck features for LVCSR of meetings
    • Honolulu, Hawaii
    • F. Grezl, M. Karafiat, K. Stanislav, J. Cernocky, Probabilistic and Bottle-neck Features for LVCSR of Meetings, Proc. of ICASSP, Honolulu, Hawaii, 2007, pp. 757-760.
    • (2007) Proc. of ICASSP , pp. 757-760
    • Grezl, F.1    Karafiat, M.2    Stanislav, K.3    Cernocky, J.4
  • 69
    • 84898971246
    • An asynchronous hidden Markov model for audio-visual speech recognition
    • S. Bengio, An asynchronous Hidden Markov Model for audio-visual speech recognition, Adv. NIPS 15 (2003) 1-8.
    • (2003) Adv. NIPS , vol.15 , pp. 1-8
    • Bengio, S.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.