SCOPUS Information Retrieval Platform

Neural Networks

Volume 92, 2017, Pages 60-68

Evaluating deep learning architectures for Speech Emotion Recognition

(3) Fayek, Haytham M. (a); Lech, Margaret (a); Cavedon, Lawrence (a)

(a) RMIT University (Australia)

Author keywords

Affective computing; Deep learning; Emotion recognition; Neural networks; Speech recognition

Indexed keywords

DEEP LEARNING; DEEP NEURAL NETWORKS; NETWORK ARCHITECTURE; NEURAL NETWORKS; RECURRENT NEURAL NETWORKS; SPEECH PROCESSING;

AFFECTIVE COMPUTING; DYNAMIC CLASSIFICATION; EMOTION RECOGNITION; LEARNING ARCHITECTURES; QUANTITATIVE AND QUALITATIVE ASSESSMENTS; SPEAKER INDEPENDENTS; SPEECH EMOTION RECOGNITION; STATE OF THE ART;

SPEECH RECOGNITION;

ARTICLE; ARTIFICIAL NEURAL NETWORK; CONTROLLED STUDY; DEEP NEURAL NETWORK; DISCRIMINANT ANALYSIS; MACHINE LEARNING; MEASUREMENT ACCURACY; PRINCIPAL COMPONENT ANALYSIS; PRIORITY JOURNAL; QUALITATIVE ANALYSIS; QUANTITATIVE ANALYSIS; SPEECH EMOTION RECOGNITION; SUPPORT VECTOR MACHINE; AUTOMATIC SPEECH RECOGNITION; EMOTION;

EMOTIONS; MACHINE LEARNING; NEURAL NETWORKS (COMPUTER); SPEECH RECOGNITION SOFTWARE;

EID: 85017190163 PISSN: 08936080 EISSN: 18792782 Source Type: Journal
DOI: 10.1016/j.neunet.2017.02.013 Document Type: Article

Times cited: 506

References (49)

1
- 84911473441
- Convolutional neural networks for speech recognition
- Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., Deng, L., Penn, G., Yu, D., Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22 (2014), 1533–1545.
- (2014) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.22 , pp. 1533-1545
- Abdel-Hamid, O.¹ Mohamed, A.-R.² Jiang, H.³ Deng, L.⁴ Penn, G.⁵ Yu, D.⁶

2
- 84906250865
- Energy and F0 contour modeling with functional data analysis for emotional speech detection
- Arias, J.P., Busso, C., Yoma, N.B., Energy and F0 contour modeling with functional data analysis for emotional speech detection. Interspeech, 2013, 2871–2875.
- (2013) Interspeech , pp. 2871-2875
- Arias, J.P.¹ Busso, C.² Yoma, N.B.³

3
- 78649328053
- Survey on speech emotion recognition: Features, classification schemes, and databases
- Ayadi, M.E., Kamel, M.S., Karray, F., Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition 44 (2011), 572–587.
- (2011) Pattern Recognition , vol.44 , pp. 572-587
- Ayadi, M.E.¹ Kamel, M.S.² Karray, F.³

4
- 59849093076
- IEMOCAP: interactive emotional dyadic motion capture database
- Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., et al. IEMOCAP: interactive emotional dyadic motion capture database. Language Resources and Evaluation 42 (2008), 335–359, 10.1007/s10579-008-9076-6.
- (2008) Language Resources and Evaluation , vol.42 , pp. 335-359
- Busso, C.¹ Bulut, M.² Lee, C.-C.³ Kazemzadeh, A.⁴ Mower, E.⁵ Kim, S.⁶

5
- 85032751766
- Emotion recognition in human–computer interaction
- Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. Emotion recognition in human–computer interaction. IEEE Signal Processing Magazine 18 (2001), 32–80, 10.1109/79.911197.
- (2001) IEEE Signal Processing Magazine , vol.18 , pp. 32-80
- Cowie, R.¹ Douglas-Cowie, E.² Tsapatsoulis, N.³ Votsis, G.⁴ Kollias, S.⁵ Fellenz, W.⁶

6
- 84965117097
- Equilibrated adaptive learning rates for non-convex optimization
- Dauphin, Y., de Vries, H., Bengio, Y., Equilibrated adaptive learning rates for non-convex optimization. Advances in neural information processing systems, 2015, 1504–1512.
- (2015) Advances in neural information processing systems , pp. 1504-1512
- Dauphin, Y.¹ de Vries, H.² Bengio, Y.³

7
- 84964066710
- Real-time robust recognition of speakers’ emotions and characteristics on mobile platforms
- Eyben, F., Huber, B., Marchi, E., Schuller, D., Schuller, B., Real-time robust recognition of speakers’ emotions and characteristics on mobile platforms. 2015 international conference on affective computing and intelligent interaction (ACII), 2015, 778–780, 10.1109/ACII.2015.7344658.
- (2015) 2015 international conference on affective computing and intelligent interaction (ACII) , pp. 778-780
- Eyben, F.¹ Huber, B.² Marchi, E.³ Schuller, D.⁴ Schuller, B.⁵

8
- 84973513831
- The Geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing
- Eyben, F., Scherer, K., Schuller, B., Sundberg, J., Andre, E., Busso, C., et al. The Geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing. IEEE Transactions on Affective Computing, 2015, 190–202, 10.1109/TAFFC.2015.2457417.
- (2015) IEEE Transactions on Affective Computing , pp. 190-202
- Eyben, F.¹ Scherer, K.² Schuller, B.³ Sundberg, J.⁴ Andre, E.⁵ Busso, C.⁶

9
- 84963682340
- Towards real-time speech emotion recognition using deep neural networks
- Fayek, H.M., Lech, M., Cavedon, L., Towards real-time speech emotion recognition using deep neural networks. 2015 9th international conference on signal processing and communication systems (ICSPCS), 2015, 1–5, 10.1109/ICSPCS.2015.7391796.
- (2015) 2015 9th international conference on signal processing and communication systems (ICSPCS) , pp. 1-5
- Fayek, H.M.¹ Lech, M.² Cavedon, L.³

10
- 85007168680
- Modeling subjectiveness in emotion recognition with deep neural networks: Ensembles vs soft labels
- Fayek, H.M., Lech, M., Cavedon, L., Modeling subjectiveness in emotion recognition with deep neural networks: Ensembles vs soft labels. 2016 international joint conference on neural networks (IJCNN), 2016, 566–570, 10.1109/IJCNN.2016.7727250.
- (2016) 2016 international joint conference on neural networks (IJCNN) , pp. 566-570
- Fayek, H.M.¹ Lech, M.² Cavedon, L.³

11
- 84994259687
- On the correlation and transferability of features between automatic speech recognition and speech emotion recognition
- Fayek, H.M., Lech, M., Cavedon, L., On the correlation and transferability of features between automatic speech recognition and speech emotion recognition. Interspeech 2016, 2016, 3618–3622, 10.21437/Interspeech.2016-868.
- (2016) Interspeech 2016 , pp. 3618-3622
- Fayek, H.M.¹ Lech, M.² Cavedon, L.³

12
- 8644229278
- A computational model for the automatic recognition of affect in speech
- (Ph.D. thesis) School of Architecture and Planning, Massachusetts Institute of Technology
- Fernandez, R., A computational model for the automatic recognition of affect in speech. (Ph.D. thesis), 2004, School of Architecture and Planning, Massachusetts Institute of Technology.
- (2004)
- Fernandez, R.¹

13
- 84862294866
- Deep sparse rectifier neural networks
- Glorot, X., Bordes, A., Bengio, Y., Deep sparse rectifier neural networks. JMLR W&CP: Proceedings of the fourteenth international conference on artificial intelligence and statistics, AISTATS 2011, 2011.
- (2011) JMLR W&CP: Proceedings of the fourteenth international conference on artificial intelligence and statistics, AISTATS 2011
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

14
- 84944735469
- Deep learning
- MIT Press
- Goodfellow, I., Bengio, Y., Courville, A., Deep learning. 2016, MIT Press.
- (2016)
- Goodfellow, I.¹ Bengio, Y.² Courville, A.³

15
- 70349284484
- Supervised sequence labelling with recurrent neural networks
- (Ph.D. thesis) Technische Universität München
- Graves, A., Supervised sequence labelling with recurrent neural networks. (Ph.D. thesis), 2008, Technische Universität München.
- (2008)
- Graves, A.¹

16
- 84890543083
- Speech recognition with deep recurrent neural networks
- Graves, A., Mohamed, A.-R., Hinton, G., Speech recognition with deep recurrent neural networks. 2013 IEEE international conference on acoustics, speech and signal processing, 2013, 6645–6649, 10.1109/ICASSP.2013.6638947.
- (2013) 2013 IEEE international conference on acoustics, speech and signal processing , pp. 6645-6649
- Graves, A.¹ Mohamed, A.-R.² Hinton, G.³

17
- 84910060363
- Speech emotion recognition using deep neural network and extreme learning machine
- Han, K., Yu, D., Tashev, I., Speech emotion recognition using deep neural network and extreme learning machine. Interspeech 2014, 2014.
- (2014) Interspeech 2014
- Han, K.¹ Yu, D.² Tashev, I.³

18
- 84986274465
- Deep residual learning for image recognition
- He, K., Zhang, X., Ren, S., Sun, J., Deep residual learning for image recognition. 2016 IEEE conference on computer vision and pattern recognition (CVPR), 2016, 770–778, 10.1109/CVPR.2016.90.
- (2016) 2016 IEEE conference on computer vision and pattern recognition (CVPR) , pp. 770-778
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

19
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.-R., Jaitly, N., et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29 (2012), 82–97, 10.1109/MSP.2012.2205597.
- (2012) IEEE Signal Processing Magazine , vol.29 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶

20
- 0031573117
- Long short-term memory
- Hochreiter, S., Schmidhuber, J., Long short-term memory. Neural Computation 9 (1997), 1735–1780, 10.1162/neco.1997.9.8.1735.
- (1997) Neural Computation , vol.9 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

21
- 84969584486
- Batch normalization: Accelerating deep network training by reducing internal covariate shift
- Ioffe, S., Szegedy, C., Batch normalization: Accelerating deep network training by reducing internal covariate shift. Proceedings of the 32nd international conference on machine learning, 2015, 448–456.
- (2015) Proceedings of the 32nd international conference on machine learning , pp. 448-456
- Ioffe, S.¹ Szegedy, C.²

22
- 84890526379
- Deep learning for robust feature generation in audiovisual emotion recognition
- Kim, Y., Lee, H., Provost, E.M., Deep learning for robust feature generation in audiovisual emotion recognition. 2013 IEEE international conference on acoustics, speech and signal processing, 2013, 3687–3691, 10.1109/ICASSP.2013.6638346.
- (2013) 2013 IEEE international conference on acoustics, speech and signal processing , pp. 3687-3691
- Kim, Y.¹ Lee, H.² Provost, E.M.³

23
- 84876231242
- Imagenet classification with deep convolutional neural networks
- Krizhevsky, A., Sutskever, I., Hinton, G.E., Imagenet classification with deep convolutional neural networks. Neural information processing systems, 2012, 1097–1105.
- (2012) Neural information processing systems , pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

24
- 84930630277
- Deep learning
- Insight
- LeCun, Y., Bengio, Y., Hinton, G., Deep learning. Nature 521 (2015), 436–444 Insight.
- (2015) Nature , vol.521 , pp. 436-444
- LeCun, Y.¹ Bengio, Y.² Hinton, G.³

25
- 0008458860
- Handwritten digit recognition with a back-propagation network
- LeCun, Y., Boser, B., Denker, J.S., Howard, R.E., Hubbard, W., Jackel, L.D., et al. Handwritten digit recognition with a back-propagation network. Advances in neural information processing systems 2, 1990, 396–404.
- (1990) Advances in neural information processing systems 2 , pp. 396-404
- LeCun, Y.¹ Boser, B.² Denker, J.S.³ Howard, R.E.⁴ Hubbard, W.⁵ Jackel, L.D.⁶

26
- 79960847182
- Emotion recognition using a hierarchical binary decision tree approach
- Lee, C.-C., Mower, E., Busso, C., Lee, S., Narayanan, S., Emotion recognition using a hierarchical binary decision tree approach. Speech Communication 53 (2011), 1162–1171.
- (2011) Speech Communication , vol.53 , pp. 1162-1171
- Lee, C.-C.¹ Mower, E.² Busso, C.³ Lee, S.⁴ Narayanan, S.⁵

27
- 84893307972
- Hybrid deep neural network–hidden markov model (dnn-hmm) based speech emotion recognition
- Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., Gonzalez, I., et al. Hybrid deep neural network–hidden markov model (dnn-hmm) based speech emotion recognition. 2013 humaine association conference on affective computing and intelligent interaction (ACII), 2013, 312–317.
- (2013) 2013 humaine association conference on affective computing and intelligent interaction (ACII) , pp. 312-317
- Li, L.¹ Zhao, Y.² Jiang, D.³ Zhang, Y.⁴ Wang, F.⁵ Gonzalez, I.⁶

28
- 84913548678
- Learning salient features for speech emotion recognition using convolutional neural networks
- Mao, Q., Dong, M., Huang, Z., Zhan, Y., Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Transactions on Multimedia 16 (2014), 2203–2213.
- (2014) IEEE Transactions on Multimedia , vol.16 , pp. 2203-2213
- Mao, Q.¹ Dong, M.² Huang, Z.³ Zhan, Y.⁴

29
- 84880519188
- Exploring cross-modality affective reactions for audiovisual emotion recognition
- Mariooryad, S., Busso, C., Exploring cross-modality affective reactions for audiovisual emotion recognition. IEEE Transactions on Affective Computing 4 (2013), 183–196, 10.1109/T-AFFC.2013.11.
- (2013) IEEE Transactions on Affective Computing , vol.4 , pp. 183-196
- Mariooryad, S.¹ Busso, C.²

30
- 84055211743
- Acoustic modeling using deep belief networks
- Mohamed, A., Dahl, G., Hinton, G., Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20 (2012), 14–22, 10.1109/TASL.2011.2109382.
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , pp. 14-22
- Mohamed, A.¹ Dahl, G.² Hinton, G.³

31
- 80052409390
- Emotion-oriented systems
- Petta, P., Pelachaud, C., Cowie, R., Emotion-oriented systems. The humaine handbook, 2011.
- (2011) The humaine handbook
- Petta, P.¹ Pelachaud, C.² Cowie, R.³

32
- 84858953642
- The kaldi speech recognition toolkit
- IEEE Signal Processing Society IEEE Catalog No.: CFP11SRW-USB
- Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., et al. The kaldi speech recognition toolkit. IEEE 2011 workshop on automatic speech recognition and understanding, 2011, IEEE Signal Processing Society IEEE Catalog No.: CFP11SRW-USB.
- (2011) IEEE 2011 workshop on automatic speech recognition and understanding
- Povey, D.¹ Ghoshal, A.² Boulianne, G.³ Burget, L.⁴ Glembek, O.⁵ Goel, N.⁶

33
- 79954553074
- Cross-validation
- Springer
- Refaeilzadeh, P., Tang, L., Liu, H., Cross-validation. Encyclopedia of database systems, 2009, Springer, 532–538.
- (2009) Encyclopedia of database systems , pp. 532-538
- Refaeilzadeh, P.¹ Tang, L.² Liu, H.³

34
- 0022471098
- Learning representations by back-propagating errors
- Rumelhart, D.E., Hinton, G.E., Williams, R.J., Learning representations by back-propagating errors. Nature 323 (1986), 533–536.
- (1986) Nature , vol.323 , pp. 533-536
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

35
- 84910651844
- Deep learning in neural networks: An overview
- Schmidhuber, J., Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85–117, 10.1016/j.neunet.2014.09.003.
- (2015) Neural Networks , vol.61 , pp. 85-117
- Schmidhuber, J.¹

36
- 70450206416
- The interspeech 2009 emotion challenge
- Schuller, B., Steidl, S., Batliner, A., The interspeech 2009 emotion challenge. Interspeech, vol. 2009, 2009, 312–315.
- (2009) Interspeech, vol. 2009 , pp. 312-315
- Schuller, B.¹ Steidl, S.² Batliner, A.³

37
- 79954999224
- The interspeech 2010 paralinguistic challenge
- Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C.A., et al. The interspeech 2010 paralinguistic challenge. Interspeech, 2010, 2794–2797.
- (2010) Interspeech , pp. 2794-2797
- Schuller, B.¹ Steidl, S.² Batliner, A.³ Burkhardt, F.⁴ Devillers, L.⁵ Müller, C.A.⁶

38
- 77949395673
- Acoustic emotion recognition: A benchmark comparison of performances
- Schuller, B., Vlasenko, B., Eyben, F., Rigoll, G., Wendemuth, A., Acoustic emotion recognition: A benchmark comparison of performances. IEEE workshop on automatic speech recognition understanding, 2009, 552–557.
- (2009) IEEE workshop on automatic speech recognition understanding , pp. 552-557
- Schuller, B.¹ Vlasenko, B.² Eyben, F.³ Rigoll, G.⁴ Wendemuth, A.⁵

39
- 84907409936
- A multi-modal approach to emotion recognition using undirected topic models
- Shah, M., Chakrabarti, C., Spanias, A., A multi-modal approach to emotion recognition using undirected topic models. IEEE international symposium on circuits and systems (ISCAS), 2014, 754–757.
- (2014) IEEE international symposium on circuits and systems (ISCAS) , pp. 754-757
- Shah, M.¹ Chakrabarti, C.² Spanias, A.³

40
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- Simonyan, K., Zisserman, A., Very deep convolutional networks for large-scale image recognition. International conference on learning representations (ICLR), 2015.
- (2015) International conference on learning representations (ICLR)
- Simonyan, K.¹ Zisserman, A.²

41
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (2014), 1929–1958.
- (2014) Journal of Machine Learning Research , vol.15 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

42
- 80051631315
- Deep neural networks for acoustic emotion recognition: Raising the benchmarks
- Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, G., Schuller, B., Deep neural networks for acoustic emotion recognition: Raising the benchmarks. 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP), 2011, 5688–5691.
- (2011) 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) , pp. 5688-5691
- Stuhlsatz, A.¹ Meyer, C.² Eyben, F.³ Zielke, T.⁴ Meier, G.⁵ Schuller, B.⁶

43
- 84947968496
- Towards a small set of robust acoustic features for emotion recognition: Challenges
- Tahon, M., Devillers, L., Towards a small set of robust acoustic features for emotion recognition: Challenges. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (2016), 16–28.
- (2016) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.24 , pp. 16-28
- Tahon, M.¹ Devillers, L.²

44
- 85020690526
- Recognising emotions in dialogues with disfluencies and non-verbal vocalisations
- Tian, L., Lai, C., Moore, J., Recognising emotions in dialogues with disfluencies and non-verbal vocalisations. In R. Lickley (Ed.), The 7th workshop on disfluency in spontaneous speech, 2015.
- (2015) The 7th workshop on disfluency in spontaneous speech
- Tian, L.¹ Lai, C.² Moore, J.³

45
- 84964039845
- Emotion recognition in spontaneous and acted dialogues
- Tian, L., Moore, J., Lai, C., Emotion recognition in spontaneous and acted dialogues. 2015 International conference on affective computing and intelligent interaction (ACII), 2015, 698–704.
- (2015) 2015 International conference on affective computing and intelligent interaction (ACII) , pp. 698-704
- Tian, L.¹ Moore, J.² Lai, C.³

46
- 84893343292
- Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude
- Tieleman, T., Hinton, G., Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4, 2012.
- (2012) COURSERA: Neural Networks for Machine Learning , vol.4
- Tieleman, T.¹ Hinton, G.²

47
- 33746410556
- Emotional speech recognition: Resources, features, and methods
- Ververidis, D., Kotropoulos, C., Emotional speech recognition: Resources, features, and methods. Speech Communication 48 (2006), 1162–1181.
- (2006) Speech Communication , vol.48 , pp. 1162-1181
- Ververidis, D.¹ Kotropoulos, C.²

48
- 38049048651
- Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing
- Springer
- Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G., Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing. Affective computing and intelligent interaction, 2007, Springer, 139–147.
- (2007) Affective computing and intelligent interaction , pp. 139-147
- Vlasenko, B.¹ Schuller, B.² Wendemuth, A.³ Rigoll, G.⁴

49
- 0000903748
- Generalization of backpropagation with application to a recurrent gas market model
- Werbos, P.J., Generalization of backpropagation with application to a recurrent gas market model. Neural Networks 1 (1988), 339–356, 10.1016/0893-6080(88)90007-X.
- (1988) Neural Networks , vol.1 , pp. 339-356
- Werbos, P.J.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.