-
1
-
-
84911473441
-
Convolutional neural networks for speech recognition
-
Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., Deng, L., Penn, G., Yu, D., Convolutional neural networks for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22 (2014), 1533–1545.
-
(2014)
IEEE/ACM Transactions on Audio, Speech, and Language Processing
, vol.22
, pp. 1533-1545
-
-
Abdel-Hamid, O.1
Mohamed, A.-R.2
Jiang, H.3
Deng, L.4
Penn, G.5
Yu, D.6
-
2
-
-
84906250865
-
Energy and F0 contour modeling with functional data analysis for emotional speech detection
-
Arias, J.P., Busso, C., Yoma, N.B., Energy and F0 contour modeling with functional data analysis for emotional speech detection. Interspeech, 2013, 2871–2875.
-
(2013)
Interspeech
, pp. 2871-2875
-
-
Arias, J.P.1
Busso, C.2
Yoma, N.B.3
-
3
-
-
78649328053
-
Survey on speech emotion recognition: Features, classification schemes, and databases
-
Ayadi, M.E., Kamel, M.S., Karray, F., Survey on speech emotion recognition: Features, classification schemes, and databases. Pattern Recognition 44 (2011), 572–587.
-
(2011)
Pattern Recognition
, vol.44
, pp. 572-587
-
-
Ayadi, M.E.1
Kamel, M.S.2
Karray, F.3
-
4
-
-
59849093076
-
IEMOCAP: interactive emotional dyadic motion capture database
-
Busso, C., Bulut, M., Lee, C.-C., Kazemzadeh, A., Mower, E., Kim, S., et al. IEMOCAP: interactive emotional dyadic motion capture database. Language Resources and Evaluation 42 (2008), 335–359, 10.1007/s10579-008-9076-6.
-
(2008)
Language Resources and Evaluation
, vol.42
, pp. 335-359
-
-
Busso, C.1
Bulut, M.2
Lee, C.-C.3
Kazemzadeh, A.4
Mower, E.5
Kim, S.6
-
5
-
-
85032751766
-
Emotion recognition in human–computer interaction
-
Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., et al. Emotion recognition in human–computer interaction. IEEE Signal Processing Magazine 18 (2001), 32–80, 10.1109/79.911197.
-
(2001)
IEEE Signal Processing Magazine
, vol.18
, pp. 32-80
-
-
Cowie, R.1
Douglas-Cowie, E.2
Tsapatsoulis, N.3
Votsis, G.4
Kollias, S.5
Fellenz, W.6
-
6
-
-
84965117097
-
Equilibrated adaptive learning rates for non-convex optimization
-
Dauphin, Y., de Vries, H., Bengio, Y., Equilibrated adaptive learning rates for non-convex optimization. Advances in neural information processing systems, 2015, 1504–1512.
-
(2015)
Advances in neural information processing systems
, pp. 1504-1512
-
-
Dauphin, Y.1
de Vries, H.2
Bengio, Y.3
-
7
-
-
84964066710
-
Real-time robust recognition of speakers’ emotions and characteristics on mobile platforms
-
Eyben, F., Huber, B., Marchi, E., Schuller, D., Schuller, B., Real-time robust recognition of speakers’ emotions and characteristics on mobile platforms. 2015 international conference on affective computing and intelligent interaction (ACII), 2015, 778–780, 10.1109/ACII.2015.7344658.
-
(2015)
2015 international conference on affective computing and intelligent interaction (ACII)
, pp. 778-780
-
-
Eyben, F.1
Huber, B.2
Marchi, E.3
Schuller, D.4
Schuller, B.5
-
8
-
-
84973513831
-
The Geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing
-
Eyben, F., Scherer, K., Schuller, B., Sundberg, J., André, E., Busso, C., et al. The Geneva minimalistic acoustic parameter set (gemaps) for voice research and affective computing. IEEE Transactions on Affective Computing, 2015, 190–202, 10.1109/TAFFC.2015.2457417.
-
(2015)
IEEE Transactions on Affective Computing
, pp. 190-202
-
-
Eyben, F.1
Scherer, K.2
Schuller, B.3
Sundberg, J.4
André, E.5
Busso, C.6
-
9
-
-
84963682340
-
Towards real-time speech emotion recognition using deep neural networks
-
Fayek, H.M., Lech, M., Cavedon, L., Towards real-time speech emotion recognition using deep neural networks. 2015 9th international conference on signal processing and communication systems (ICSPCS), 2015, 1–5, 10.1109/ICSPCS.2015.7391796.
-
(2015)
2015 9th international conference on signal processing and communication systems (ICSPCS)
, pp. 1-5
-
-
Fayek, H.M.1
Lech, M.2
Cavedon, L.3
-
10
-
-
85007168680
-
Modeling subjectiveness in emotion recognition with deep neural networks: Ensembles vs soft labels
-
Fayek, H.M., Lech, M., Cavedon, L., Modeling subjectiveness in emotion recognition with deep neural networks: Ensembles vs soft labels. 2016 international joint conference on neural networks (IJCNN), 2016, 566–570, 10.1109/IJCNN.2016.7727250.
-
(2016)
2016 international joint conference on neural networks (IJCNN)
, pp. 566-570
-
-
Fayek, H.M.1
Lech, M.2
Cavedon, L.3
-
11
-
-
84994259687
-
On the correlation and transferability of features between automatic speech recognition and speech emotion recognition
-
Fayek, H.M., Lech, M., Cavedon, L., On the correlation and transferability of features between automatic speech recognition and speech emotion recognition. Interspeech 2016, 2016, 3618–3622, 10.21437/Interspeech.2016-868.
-
(2016)
Interspeech 2016
, pp. 3618-3622
-
-
Fayek, H.M.1
Lech, M.2
Cavedon, L.3
-
12
-
-
8644229278
-
A computational model for the automatic recognition of affect in speech
-
(Ph.D. thesis) School of Architecture and Planning, Massachusetts Institute of Technology
-
Fernandez, R., A computational model for the automatic recognition of affect in speech. (Ph.D. thesis), 2004, School of Architecture and Planning, Massachusetts Institute of Technology.
-
(2004)
-
-
Fernandez, R.1
-
13
-
-
84862294866
-
-
In JMLR W&CP: Proceedings of the fourteenth international conference on artificial intelligence and statistics, AISTATS 2011.
-
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. In JMLR W&CP: Proceedings of the fourteenth international conference on artificial intelligence and statistics, AISTATS 2011.
-
(2011)
Deep sparse rectifier neural networks.
-
-
Glorot, X.1
Bordes, A.2
Bengio, Y.3
-
14
-
-
84944735469
-
Deep learning
-
MIT Press
-
Goodfellow, I., Bengio, Y., Courville, A., Deep learning. 2016, MIT Press.
-
(2016)
-
-
Goodfellow, I.1
Bengio, Y.2
Courville, A.3
-
15
-
-
70349284484
-
Supervised sequence labelling with recurrent neural networks
-
(Ph.D. thesis) Technische Universität München
-
Graves, A., Supervised sequence labelling with recurrent neural networks. (Ph.D. thesis), 2008, Technische Universität München.
-
(2008)
-
-
Graves, A.1
-
16
-
-
84890543083
-
Speech recognition with deep recurrent neural networks
-
Graves, A., Mohamed, A.-R., Hinton, G., Speech recognition with deep recurrent neural networks. 2013 IEEE international conference on acoustics, speech and signal processing, 2013, 6645–6649, 10.1109/ICASSP.2013.6638947.
-
(2013)
2013 IEEE international conference on acoustics, speech and signal processing
, pp. 6645-6649
-
-
Graves, A.1
Mohamed, A.-R.2
Hinton, G.3
-
17
-
-
84910060363
-
Speech emotion recognition using deep neural network and extreme learning machine
-
Han, K., Yu, D., Tashev, I., Speech emotion recognition using deep neural network and extreme learning machine. Interspeech 2014, 2014.
-
(2014)
Interspeech 2014
-
-
Han, K.1
Yu, D.2
Tashev, I.3
-
18
-
-
84986274465
-
Deep residual learning for image recognition
-
He, K., Zhang, X., Ren, S., Sun, J., Deep residual learning for image recognition. 2016 IEEE conference on computer vision and pattern recognition (CVPR), 2016, 770–778, 10.1109/CVPR.2016.90.
-
(2016)
2016 IEEE conference on computer vision and pattern recognition (CVPR)
, pp. 770-778
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
19
-
-
85032751458
-
Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
-
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.-R., Jaitly, N., et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Processing Magazine 29 (2012), 82–97, 10.1109/MSP.2012.2205597.
-
(2012)
IEEE Signal Processing Magazine
, vol.29
, pp. 82-97
-
-
Hinton, G.1
Deng, L.2
Yu, D.3
Dahl, G.E.4
Mohamed, A.-R.5
Jaitly, N.6
-
20
-
-
0031573117
-
Long short-term memory
-
Hochreiter, S., Schmidhuber, J., Long short-term memory. Neural Computation 9 (1997), 1735–1780, 10.1162/neco.1997.9.8.1735.
-
(1997)
Neural Computation
, vol.9
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
22
-
-
84890526379
-
Deep learning for robust feature generation in audiovisual emotion recognition
-
Kim, Y., Lee, H., Provost, E.M., Deep learning for robust feature generation in audiovisual emotion recognition. 2013 IEEE international conference on acoustics, speech and signal processing, 2013, 3687–3691, 10.1109/ICASSP.2013.6638346.
-
(2013)
2013 IEEE international conference on acoustics, speech and signal processing
, pp. 3687-3691
-
-
Kim, Y.1
Lee, H.2
Provost, E.M.3
-
23
-
-
84876231242
-
Imagenet classification with deep convolutional neural networks
-
Krizhevsky, A., Sutskever, I., Hinton, G.E., Imagenet classification with deep convolutional neural networks. Neural information processing systems, 2012, 1097–1105.
-
(2012)
Neural information processing systems
, pp. 1097-1105
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
24
-
-
84930630277
-
Deep learning
-
Insight
-
LeCun, Y., Bengio, Y., Hinton, G., Deep learning. Nature 521 (2015), 436–444 Insight.
-
(2015)
Nature
, vol.521
, pp. 436-444
-
-
LeCun, Y.1
Bengio, Y.2
Hinton, G.3
-
25
-
-
0008458860
-
-
chapter Handwritten digit recognition with a back-propagation network. (pp. 396–404).
-
LeCun, Y., Boser, B., Denker, J.S., Howard, R.E., Hubbard, W., & Jackel, L.D. et al. (1990). Advances in neural information processing systems 2. chapter Handwritten digit recognition with a back-propagation network. (pp. 396–404).
-
(1990)
Advances in neural information processing systems 2.
, pp. 396-404
-
-
LeCun, Y.1
Boser, B.2
Denker, J.S.3
Howard, R.E.4
Hubbard, W.5
Jackel, L.D.6
-
26
-
-
79960847182
-
Emotion recognition using a hierarchical binary decision tree approach
-
Lee, C.-C., Mower, E., Busso, C., Lee, S., Narayanan, S., Emotion recognition using a hierarchical binary decision tree approach. Speech Communication 53 (2011), 1162–1171.
-
(2011)
Speech Communication
, vol.53
, pp. 1162-1171
-
-
Lee, C.-C.1
Mower, E.2
Busso, C.3
Lee, S.4
Narayanan, S.5
-
27
-
-
84893307972
-
-
In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 312–317).
-
Li, L., Zhao, Y., Jiang, D., Zhang, Y., Wang, F., & Gonzalez, I. et al. (2013). Hybrid deep neural network–hidden markov model (dnn-hmm) based speech emotion recognition. In 2013 humaine association conference on affective computing and intelligent interaction (ACII) (pp. 312–317).
-
(2013)
Hybrid deep neural network–hidden markov model (dnn-hmm) based speech emotion recognition.
, pp. 312-317
-
-
Li, L.1
Zhao, Y.2
Jiang, D.3
Zhang, Y.4
Wang, F.5
Gonzalez, I.6
-
28
-
-
84913548678
-
Learning salient features for speech emotion recognition using convolutional neural networks
-
Mao, Q., Dong, M., Huang, Z., Zhan, Y., Learning salient features for speech emotion recognition using convolutional neural networks. IEEE Transactions on Multimedia 16 (2014), 2203–2213.
-
(2014)
IEEE Transactions on Multimedia
, vol.16
, pp. 2203-2213
-
-
Mao, Q.1
Dong, M.2
Huang, Z.3
Zhan, Y.4
-
29
-
-
84880519188
-
Exploring cross-modality affective reactions for audiovisual emotion recognition
-
Mariooryad, S., Busso, C., Exploring cross-modality affective reactions for audiovisual emotion recognition. IEEE Transactions on Affective Computing 4 (2013), 183–196, 10.1109/T-AFFC.2013.11.
-
(2013)
IEEE Transactions on Affective Computing
, vol.4
, pp. 183-196
-
-
Mariooryad, S.1
Busso, C.2
-
30
-
-
84055211743
-
Acoustic modeling using deep belief networks
-
Mohamed, A., Dahl, G., Hinton, G., Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing 20 (2012), 14–22, 10.1109/TASL.2011.2109382.
-
(2012)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.20
, pp. 14-22
-
-
Mohamed, A.1
Dahl, G.2
Hinton, G.3
-
31
-
-
80052409390
-
Emotion-oriented systems
-
Petta, P., Pelachaud, C., Cowie, R., Emotion-oriented systems. The humaine handbook, 2011.
-
(2011)
The humaine handbook
-
-
Petta, P.1
Pelachaud, C.2
Cowie, R.3
-
32
-
-
84858953642
-
The kaldi speech recognition toolkit
-
IEEE Signal Processing Society IEEE Catalog No.: CFP11SRW-USB
-
Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., et al. The kaldi speech recognition toolkit. IEEE 2011 workshop on automatic speech recognition and understanding, 2011, IEEE Signal Processing Society IEEE Catalog No.: CFP11SRW-USB.
-
(2011)
IEEE 2011 workshop on automatic speech recognition and understanding
-
-
Povey, D.1
Ghoshal, A.2
Boulianne, G.3
Burget, L.4
Glembek, O.5
Goel, N.6
-
33
-
-
79954553074
-
Cross-validation
-
Springer
-
Refaeilzadeh, P., Tang, L., Liu, H., Cross-validation. Encyclopedia of database systems, 2009, Springer, 532–538.
-
(2009)
Encyclopedia of database systems
, pp. 532-538
-
-
Refaeilzadeh, P.1
Tang, L.2
Liu, H.3
-
34
-
-
0022471098
-
Learning representations by back-propagating errors
-
Rumelhart, D.E., Hinton, G.E., Williams, R.J., Learning representations by back-propagating errors. Nature 323 (1986), 533–536.
-
(1986)
Nature
, vol.323
, pp. 533-536
-
-
Rumelhart, D.E.1
Hinton, G.E.2
Williams, R.J.3
-
35
-
-
84910651844
-
Deep learning in neural networks: An overview
-
Schmidhuber, J., Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85–117, 10.1016/j.neunet.2014.09.003.
-
(2015)
Neural Networks
, vol.61
, pp. 85-117
-
-
Schmidhuber, J.1
-
36
-
-
70450206416
-
The interspeech 2009 emotion challenge
-
Schuller, B., Steidl, S., Batliner, A., The interspeech 2009 emotion challenge. Interspeech, vol. 2009, 2009, 312–315.
-
(2009)
Interspeech, vol. 2009
, pp. 312-315
-
-
Schuller, B.1
Steidl, S.2
Batliner, A.3
-
37
-
-
79954999224
-
The interspeech 2010 paralinguistic challenge
-
Schuller, B., Steidl, S., Batliner, A., Burkhardt, F., Devillers, L., Müller, C.A., et al. The interspeech 2010 paralinguistic challenge. Interspeech, 2010, 2794–2797.
-
(2010)
Interspeech
, pp. 2794-2797
-
-
Schuller, B.1
Steidl, S.2
Batliner, A.3
Burkhardt, F.4
Devillers, L.5
Müller, C.A.6
-
38
-
-
77949395673
-
-
In IEEE workshop on automatic speech recognition understanding, 2009 (pp. 552–557).
-
Schuller, B., Vlasenko, B., Eyben, F., Rigoll, G., & Wendemuth, A. (2009). Acoustic emotion recognition: A benchmark comparison of performances. In IEEE workshop on automatic speech recognition understanding, 2009 (pp. 552–557).
-
(2009)
Acoustic emotion recognition: A benchmark comparison of performances.
, pp. 552-557
-
-
Schuller, B.1
Vlasenko, B.2
Eyben, F.3
Rigoll, G.4
Wendemuth, A.5
-
40
-
-
85083953063
-
-
Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).
-
Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International conference on learning representations (ICLR).
-
(2015)
-
-
Simonyan, K.1
Zisserman, A.2
-
41
-
-
84904163933
-
Dropout: A simple way to prevent neural networks from overfitting
-
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R., Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15 (2014), 1929–1958.
-
(2014)
Journal of Machine Learning Research
, vol.15
, pp. 1929-1958
-
-
Srivastava, N.1
Hinton, G.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.5
-
42
-
-
80051631315
-
-
In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5688–5691).
-
Stuhlsatz, A., Meyer, C., Eyben, F., Zielke, T., Meier, G., & Schuller, B. (2011). Deep neural networks for acoustic emotion recognition: Raising the benchmarks. In 2011 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 5688–5691).
-
(2011)
Deep neural networks for acoustic emotion recognition: Raising the benchmarks.
, pp. 5688-5691
-
-
Stuhlsatz, A.1
Meyer, C.2
Eyben, F.3
Zielke, T.4
Meier, G.5
Schuller, B.6
-
43
-
-
84947968496
-
Towards a small set of robust acoustic features for emotion recognition: Challenges
-
Tahon, M., Devillers, L., Towards a small set of robust acoustic features for emotion recognition: Challenges. IEEE/ACM Transactions on Audio, Speech, and Language Processing 24 (2016), 16–28.
-
(2016)
IEEE/ACM Transactions on Audio, Speech, and Language Processing
, vol.24
, pp. 16-28
-
-
Tahon, M.1
Devillers, L.2
-
44
-
-
85020690526
-
-
In R. Lickley (Ed.), The 7th workshop on disfluency in spontaneous speech.
-
Tian, L., Lai, C., & Moore, J. (2015). Recognising emotions in dialogues with disfluencies and non-verbal vocalisations. In R. Lickley (Ed.), The 7th workshop on disfluency in spontaneous speech.
-
(2015)
Recognising emotions in dialogues with disfluencies and non-verbal vocalisations.
-
-
Tian, L.1
Lai, C.2
Moore, J.3
-
45
-
-
84964039845
-
-
In 2015 International conference on affective computing and intelligent interaction (ACII) (pp. 698–704).
-
Tian, L., Moore, J., & Lai, C. (2015). Emotion recognition in spontaneous and acted dialogues. In 2015 International conference on affective computing and intelligent interaction (ACII) (pp. 698–704).
-
(2015)
Emotion recognition in spontaneous and acted dialogues.
, pp. 698-704
-
-
Tian, L.1
Moore, J.2
Lai, C.3
-
46
-
-
84893343292
-
Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude
-
Tieleman, T., Hinton, G., Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning, 4, 2012.
-
(2012)
COURSERA: Neural Networks for Machine Learning
, vol.4
-
-
Tieleman, T.1
Hinton, G.2
-
47
-
-
33746410556
-
Emotional speech recognition: Resources, features, and methods
-
Ververidis, D., Kotropoulos, C., Emotional speech recognition: Resources, features, and methods. Speech Communication 48 (2006), 1162–1181.
-
(2006)
Speech Communication
, vol.48
, pp. 1162-1181
-
-
Ververidis, D.1
Kotropoulos, C.2
-
48
-
-
38049048651
-
Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing
-
Springer
-
Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G., Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing. Affective computing and intelligent interaction, 2007, Springer, 139–147.
-
(2007)
Affective computing and intelligent interaction
, pp. 139-147
-
-
Vlasenko, B.1
Schuller, B.2
Wendemuth, A.3
Rigoll, G.4
-
49
-
-
0000903748
-
Generalization of backpropagation with application to a recurrent gas market model
-
Werbos, P.J., Generalization of backpropagation with application to a recurrent gas market model. Neural Networks 1 (1988), 339–356, 10.1016/0893-6080(88)90007-X.
-
(1988)
Neural Networks
, vol.1
, pp. 339-356
-
-
Werbos, P.J.1