



Volume 2016-May, 2016, Pages 5200-5204

Adieu features? End-to-end speech emotion recognition using a deep convolutional recurrent network

Author keywords

CNN; deep learning; emotion recognition; end to end learning; LSTM; raw waveform

EID: 84973293291     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2016.7472669     Document Type: Conference Paper
Times cited: 875

References (28)
  • 1
    • 84910651844
    • Deep learning in neural networks: An overview
    • J. Schmidhuber, "Deep learning in neural networks: An overview," Neural Networks, vol. 61, pp. 85-117, January 2015.
    • (2015) Neural Networks, vol. 61, pp. 85-117
    • Schmidhuber, J.
  • 3
    • 84936143793
    • Towards end-to-end speech recognition with recurrent neural networks
    • A. Graves and N. Jaitly, "Towards end-to-end speech recognition with recurrent neural networks," in Proc. ICML, Beijing, China, 2014, pp. 1764-1772.
    • (2014) Proc. ICML, pp. 1764-1772
    • Graves, A.; Jaitly, N.
  • 5
    • 84960854232
    • AV+EC 2015-the first affect recognition challenge bridging across audio, video, and physiological data
    • F. Ringeval et al., "AV+EC 2015-The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data," in Proc. AVEC, Fabien Ringeval, Björn Schuller, Michel Valstar, Roddy Cowie, and Maja Pantic, Eds., Brisbane, Australia, October 2015, pp. 3-8, ACM.
    • (2015) Proc. AVEC, pp. 3-8
    • Ringeval, F.
  • 6
    • 80051609011
    • Learning a better representation of speech sound waves using restricted Boltzmann machines
    • N. Jaitly and G. Hinton, "Learning a better representation of speech sound waves using restricted Boltzmann machines," in Proc. ICASSP, Prague, Czech Republic, May 2011, pp. 5884-5887, IEEE.
    • (2011) Proc. ICASSP, pp. 5884-5887
    • Jaitly, N.; Hinton, G.
  • 7
    • 84959098603
    • Architectures for deep neural network based acoustic models defined over windowed speech waveforms
    • M. Bhargava and R. Rose, "Architectures for deep neural network based acoustic models defined over windowed speech waveforms," in Proc. INTERSPEECH, Dresden, Germany, September 2015, pp. 6-10, ISCA.
    • (2015) Proc. INTERSPEECH, pp. 6-10
    • Bhargava, M.; Rose, R.
  • 8
    • 84946037134
    • Convolutional, long short-term memory, fully connected deep neural networks
    • T. Sainath, O. Vinyals, A. Senior, and H. Sak, "Convolutional, long short-term memory, fully connected deep neural networks," in Proc. ICASSP, Brisbane, Australia, April 2015, pp. 4580-4584, IEEE.
    • (2015) Proc. ICASSP, pp. 4580-4584
    • Sainath, T.; Vinyals, O.; Senior, A.; Sak, H.
  • 9
    • 84959168440
    • Learning the speech front-end with raw waveform CLDNNs
    • T. Sainath, R. Weiss, A. Senior, K. Wilson, and O. Vinyals, "Learning the speech front-end with raw waveform CLDNNs," in Proc. INTERSPEECH, Dresden, Germany, September 2015, pp. 1-5, ISCA.
    • (2015) Proc. INTERSPEECH, pp. 1-5
    • Sainath, T.; Weiss, R.; Senior, A.; Wilson, K.; Vinyals, O.
  • 10
    • 84906273908
    • Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
    • D. Palaz, R. Collobert, and M. Magimai-Doss, "Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks," in Proc. INTERSPEECH, Lyon, France, August 2013, pp. 1766-1770, ISCA.
    • (2013) Proc. INTERSPEECH, pp. 1766-1770
    • Palaz, D.; Collobert, R.; Magimai-Doss, M.
  • 11
    • 84955059475
    • Analysis of CNN-based speech recognition system using raw speech as input
    • D. Palaz, M. Magimai-Doss, and R. Collobert, "Analysis of CNN-based speech recognition system using raw speech as input," in Proc. INTERSPEECH, Dresden, Germany, September 2015, pp. 11-15, ISCA.
    • (2015) Proc. INTERSPEECH, pp. 11-15
    • Palaz, D.; Magimai-Doss, M.; Collobert, R.
  • 12
    • 84905248193
    • End-to-end learning for music audio
    • S. Dieleman and B. Schrauwen, "End-to-end learning for music audio," in Proc. ICASSP, Florence, Italy, April 2014, pp. 7014-7018.
    • (2014) Proc. ICASSP, pp. 7014-7018
    • Dieleman, S.; Schrauwen, B.
  • 14
    • 84959157337
    • Using representation learning and out-of-domain data for a paralinguistic speech task
    • B. Milde and C. Biemann, "Using representation learning and out-of-domain data for a paralinguistic speech task," in Proc. INTERSPEECH, Dresden, Germany, September 2015, pp. 904-908, ISCA.
    • (2015) Proc. INTERSPEECH, pp. 904-908
    • Milde, B.; Biemann, C.
  • 15
    • 84913548678
    • Learning salient features for speech emotion recognition using convolutional neural networks
    • Q. Mao, M. Dong, Z. Huang, and Y. Zhan, "Learning salient features for speech emotion recognition using convolutional neural networks," IEEE Transactions on Multimedia, vol. 16, no. 8, pp. 2203-2213, Dec. 2014.
    • (2014) IEEE Transactions on Multimedia, vol. 16, no. 8, pp. 2203-2213
    • Mao, Q.; Dong, M.; Huang, Z.; Zhan, Y.
  • 16
    • 0024521543
    • A concordance correlation coefficient to evaluate reproducibility
    • L. I-Kuei Lin, "A concordance correlation coefficient to evaluate reproducibility," Biometrics, vol. 45, no. 1, pp. 255-268, March 1989.
    • (1989) Biometrics, vol. 45, no. 1, pp. 255-268
    • Lin, L. I-Kuei
  • 17
    • 0011823639
    • Improved speech recognition using high-pass filtering of subband envelopes
    • H. G. Hirsch, P. Meyer, and H. W. Ruehl, "Improved speech recognition using high-pass filtering of subband envelopes," in Proc. EUROSPEECH, Genoa, Italy, September 1991, pp. 413-416, ISCA.
    • (1991) Proc. EUROSPEECH, pp. 413-416
    • Hirsch, H.G.; Meyer, P.; Ruehl, H.W.
  • 18
    • 34547539413
    • Gammatone features and feature combination for large vocabulary speech recognition
    • R. Schlüter, L. Bezrukov, H. Wagner, and H. Ney, "Gammatone features and feature combination for large vocabulary speech recognition," in Proc. ICASSP, April 2007, vol. 4, pp. 649-652, IEEE.
    • (2007) Proc. ICASSP, vol. 4, pp. 649-652
    • Schlüter, R.; Bezrukov, L.; Wagner, H.; Ney, H.
  • 19
    • 27744588611
    • Framewise phoneme classification with bidirectional LSTM and other neural network architectures
    • A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, IJCNN Special Issue, vol. 18, no. 5-6, pp. 602-610, July-August 2005.
    • (2005) Neural Networks, IJCNN Special Issue, vol. 18, no. 5-6, pp. 602-610
    • Graves, A.; Schmidhuber, J.
  • 20
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
    • (1997) Neural Computation, vol. 9, no. 8, pp. 1735-1780
    • Hochreiter, S.; Schmidhuber, J.
  • 21
    • 84943197961
    • Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data
    • F. Ringeval et al., "Prediction of asynchronous dimensional emotion ratings from audiovisual and physiological data," Pattern Recognition Letters, vol. 66, pp. 22-30, November 2015.
    • (2015) Pattern Recognition Letters, vol. 66, pp. 22-30
    • Ringeval, F.
  • 22
    • 84915817064
    • Introducing the RECOLA multimodal corpus of remote collaborative and affective interactions
    • F. Ringeval, A. Sonderegger, J. Sauer, and D. Lalanne, "Introducing the RECOLA Multimodal Corpus of Remote Collaborative and Affective Interactions," in Proc. of EmoSPACE, FG, Shanghai, China, 2013.
    • (2013) Proc. of EmoSPACE
    • Ringeval, F.; Sonderegger, A.; Sauer, J.; Lalanne, D.
  • 23
    • 84947915210
    • The Geneva minimalistic acoustic parameter set (GeMAPS) for voice research and affective computing
    • F. Eyben et al., "The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing," IEEE Transactions on Affective Computing, 2015, in press.
    • (2015) IEEE Transactions on Affective Computing
    • Eyben, F.
  • 24
    • 85083951076
    • Adam: A method for stochastic optimization
    • D. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. ICLR, San Diego, USA, 2015.
    • (2015) Proc. ICLR
    • Kingma, D.; Ba, J.
  • 26
    • 84960847562
    • Ensemble methods for continuous affect recognition: Multimodality, temporality, and challenges
    • M. Kächele, P. Thiam, G. Palm, F. Schwenker, and M. Schels, "Ensemble methods for continuous affect recognition: Multimodality, temporality, and challenges," in Proc. AVEC, Fabien Ringeval, Björn Schuller, Michel Valstar, Roddy Cowie, and Maja Pantic, Eds., Brisbane, Australia, October 2015, pp. 9-16.
    • (2015) Proc. AVEC, pp. 9-16
    • Kächele, M.; Thiam, P.; Palm, G.; Schwenker, F.; Schels, M.
  • 27
    • 84930944930
    • Correcting time-continuous emotional labels by modeling the reaction lag of evaluators
    • S. Mariooryad and C. Busso, "Correcting time-continuous emotional labels by modeling the reaction lag of evaluators," IEEE Transactions on Affective Computing, vol. 6, no. 2, pp. 97-108, April-June 2015.
    • (2015) IEEE Transactions on Affective Computing, vol. 6, no. 2, pp. 97-108
    • Mariooryad, S.; Busso, C.
  • 28
    • 0037384712
    • Vocal communication of emotion: A review of research paradigms
    • K. Scherer, "Vocal communication of emotion: A review of research paradigms," Speech Communication, vol. 40, no. 1-2, pp. 227-256, April 2003.
    • (2003) Speech Communication, vol. 40, no. 1-2, pp. 227-256
    • Scherer, K.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.