Speech Communication, Volume 53, Issue 5, 2011, Pages 768-785

Automatic speech emotion recognition using modulation spectral features

Author keywords

Affective computing; Emotion recognition; Spectro-temporal representation; Speech analysis; Speech modulation

Indexed keywords

ACOUSTIC FREQUENCY; AFFECTIVE COMPUTING; AUTOMATIC RECOGNITION; EMOTION RECOGNITION; ESTIMATION PERFORMANCE; HUMAN EVALUATION; HUMAN SPEECH PERCEPTION; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; MODULATION FILTERBANK; PERCEPTUAL LINEAR PREDICTIONS; PROSODIC FEATURES; RECOGNITION PERFORMANCE; RECOGNITION RATES; SPECTRAL FEATURE; SPECTRAL REPRESENTATIONS; SPECTRO-TEMPORAL REPRESENTATION; SPEECH EMOTION RECOGNITION; TEMPORAL MODULATIONS; TEMPORAL REPRESENTATIONS;

EID: 79953659944     PISSN: 0167-6393     EISSN: None     Source Type: Journal
DOI: 10.1016/j.specom.2010.08.013     Document Type: Article
Times cited: 354

References (60)
  • 2. Aertsen, A.M.H.J., Johannesma, P.I.M., 1980. Spectro-temporal receptive fields of auditory neurons in the grassfrog. I. Characterization of tonal and natural stimuli. Biological Cybernetics 38(4), 223-234. DOI: 10.1007/BF00337015
  • 8. Busso, C., Lee, S., Narayanan, S., 2009. Analysis of emotionally salient aspects of fundamental frequency for emotion detection. IEEE Transactions on Audio, Speech, and Language Processing 17, 582-596.
  • 9. Chang, C.-C., Lin, C.-J., 2009. LIBSVM: A library for support vector machines. Tech. rep., Department of Computer Science, National Taiwan University. Software available at: .
  • 10. Chi, T., Ru, P., Shamma, S.A., 2005. Multiresolution spectrotemporal analysis of complex sounds. Journal of the Acoustical Society of America 118(2), 887-906. DOI: 10.1121/1.1945807
  • 11. Clavel, C., Vasilescu, I., Devillers, L., Richard, G., Ehrette, T., 2008. Fear-type emotion recognition for future audio-based surveillance systems. Speech Communication 50, 487-503.
  • 12. Cowie, R., Douglas-Cowie, E., 1996. Automatic statistical analysis of the signal and prosodic signs of emotion in speech. In: Proc. Internat. Conf. on Spoken Language Processing, Vol. 3, pp. 1989-1992.
  • 14. Davis, S., Mermelstein, P., 1980. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing ASSP-28(4), 357-366.
  • 15. Depireux, D.A., Simon, J.Z., Klein, D.J., Shamma, S.A., 2001. Spectro-temporal response field characterization with dynamic ripples in ferret primary auditory cortex. Journal of Neurophysiology 85(3), 1220-1234.
  • 17. Ekman, P., 1999. Basic Emotions. John Wiley, New York, pp. 45-60.
  • 18. Ewert, S., Dau, T., 2000. Characterizing frequency selectivity for envelope fluctuations. Journal of the Acoustical Society of America 108, 1181-1196.
  • 20. Falk, T.H., Chan, W.-Y., 2010. Modulation spectral features for robust far-field speaker identification. IEEE Transactions on Audio, Speech, and Language Processing 18, 90-100.
  • 21. Falk, T.H., Chan, W.-Y., 2010. Temporal dynamics for blind measurement of room acoustical parameters. IEEE Transactions on Instrumentation and Measurement 59, 978-989.
  • 22
  • 24. Glasberg, B.R., Moore, B.C.J., 1990. Derivation of auditory filter shapes from notched-noise data. Hearing Research 47(1-2), 103-138. DOI: 10.1016/0378-5955(90)90170-T
  • 25. Grimm, M., Kroschel, K., Mower, E., Narayanan, S., 2007. Primitives-based evaluation and estimation of emotions in speech. Speech Communication 49(10-11), 787-800. DOI: 10.1016/j.specom.2007.01.010
  • 28. Gunes, H., Piccardi, M., 2007. Bi-modal emotion recognition from expressive face and body gestures. Journal of Network and Computer Applications 30(4), 1334-1345. DOI: 10.1016/j.jnca.2006.09.007
  • 29. Hermansky, H., 1990. Perceptual linear predictive (PLP) analysis of speech. Journal of the Acoustical Society of America 87(4), 1738-1752. DOI: 10.1121/1.399423
  • 30. Hsu, C.-W., Chang, C.-C., Lin, C.-J., 2007. A practical guide to support vector classification. Tech. rep., Department of Computer Science, National Taiwan University.
  • 33. Ishi, C., Ishiguro, H., Hagita, N., 2010. Analysis of the roles and the dynamics of breathy and whispery voice qualities in dialogue speech. EURASIP Journal on Audio, Speech, and Music Processing, article ID 528193, 12 pages.
  • 35. Kanedera, N., Arai, T., Hermansky, H., Pavel, M., 1999. On the relative importance of various components of the modulation spectrum for automatic speech recognition. Speech Communication 28, 43-55.
  • 38. Lugger, M., Yang, B., 2008. Cascaded emotion classification via psychological emotion dimensions using a large set of voice quality parameters. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing, Vol. 4, pp. 4945-4948.
  • 41. Nwe, T., Foo, S., De Silva, L., 2003. Speech emotion recognition using hidden Markov models. Speech Communication 41, 603-623.
  • 44. Scherer, K., 2003. Vocal communication of emotion: A review of research paradigms. Speech Communication 40, 227-256.
  • 49. Shami, M., Verhelst, W., 2007. An evaluation of the robustness of existing supervised machine learning approaches to the classification of emotions in speech. Speech Communication 49(3), 201-212. DOI: 10.1016/j.specom.2007.01.006
  • 50. Shamma, S., 2001. On the role of space and time in auditory processing. Trends in Cognitive Sciences 5(8), 340-348. DOI: 10.1016/S1364-6613(00)01704-6
  • 51. Shamma, S., 2003. Encoding sound timbre in the auditory system. IETE Journal of Research 49, 193-205.
  • 52. Slaney, M., 1993. An efficient implementation of the Patterson-Holdsworth auditory filterbank. Tech. rep., Apple Computer, Perception Group.
  • 53. Sun, R., Moore, E., Torres, J., 2009. Investigating glottal parameters for differentiating emotional categories with similar prosodics. In: Proc. Internat. Conf. on Acoustics, Speech and Signal Processing, pp. 4509-4512.
  • 56. Ververidis, D., Kotropoulos, C., 2006. Emotional speech recognition: Resources, features, and methods. Speech Communication 48(9), 1162-1181. DOI: 10.1016/j.specom.2006.04.003
  • 57. Vlasenko, B., Schuller, B., Wendemuth, A., Rigoll, G., 2007. Combining frame and turn-level information for robust recognition of emotions within speech. In: Proc. Interspeech, pp. 2225-2228.
  • 58. Wollmer, M., Eyben, F., Reiter, S., Schuller, B., Cox, C., Douglas-Cowie, E., Cowie, R., 2008. Abandoning emotion classes - Towards continuous emotion recognition with modelling of long-range dependencies. In: Proc. Interspeech, pp. 597-600.
  • 60. Zhou, G., Hansen, J.H.L., Kaiser, J.F., 2001. Nonlinear feature based classification of speech under stress. IEEE Transactions on Speech and Audio Processing 9(3), 201-216. DOI: 10.1109/89.905995


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.