SCOPUS Information Search Platform

Neural Computing and Applications

Volume 21, Issue 7, 2012, Pages 1765-1773

Using DTW neural-based MFCC warping to improve emotional speech recognition

(3) Sheikhan, Mansour a Gharavian, Davood b Ashoftedel, Farhad a

a ISLAMIC AZAD UNIVERSITY (Iran)

b SHAHID BEHESHTI UNIVERSITY (Iran)

Author keywords

Dynamic time warping; Emotion; Frequency warping; Neural network; Speech recognition

Indexed keywords

AUTOMATIC SPEECH RECOGNITION SYSTEM; CALCULATION PROCESS; COMBINED STRUCTURE; DYNAMIC TIME WARPING; EMOTION; EMOTIONAL SPEECH; EMOTIONAL SPEECH RECOGNITION; EMOTIONAL STATE; FREQUENCY RANGES; FREQUENCY WARPING; MEL-FREQUENCY CEPSTRAL COEFFICIENTS; MULTI LAYER PERCEPTRON; RECOGNITION RATES; WARPING FACTORS;

DISCRETE COSINE TRANSFORMS; FILTER BANKS; HIDDEN MARKOV MODELS; NEURAL NETWORKS; SPEECH ANALYSIS;

SPEECH RECOGNITION;

EID: 84866447267 PISSN: 09410643 EISSN: None Source Type: Journal
DOI: 10.1007/s00521-011-0620-8 Document Type: Article

Times cited: (40)

References (34)

1
- 0033335618
- Modeling pronunciation variation for ASR: a survey of the literature
- Strik H, Cucchiarini C (1999) Modeling pronunciation variation for ASR: a survey of the literature. Speech Commun 29: 225-246.
- (1999) Speech Commun , vol.29 , pp. 225-246
- Strik, H.¹ Cucchiarini, C.²

2
- 70449569295
- Heading toward to the natural way of human-machine interaction: the NIMITEK project
- Vlasenko B, Wendemuth A (2009) Heading toward to the natural way of human-machine interaction: the NIMITEK project. Proceedings of IEEE international conference on multimedia and expo, pp 950-953.
- (2009) Proceedings of IEEE international conference on multimedia and expo , pp. 950-953
- Vlasenko, B.¹ Wendemuth, A.²

3
- 70349205575
- Emotional speech recognition based on style estimation and adaptation with multiple-regression HMM
- Ijima Y, Tachibana M, Nose T, Kobayashi T (2009) Emotional speech recognition based on style estimation and adaptation with multiple-regression HMM. Proceedings of IEEE international conference on acoustic, speech and signal processing, pp 4157-4160.
- (2009) Proceedings of IEEE International Conference On Acoustic, Speech and Signal Processing , pp. 4157-4160
- Ijima, Y.¹ Tachibana, M.² Nose, T.³ Kobayashi, T.⁴

4
- 33746410556
- Emotional speech recognition: resources, features, and methods
- Ververidis D, Kotropoulos C (2006) Emotional speech recognition: resources, features, and methods. Speech Commun 48: 1162-1181.
- (2006) Speech Commun , vol.48 , pp. 1162-1181
- Ververidis, D.¹ Kotropoulos, C.²

5
- 34547549142
- Towards more reality in the recognition of emotional speech
- Schuller B, Seppi D, Batliner A, Maier A, Steidl S (2007) Towards more reality in the recognition of emotional speech. Proceedings of IEEE international conference on acoustic, speech and signal processing, vol 4, pp 941-944.
- (2007) Proceedings of IEEE international conference on acoustic, speech and signal processing , vol.4 , pp. 941-944
- Schuller, B.¹ Seppi, D.² Batliner, A.³ Maier, A.⁴ Steidl, S.⁵

6
- 70349193703
- Emotion recognition from speech: putting ASR in the loop
- Schuller B, Batliner A, Steidl S, Seppi D (2009) Emotion recognition from speech: putting ASR in the loop. Proceedings of IEEE international conference on acoustic, speech and signal processing, pp 4585-4588.
- (2009) Proceedings of IEEE international conference on acoustic, speech and signal processing , pp. 4585-4588
- Schuller, B.¹ Batliner, A.² Steidl, S.³ Seppi, D.⁴

7
- 78149491290
- Comparing multiple classifiers for speech-based detection of self-confidence-A pilot study
- Krajewski J, Batliner A, Kessel S (2010) Comparing multiple classifiers for speech-based detection of self-confidence-A pilot study. Proceedings international conference on pattern recognition, pp 3716-3719.
- (2010) Proceedings international conference on pattern recognition , pp. 3716-3719
- Krajewski, J.¹ Batliner, A.² Kessel, S.³

8
- 21544466181
- ASR for emotional speech: clarifying the issues and enhancing performance
- Athanaselis T, Bakamidis S, Dologlou I, Cowie R, Douglas-Cowie E, Cox C (2005) ASR for emotional speech: clarifying the issues and enhancing performance. J Neural Netw 18: 437-444.
- (2005) J Neural Netw , vol.18 , pp. 437-444
- Athanaselis, T.¹ Bakamidis, S.² Dologlou, I.³ Cowie, R.⁴ Douglas-Cowie, E.⁵ Cox, C.⁶

9
- 85029930138
- Predicting automatic speech recognition performance using prosodic cues
- Litman DJ, Hirschberg JB, Swerts M (2000) Predicting automatic speech recognition performance using prosodic cues. Proceedings North American chapter of the association for computational linguistics conference, pp 218-225.
- (2000) Proceedings North American chapter of the association for computational linguistics conference , pp. 218-225
- Litman, D.J.¹ Hirschberg, J.B.² Swerts, M.³

10
- 84877483026
- Speech under stress conditions: overview of the effect on speech production and on system performance
- Steeneken HJM, Hansen JHL (1999) Speech under stress conditions: overview of the effect on speech production and on system performance. Proceedings of IEEE international conference on acoustic, speech and signal processing, vol 4, pp 2079-2082.
- (1999) Proceedings of IEEE international conference on acoustic, speech and signal processing , vol.4 , pp. 2079-2082
- Steeneken, H.J.M.¹ Hansen, J.H.L.²

11
- 34547941599
- Automatic speech recognition and speech variability: a review
- Benzeghiba M, De Mori R, Deroo O, Dupont S, Erbes T, Jouvet D, Fissore L, Laface P, Mertins A, Ris C, Rose R, Tyagi V, Wellekens C (2007) Automatic speech recognition and speech variability: a review. Speech Commun 49: 763-786.
- (2007) Speech Commun , vol.49 , pp. 763-786
- Benzeghiba, M.¹ de Mori, R.² Deroo, O.³ Dupont, S.⁴ Erbes, T.⁵ Jouvet, D.⁶ Fissore, L.⁷ Laface, P.⁸ Mertins, A.⁹ Ris, C.¹⁰ Rose, R.¹¹ Tyagi, V.¹² Wellekens, C.¹³

12
- 36248972553
- Berlin: Springer
- Hansen JH, Patil S (2007) Speech under stress: analysis, modeling and recognition. Springer, Berlin, pp 108-137.
- (2007) Speech under Stress: Analysis, Modeling and Recognition , pp. 108-137
- Hansen, J.H.¹ Patil, S.²

13
- 42449148217
- Ph.D. Dissertation, Electrical Engineering Department, Amirkabir University of Technology, Tehran
- Gharavian D (2004) Prosody in Farsi language and its use in recognition of intonation and speech. Ph.D. Dissertation, Electrical Engineering Department, Amirkabir University of Technology, Tehran.
- (2004) Prosody in Farsi language and its use in recognition of intonation and speech
- Gharavian, D.¹

14
- 79960772943
- Recognition of emotional speech and speech emotion in Farsi
- Gharavian D, Ahadi SM (2006) Recognition of emotional speech and speech emotion in Farsi. Proceedings of international symposium on Chinese spoken language processing, vol 2, pp 299-308.
- (2006) Proceedings of international symposium on Chinese spoken language processing , vol.2 , pp. 299-308
- Gharavian, D.¹ Ahadi, S.M.²

15
- 79960798127
- The effect of emotion on Farsi speech parameters: a statistical evaluation
- Gharavian D, Ahadi SM (2005) The effect of emotion on Farsi speech parameters: a statistical evaluation. Proceedings of international conference on speech and computer, pp 463-466.
- (2005) Proceedings of international conference on speech and computer , pp. 463-466
- Gharavian, D.¹ Ahadi, S.M.²

16
- 84864948871
- Pitch in emotional speech and emotional speech recognition using pitch frequency
- Gharavian D, Sheikhan M, Janipour M (2010) Pitch in emotional speech and emotional speech recognition using pitch frequency. Majlesi J Electr Eng 4(1): 19-24.
- (2010) Majlesi J Electr Eng , vol.4 , Issue.1 , pp. 19-24
- Gharavian, D.¹ Sheikhan, M.² Janipour, M.³

17
- 0037382560
- Emotions, speech and the ASR framework
- Bosch LT (2003) Emotions, speech and the ASR framework. Speech Commun 40: 213-225.
- (2003) Speech Commun , vol.40 , pp. 213-225
- Bosch, L.T.¹

18
- 79955539267
- Contextual invariant-integration features for improved speaker-independent speech recognition
- doi: 10.1016/j.specom.2011.02.002 Article in Press
- Müller F, Mertins A (2011) Contextual invariant-integration features for improved speaker-independent speech recognition. Speech Commun. doi: 10.1016/j.specom.2011.02.002 Article in Press.
- (2011) Speech Commun

19
- 0036753897
- Speaker adaptive modeling by vocal tract normalization
- Welling L, Ney H, Kanthak S (2002) Speaker adaptive modeling by vocal tract normalization. IEEE Trans Speech Audio Process 10: 415-426.
- (2002) IEEE Trans Speech Audio Process , vol.10 , pp. 415-426
- Welling, L.¹ Ney, H.² Kanthak, S.³

20
- 0036293694
- Non-uniform scaling based speaker normalization
- Sinha R, Umesh S (2002) Non-uniform scaling based speaker normalization. Proceedings of IEEE international conference on acoustic, speech and signal processing, vol 1, pp 589-592.
- (2002) Proceedings of IEEE International Conference On Acoustic, Speech and Signal Processing , vol.1 , pp. 589-592
- Sinha, R.¹ Umesh, S.²

21
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- Gales MJF (1998) Maximum likelihood linear transformations for HMM-based speech recognition. Comput Speech Lang 12: 75-98.
- (1998) Comput Speech Lang , vol.12 , pp. 75-98
- Gales, M.J.F.¹

22
- 3042820894
- Automatic recognition of spontaneous speech for access to multilingual oral history archives
- Byrne W, Doermann D, Franz M, Gustman S, Hajič J, Oard D, Picheny M, Psutka J, Ramabhadran B, Soergel D, Ward T, Zhu W-J (2004) Automatic recognition of spontaneous speech for access to multilingual oral history archives. IEEE Trans Speech Audio Process 12: 420-435.
- (2004) IEEE Trans Speech Audio Process , vol.12 , pp. 420-435
- Byrne, W.¹ Doermann, D.² Franz, M.³ Gustman, S.⁴ Hajič, J.⁵ Oard, D.⁶ Picheny, M.⁷ Psutka, J.⁸ Ramabhadran, B.⁹ Soergel, D.¹⁰ Ward, T.¹¹ Zhu, W.-J.¹²

23
- 85016587886
- SWITCHBOARD: Telephone speech corpus for research and development
- Godfrey J, Holliman E, McDaniel J (1992) SWITCHBOARD: telephone speech corpus for research and development. Proceedings of IEEE international conference on acoustic, speech and signal processing, pp 517-520.
- (1992) Proceedings of IEEE International Conference On Acoustic, Speech and Signal Processing , pp. 517-520
- Godfrey, J.¹ Holliman, E.² McDaniel, J.³

24
- 51749100264
- Emotion-detecting based model selection for emotional speech recognition
- Pan YC, Xu MX, Liu LQ, Jia PF (2006) Emotion-detecting based model selection for emotional speech recognition. Proceedings of multiconference on computational engineering in system applications, pp 2169-2172.
- (2006) Proceedings of Multiconference On Computational Engineering In System Applications , pp. 2169-2172
- Pan, Y.C.¹ Xu, M.X.² Liu, L.Q.³ Jia, P.F.⁴

25
- 60349106688
- Combined speech-emotion recognition for spoken human-computer interfaces
- Meng H, Pittermann J, Pittermann A, Minker W (2007) Combined speech-emotion recognition for spoken human-computer interfaces. Proceedings IEEE international conference on signal processing and communications, pp 1179-1182.
- (2007) Proceedings IEEE International Conference On Signal Processing and Communications , pp. 1179-1182
- Meng, H.¹ Pittermann, J.² Pittermann, A.³ Minker, W.⁴

26
- 77949626292
- Acoustic feature optimization for emotion affected speech recognition
- doi: 10.1109/ICIECS.2009.5365821
- Sun Y, Zhou Y, Zhao Q, Yan Y (2009) Acoustic feature optimization for emotion affected speech recognition. Proceedings of international conference on information engineering and computer science, pp 1-4. doi: 10.1109/ICIECS.2009.5365821.
- (2009) Proceedings of International Conference On Information Engineering and Computer Science , pp. 1-4
- Sun, Y.¹ Zhou, Y.² Zhao, Q.³ Yan, Y.⁴

27
- 34247210668
- Theoretical complex cepstrum of DCT and warped DCT filters
- Muralishankar R, Sangwan A, O'Shaughnessy D (2007) Theoretical complex cepstrum of DCT and warped DCT filters. IEEE Signal Process Lett 14: 367-370.
- (2007) IEEE Signal Process Lett , vol.14 , pp. 367-370
- Muralishankar, R.¹ Sangwan, A.² O'Shaughnessy, D.³

28
- 26844479120
- Warped discrete cosine transform-based noisy speech enhancement
- Chang J-H (2005) Warped discrete cosine transform-based noisy speech enhancement. IEEE Trans Circuits Syst II 52: 535-539.
- (2005) IEEE Trans Circuits Syst II , vol.52 , pp. 535-539
- Chang, J.-H.¹

29
- 44949157762
- Frequency warping by linear transformation of standard MFCC
- Panchapagesan S (2006) Frequency warping by linear transformation of standard MFCC. Proceedings of interspeech, pp 397-400.
- (2006) Proceedings of Interspeech , pp. 397-400
- Panchapagesan, S.¹

30
- 0141740940
- Vocal tract normalization equals linear transformation in cepstral space
- Pitz M, Molau S, Schlueter R, Ney H (2001) Vocal tract normalization equals linear transformation in cepstral space. Proceedings of European conference on speech communication and technology, pp 721-724.
- (2001) Proceedings of European Conference On Speech Communication and Technology , pp. 721-724
- Pitz, M.¹ Molau, S.² Schlueter, R.³ Ney, H.⁴

31
- 77955423547
- Fiction support for realistic portrayals of fear-type emotional manifestations
- Clavel C, Vasilescu I, Devillers L (2011) Fiction support for realistic portrayals of fear-type emotional manifestations. Comput Speech Lang 25: 63-83.
- (2011) Comput Speech Lang , vol.25 , pp. 63-83
- Clavel, C.¹ Vasilescu, I.² Devillers, L.³

32
- 33646197299
- The speech database of Farsi spoken language
- Bijankhan M, Sheikhzadegan J, Roohani MR, Samareh Y, Lucas C, Tebiani M (1994) The speech database of Farsi spoken language. Proceedings of Australian international conference on speech science and technology, pp 826-831.
- (1994) Proceedings of Australian International Conference On Speech Science and Technology , pp. 826-831
- Bijankhan, M.¹ Sheikhzadegan, J.² Roohani, M.R.³ Samareh, Y.⁴ Lucas, C.⁵ Tebiani, M.⁶

33
- 0003822743
- Cambridge University, Cambridge
- Young SJ, Evermann G, Kershaw D, Moore G, Odell J, Ollason D, Povey D, Valtchev V, Woodland V (2002) The HTK book (Ver. 3.2). Cambridge University, Cambridge.
- (2002) The HTK Book (Ver. 3.2) , pp. 2
- Young, S.J.¹ Evermann, G.² Kershaw, D.³ Moore, G.⁴ Odell, J.⁵ Ollason, D.⁶ Povey, D.⁷ Valtchev, V.⁸ Woodland, V.⁹

34
- 0016049328
- An Algorithm for formant extraction using linear prediction spectra
- McCandless SS (1974) An Algorithm for formant extraction using linear prediction spectra. IEEE Trans Acoustics Speech Signal Process 22: 135-141.
- (1974) IEEE Trans Acoustics Speech Signal Process , vol.22 , pp. 135-141
- McCandless, S.S.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.