SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 2453-2457

Deep bidirectional LSTM modeling of timbre and prosody for emotional voice conversion

Authors (6): Ming, Huaiping (a); Huang, Dongyan (a); Xie, Lei (b); Wu, Jie (b); Dong, Minghui (a); Li, Haizhou (a)

a Institute for Infocomm Research (Singapore)

b Northwestern Polytechnical University (China)

Author keywords

Long short term memory; Prosody; Recurrent neural networks; Voice conversion

Indexed keywords

BRAIN; RECURRENT NEURAL NETWORKS; SPEECH COMMUNICATION; WAVELET TRANSFORMS;

CONTINUOUS WAVELET TRANSFORMS; CONVERSION METHODS; EMOTIONAL VOICES; ENERGY CONTOURS; LONG SHORT TERM MEMORY; PROSODY; PROSODY FEATURES; VOICE CONVERSION;

SPEECH PROCESSING;

EID: 84994251909 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-1053 Document Type: Conference Paper

Times cited: 92

References (30)

1
- 0038370976
- J. A. Russell, J. A. Bachorowski and J. M. Fernández-Dols, (2003). Facial and vocal expressions of emotion. Annual Review of Psychology, 54(1), 329-349.

2
- 0003959340
- R. W. Picard, (1997). Affective computing (Vol. 252). Cambridge: MIT Press.

3
- 70350603282
- D. L. Schacter, (2011). Psychology, Second Edition. 41 Madison Avenue, New York, NY 10010.

4
- 0037384712
- K. R. Scherer, (2003). Vocal communication of emotion: A review of research paradigms. Speech Communication, 40(1), 227-256.

5
- 0003665661
- D. Hirst and A. Di Cristo, (1998). Intonation systems: A survey of twenty languages. Cambridge University Press.

6
- 34047263010
- J. Tao, Y. Kang and A. Li, (2006). Prosody conversion from neutral speech to emotional speech. IEEE Transactions on Audio, Speech, and Language Processing, 14(4), 1145-1154.

7
- 84938935270
- Z. Inanoglu and S. Young, (2007, August). A system for transforming the emotion in speech: Combining data-driven conversion techniques for prosody and voice quality. In INTERSPEECH (pp. 490-493).

8
- 84890451203
- R. Aihara, R. Takashima, T. Takiguchi and Y. Ariki, (2012). GMM-based emotional voice conversion using spectrum and prosody features. American Journal of Signal Processing, 2(5), 134-138.

9
- 84949924136
- R. Aihara, R. Ueda, T. Takiguchi and Y. Ariki, (2014, December). Exemplar-based emotional voice conversion using non-negative matrix factorization. In Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA) (pp. 1-7). IEEE.

10
- 84964010208
- H. Ming, D. Huang, L. Xie, S. Zhang, M. Dong and H. Li, (2015). Fundamental frequency modeling using wavelets for emotional voice conversion. In 6th Affective Computing and Intelligent Interaction (ACII) Workshop on Affective Social Multimedia Computing.

11
- 84876502441
- K. Yu, (2012, October). Review of F0 modelling and generation in HMM based speech synthesis. In IEEE 11th International Conference on Signal Processing (ICSP) (Vol. 1, pp. 599-604).

12
- 77955722263
- C. H. Wu, C. C. Hsia, C. H. Lee and M. C. Lin, (2010). Hierarchical prosody conversion using regression-based clustering for emotional speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 18(6), 1394-1405.

13
- 0344593979
- A. Wennerstrom, (2001). The music of everyday speech: Prosody and discourse analysis. Oxford University Press.

14
- 84864409462
- Y. Xu, (2011). Speech prosody: A methodological review. Journal of Speech Sciences, 1(1), 85-115.

15
- 85065690468
- J. J. Ohala and H. Kawasaki-Fukumori, (1997). Alternatives to the sonority hierarchy for explaining segmental sequential constraints. In Language and its ecology: Essays in memory of Einar Haugen, 100, 343.

16
- 84946044619
- M. S. Ribeiro and R. A. Clark, (2015, April). A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4909-4913).

17
- 84867194192
- J. Latorre and M. Akamine, (2008, September). Multilevel parametric-base F0 model for speech synthesis. In INTERSPEECH (pp. 2274-2277).

18
- 85008039410
- Y. Qian, Z. Wu, B. Gao and F. K. Soong, (2011). Improved prosody generation by maximizing joint probability of state and longer units. IEEE Transactions on Audio, Speech, and Language Processing, 19(6), 1702-1710.

19
- 84865714286
- N. Obin, A. Lacheret and X. Rodet, (2011, August). Stylization and trajectory modelling of short and long term speech prosody variations. In INTERSPEECH.

20
- 84910068272
- M. Vainio, A. Suni and D. Aalto, (2013). Continuous wavelet transform for analysis of speech prosody. In Tools and Resources for the Analysis of Speech Prosody, an INTERSPEECH 2013 satellite event.

21
- 84946045633
- A. S. Suni, D. Aalto, T. Raitio, P. Alku and M. Vainio, (2013). Wavelets for intonation modeling in HMM speech synthesis. In 8th ISCA Workshop on Speech Synthesis, Proceedings, Barcelona.

22
- 84973338474
- H. Ming, D. Huang, L. Xie, S. Zhang, M. Dong and H. Li, (2016). Exemplar-based sparse representation of timbre and prosody for voice conversion. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

23
- 0035505385
- F. A. Gers and J. Schmidhuber, (2001). LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 12(6), 1333-1340.

24
- 84910046405
- H. Sak, A. W. Senior and F. Beaufays, (2014, September). Long short-term memory recurrent neural network architectures for large vocabulary speech recognition. In INTERSPEECH (pp. 338-342).

25
- 84910047819
- Y. Fan, Y. Qian, F. L. Xie and F. K. Soong, (2014, September). TTS synthesis with bidirectional LSTM based recurrent neural networks. In INTERSPEECH (pp. 1964-1968).

26
- 84946027999
- L. Sun, S. Kang, K. Li and H. Meng, (2015, April). Voice conversion using deep bidirectional long short-term memory based recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4869-4873).

27
- 51449108867
- H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino and H. Banno, (2008, March). TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 3933-3936).

28
- 85090475413
- J. Kominek and A. W. Black, (2004). The CMU Arctic speech databases. In Fifth ISCA Workshop on Speech Synthesis.

29
- 84949924808
- S. Liu, D. Y. Huang, W. Lin, M. Dong, H. Li and E. P. Ong, (2014, December). Emotional facial expression transfer based on temporal restricted Boltzmann machines. In Asia-Pacific Signal and Information Processing Association (APSIPA).

30
- 84976226316
- F. Eyben, J. Bergmann and F. Weninger. CURRENNT: CUDA-enabled machine learning library for recurrent neural networks. https://sourceforge.net/projects/currennt/

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.