[1] J. Nurminen, H. Silen, and V. Popa, "Voice conversion," in Speech Enhancement, Modeling and Recognition - Algorithms and Applications, pp. 69-94, 2012.
[2] Z. W. Shuang, R. Bakis, S. Shechtman, and Y. Qin, "Frequency warping based on mapping formant parameters," in Interspeech, 2006.
[3] D. Erro and A. Moreno, "Weighted frequency warping for voice conversion," in Interspeech, 2007.
[4] G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
[5] A. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 14-22, 2012.
[6] Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Transactions on Speech and Audio Processing, vol. 6, no. 2, pp. 131-142, 1998.
[7] T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
[8] S. Desai, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, "Voice conversion using artificial neural networks," in ICASSP, 2009.
[9] L. H. Chen, Z. H. Ling, Y. Song, and L. R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Interspeech, 2013.
[10] L. H. Chen, Z. H. Ling, L. J. Liu, and L. R. Dai, "Voice conversion using deep neural networks with layer-wise generative training," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, pp. 1859-1872, 2014.
[11] T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," in Interspeech, 2013.
[12] T. Nakashika, T. Takiguchi, and Y. Ariki, "High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion," in Interspeech, 2014.
[13] Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
[14] A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5, pp. 602-610, 2005.
[15] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
[16] F. A. Gers and J. Schmidhuber, "LSTM recurrent networks learn simple context-free and context-sensitive languages," IEEE Transactions on Neural Networks, vol. 12, no. 6, pp. 1333-1340, 2001.
[17] A. Graves, N. Jaitly, and A. R. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in ASRU, 2013.
[19] M. Wöllmer, Z. X. Zhang, F. Weninger, B. Schuller, and G. Rigoll, "Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly nonstationary noise," in ICASSP, 2013.
[20] Y. C. Fan, Y. Qian, F. L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014.
[21] M. Schuster and K. K. Paliwal, "Bidirectional recurrent neural networks," IEEE Transactions on Signal Processing, vol. 45, no. 11, pp. 2673-2681, 1997.
[22] F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.
[23] A. Graves, A. R. Mohamed, and G. E. Hinton, "Speech recognition with deep recurrent neural networks," in ICASSP, 2013, pp. 6645-6649.
[24] H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999.
[25] Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation," in Proc. ICSLP, 2006.
[28] P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, 1990.
[29] K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in ICASSP, 2000.
[32] S. Y. Kang, X. J. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in ICASSP, 2013.
[33] H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in ICASSP, 2013.
[34] S. Y. Kang and H. Meng, "Statistical parametric speech synthesis using weighted multi-distribution deep belief network," in Interspeech, 2014.