Volume 2015-August, 2015, Pages 4869-4873

Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks

Author keywords

bidirectional long short term memory; dynamic features; recurrent neural networks; voice conversion

Indexed keywords

AUDIO SIGNAL PROCESSING; BRAIN; DEEP NEURAL NETWORKS; LONG SHORT-TERM MEMORY; RECURRENT NEURAL NETWORKS; SPEECH COMMUNICATION;

EID: 84946027999     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2015.7178896     Document Type: Conference Paper
Times cited: 306

References (34)
  • 3. D. Erro and A. Moreno, "Weighted frequency warping for voice conversion," in Interspeech, 2007.
  • 4. G. Hinton, S. Osindero, and Y. Teh, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18, no. 7, pp. 1527-1554, 2006.
  • 7. T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
  • 9. L. H. Chen, Z. H. Ling, Y. Song, and L. R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Interspeech, 2013.
  • 12. T. Nakashika, T. Takiguchi, and Y. Ariki, "High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion," in Interspeech, 2014.
  • 13. Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Transactions on Neural Networks, vol. 5, no. 2, pp. 157-166, 1994.
  • 14. A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures," Neural Networks, vol. 18, no. 5, pp. 602-610, 2005.
  • 15
  • 16. F. A. Gers and J. Schmidhuber, "LSTM recurrent networks learn simple context-free and context-sensitive languages," IEEE Transactions on Neural Networks, vol. 12, no. 6, pp. 1333-1340, 2001.
  • 17. A. Graves, N. Jaitly, and A. R. Mohamed, "Hybrid speech recognition with deep bidirectional LSTM," in ASRU, 2013.
  • 19. M. Wollmer, Z. X. Zhang, F. Weninger, B. Schuller, and G. Rigoll, "Feature enhancement by bidirectional LSTM networks for conversational speech recognition in highly nonstationary noise," in ICASSP, 2013.
  • 20. Y. C. Fan, Y. Qian, F. L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014.
  • 22. F. A. Gers, J. Schmidhuber, and F. Cummins, "Learning to forget: Continual prediction with LSTM," Neural Computation, vol. 12, no. 10, pp. 2451-2471, 2000.
  • 23. A. Graves, A. R. Mohamed, and G. E. Hinton, "Speech recognition with deep recurrent neural networks," in ICASSP, 2013, pp. 6645-6649.
  • 24. H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999.
  • 25. Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation," in Proc. ICSLP, 2006.
  • 28. P. J. Werbos, "Backpropagation through time: What it does and how to do it," Proceedings of the IEEE, vol. 78, no. 10, pp. 1550-1560, 1990.
  • 32. S. Y. Kang, X. J. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in ICASSP, 2013.
  • 33. H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in ICASSP, 2013.
  • 34. S. Y. Kang and H. Meng, "Statistical parametric speech synthesis using weighted multi-distribution deep belief network," in Interspeech, 2014.


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.