Volume 22, Issue 12, 2014, Pages 1859-1872

Voice conversion using deep neural networks with layer-wise generative training

Author keywords

Bidirectional associative memory; Deep neural network; Gaussian mixture model; Restricted Boltzmann machine; Spectral envelope conversion; Voice conversion

Indexed keywords

ASSOCIATIVE PROCESSING; ASSOCIATIVE STORAGE; GAUSSIAN DISTRIBUTION; IMAGE SEGMENTATION; MAPPING; NETWORK LAYERS; SPEECH PROCESSING;

EID: 84921735339     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2014.2353991     Document Type: Article
Times cited : (255)

References (43)
  • 4
    • 84865698185
    • Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
    • Nov
    • T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 9, pp. 2505-2517, Nov. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.9 , pp. 2505-2517
    • Toda, T.1    Nakagiri, M.2    Shikano, K.3
  • 6
    • 85068458327
    • Weighted frequency warping for voice conversion
    • D. Erro and A. Moreno, "Weighted frequency warping for voice conversion," in Proc. Interspeech, 2007, pp. 1965-1968.
    • (2007) Proc. Interspeech , pp. 1965-1968
    • Erro, D.1    Moreno, A.2
  • 7
    • 0032026483
    • Continuous probabilistic transform for voice conversion
    • Mar
    • Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
    • (1998) IEEE Trans. Speech Audio Process , vol.6 , Issue.2 , pp. 131-142
    • Stylianou, Y.1    Cappe, O.2    Moulines, E.3
  • 8
    • 0031623661
    • Spectral voice conversion for text-to-speech synthesis
    • A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, 1998, pp. 285-288.
    • (1998) Proc. ICASSP , pp. 285-288
    • Kain, A.1    Macon, M.2
  • 9
    • 34047254509
    • Quality-enhanced voice morphing using maximum likelihood transformations
    • Jul
    • H. Ye and S. Young, "Quality-enhanced voice morphing using maximum likelihood transformations," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1301-1312, Jul. 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process , vol.14 , Issue.4 , pp. 1301-1312
    • Ye, H.1    Young, S.2
  • 10
    • 84905560807
    • Voice conversion with smoothed GMM and MAP adaptation
    • Y. Chen, M. Chu, E. Chang, J. Liu, and R. Liu, "Voice conversion with smoothed GMM and MAP adaptation," Eurospeech, pp. 2413-2416, 2003.
    • (2003) Eurospeech , pp. 2413-2416
    • Chen, Y.1    Chu, M.2    Chang, E.3    Liu, J.4    Liu, R.5
  • 11
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • Nov
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.2    Tokuda, K.3
  • 12
    • 78149260085
    • Continuous stochastic feature mapping based on trajectory HMMs
    • Feb
    • H. Zen, Y. Nankaku, and K. Tokuda, "Continuous stochastic feature mapping based on trajectory HMMs," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 2, pp. 417-430, Feb. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.2 , pp. 417-430
    • Zen, H.1    Nankaku, Y.2    Tokuda, K.3
  • 14
    • 84906276055
    • Exemplar-based unit selection for voice conversion utilizing temporal information
    • Z. Wu, T. Virtanen, T. Kinnunen, E. Chng, and H. Li, "Exemplar-based unit selection for voice conversion utilizing temporal information," in Proc. Interspeech, 2013.
    • (2013) Proc. Interspeech
    • Wu, Z.1    Virtanen, T.2    Kinnunen, T.3    Chng, E.4    Li, H.5
  • 15
    • 84856141218
    • Voice conversion using dynamic kernel partial least squares regression
    • Mar
    • E. Helander, H. Silen, T. Virtanen, and M. Gabbouj, "Voice conversion using dynamic kernel partial least squares regression," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 806-817, Mar. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.3 , pp. 806-817
    • Helander, E.1    Silen, H.2    Virtanen, T.3    Gabbouj, M.4
  • 16
    • 84865737668
    • Gaussian process experts for voice conversion
    • N. Pilkington, H. Zen, and M. J. F. Gales, "Gaussian process experts for voice conversion," in Proc. Interspeech, 2011, pp. 2761-2764.
    • (2011) Proc. Interspeech , pp. 2761-2764
    • Pilkington, N.1    Zen, H.2    Gales, M.J.F.3
  • 18
    • 84906280857
    • Voice conversion in high-order eigen space using deep belief nets
    • T. Nakashika, T. Takashima, R. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," in Proc. Interspeech, 2013, pp. 369-372.
    • (2013) Proc. Interspeech , pp. 369-372
    • Nakashika, T.1    Takashima, T.2    Takiguchi, R.3    Ariki, Y.4
  • 20
    • 84906225084
    • Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
    • L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Proc. Interspeech, 2013, pp. 3052-3056.
    • (2013) Proc. Interspeech , pp. 3052-3056
    • Chen, L.-H.1    Ling, Z.-H.2    Song, Y.3    Dai, L.-R.4
  • 21
    • 84905223323
    • Using bidirectional associative memories for joint spectral envelope modeling in voice conversion
    • L.-J. Liu, L.-H. Chen, Z.-H. Ling, and L.-R. Dai, "Using bidirectional associative memories for joint spectral envelope modeling in voice conversion," in Proc. ICASSP, 2014, pp. 7884-7888.
    • (2014) Proc. ICASSP , pp. 7884-7888
    • Liu, L.-J.1    Chen, L.-H.2    Ling, Z.-H.3    Dai, L.-R.4
  • 22
    • 84890447002
    • Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis
    • Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis," in Proc. ICASSP, 2013, pp. 7825-7829.
    • (2013) Proc. ICASSP , pp. 7825-7829
    • Ling, Z.-H.1    Deng, L.2    Yu, D.3
  • 23
    • 0013344078
    • Training products of experts by minimizing contrastive divergence
    • G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, no. 8, pp. 1771-1800, 2002.
    • (2002) Neural Comput , vol.14 , Issue.8 , pp. 1771-1800
    • Hinton, G.1
  • 24
    • 84901237776
    • Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
    • Oct
    • Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.10 , pp. 2129-2139
    • Ling, Z.-H.1    Deng, L.2    Yu, D.3
  • 26
    • 84890527090
    • Multi-distribution deep belief network for speech synthesis
    • S. Kang, X. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in Proc. ICASSP, 2013, pp. 8012-8016.
    • (2013) Proc. ICASSP , pp. 8012-8016
    • Kang, S.1    Qian, X.2    Meng, H.3
  • 27
    • 0000329993
    • Information processing in dynamical systems: Foundations of harmony theory
    • D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, ch. 6
    • P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," in Parallel distributed processing: explorations in the microstructure of cognition, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, 1986, vol. 1, ch. 6, pp. 194-281.
    • (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition , vol.1 , pp. 194-281
    • Smolensky, P.1
  • 28
    • 33746600649
    • Reducing the dimensionality of data with neural networks
    • G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
    • (2006) Science , vol.313 , Issue.5786 , pp. 504-507
    • Hinton, G.E.1    Salakhutdinov, R.R.2
  • 29
  • 30
    • 0023861743
    • Bidirectional associative memories
    • Jan
    • B. Kosko, "Bidirectional associative memories," IEEE Trans. Systems, Man, Cybern., vol. 18, no. 1, pp. 49-60, Jan. 1988.
    • (1988) IEEE Trans. Systems, Man, Cybern , vol.18 , Issue.1 , pp. 49-60
    • Kosko, B.1
  • 31
    • 0009361665
    • A pseudo-relaxation learning algorithm for bidirectional associative memory
    • H. Oh and S. C. Kothari, "A pseudo-relaxation learning algorithm for bidirectional associative memory," in Proc. Int. Joint Conf. Neural Networks (IJCNN'92), 1992, vol. 2, pp. 208-213.
    • (1992) Proc. Int. Joint Conf. Neural Networks (IJCNN'92) , vol.2 , pp. 208-213
    • Oh, H.1    Kothari, S.C.2
  • 32
    • 0028410032
    • Quick learning for bidirectional associative memory
    • M. Hattori, M. Hagiwara, and M. Nakagawa, "Quick learning for bidirectional associative memory," IEICE Trans. Inf. Syst., vol. 77, no. 4, pp. 385-392, 1994.
    • (1994) IEICE Trans. Inf. Syst , vol.77 , Issue.4 , pp. 385-392
    • Hattori, M.1    Hagiwara, M.2    Nakagawa, M.3
  • 33
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
    • H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007.
    • (2007) Comput. Speech Lang , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 34
    • 84878387361
    • PLDA using Gaussian restricted Boltzmann machines with application to speaker verification
    • T. Stafylakis, P. Kenny, M. Senoussaoui, and P. Dumouchel, "PLDA using Gaussian restricted Boltzmann machines with application to speaker verification," in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Stafylakis, T.1    Kenny, P.2    Senoussaoui, M.3    Dumouchel, P.4
  • 35
    • 69349090197
    • Learning deep architectures for AI
    • Jan
    • Y. Bengio, "Learning deep architectures for AI," Foundat. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, Jan. 2009.
    • (2009) Foundat. Trends Mach. Learn , vol.2 , Issue.1 , pp. 1-127
    • Bengio, Y.1
  • 36
    • 84872506495
    • A practical guide to training restricted Boltzmann machines
    • New York, NY, USA: Springer, 2012
    • G. E. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer, 2012, vol. 7700, pp. 599-619.
    • Neural Networks: Tricks of the Trade , vol.7700 , pp. 599-619
    • Hinton, G.E.1
  • 39
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3, pp. 187-208, 1999.
    • (1999) Speech Commun , vol.27 , Issue.3 , pp. 187-208
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 42
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. 90, no. 5, pp. 816-824, 2007.
    • (2007) IEICE Trans. Inf. Syst , vol.90 , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 43
    • 84055222005
    • Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
    • Jan
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.