SCOPUS Information Retrieval Platform

IEEE/ACM Transactions on Audio Speech and Language Processing

Volume 22, Issue 12, 2014, Pages 1859-1872

Voice conversion using deep neural networks with layer-wise generative training

(4) Chen, Ling Hui a; Ling, Zhen Hua a; Liu, Li Juan a; Dai, Li Rong a

a National Engineering Laboratory for Speech and Language Information Processing (China)

Author keywords

Bidirectional associative memory; Deep neural network; Gaussian mixture model; Restricted Boltzmann machine; Spectral envelope conversion; Voice conversion

Indexed keywords

ASSOCIATIVE PROCESSING; ASSOCIATIVE STORAGE; GAUSSIAN DISTRIBUTION; IMAGE SEGMENTATION; MAPPING; NETWORK LAYERS; SPEECH PROCESSING;

BI-DIRECTIONAL ASSOCIATIVE MEMORY; DEEP NEURAL NETWORKS; GAUSSIAN MIXTURE MODEL; RESTRICTED BOLTZMANN MACHINE; SPECTRAL ENVELOPES; VOICE CONVERSION;

PHOTOMAPPING;

EID: 84921735339 PISSN: 23299290 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2014.2353991 Document Type: Article

Times cited : (255)

References (43)

1
- 0023739214
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," in Proc. ICASSP, 1988, pp. 655-658.
- (1988) Proc. ICASSP , pp. 655-658
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

2
- 84897939966
- Alaryngeal speech enhancement based on one-to-many eigenvoice conversion
- Jan
- H. Doi, T. Toda, K. Nakamura, H. Saruwatari, and K. Shikano, "Alaryngeal speech enhancement based on one-to-many eigenvoice conversion," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 1, pp. 172-183, Jan. 2014.
- (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process , vol.22 , Issue.1 , pp. 172-183
- Doi, H.¹ Toda, T.² Nakamura, K.³ Saruwatari, H.⁴ Shikano, K.⁵

3
- 78349252661
- Voice conversion: From spoken vowels to singing vowels
- T. L. New, M. Dong, P. Chan, X. Wang, B. Ma, and H. Li, "Voice conversion: From spoken vowels to singing vowels," in Proc. IEEE Int. Conf. Multimedia Expo (ICME), 2010, pp. 1421-1426.
- (2010) Proc. IEEE Int. Conf. Multimedia Expo (ICME) , pp. 1421-1426
- New, T.L.¹ Dong, M.² Chan, P.³ Wang, X.⁴ Ma, B.⁵ Li, H.⁶

4
- 84865698185
- Statistical voice conversion techniques for body-conducted unvoiced speech enhancement
- Nov
- T. Toda, M. Nakagiri, and K. Shikano, "Statistical voice conversion techniques for body-conducted unvoiced speech enhancement," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 9, pp. 2505-2517, Nov. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.9 , pp. 2505-2517
- Toda, T.¹ Nakagiri, M.² Shikano, K.³

5
- 34547507542
- Frequency warping based on mapping formant parameters
- Z. W. Shuang, R. Bakis, S. Shechtman, D. Chazan, and Y. Qin, "Frequency warping based on mapping formant parameters," in Proc. Interspeech, 2006, pp. 2290-2293.
- (2006) Proc. Interspeech , pp. 2290-2293
- Shuang, Z.W.¹ Bakis, R.² Shechtman, S.³ Chazan, D.⁴ Qin, Y.⁵

6
- 85068458327
- Weighted frequency warping for voice conversion
- D. Erro and A. Moreno, "Weighted frequency warping for voice conversion," in Proc. Interspeech, 2007, pp. 1965-1968.
- (2007) Proc. Interspeech , pp. 1965-1968
- Erro, D.¹ Moreno, A.²

7
- 0032026483
- Continuous probabilistic transform for voice conversion
- Mar
- Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
- (1998) IEEE Trans. Speech Audio Process , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

8
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. ICASSP, 1998, pp. 285-288.
- (1998) Proc. ICASSP , pp. 285-288
- Kain, A.¹ Macon, M.²

9
- 34047254509
- Quality-enhanced voice morphing using maximum likelihood transformations
- Jul
- H. Ye and S. Young, "Quality-enhanced voice morphing using maximum likelihood transformations," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 4, pp. 1301-1312, Jul. 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process , vol.14 , Issue.4 , pp. 1301-1312
- Ye, H.¹ Young, S.²

10
- 84905560807
- Voice conversion with smoothed GMM and MAP adaptation
- Y. Chen, M. Chu, E. Chang, J. Liu, and R. Liu, "Voice conversion with smoothed GMM and MAP adaptation," Eurospeech, pp. 2413-2416, 2003.
- (2003) Eurospeech , pp. 2413-2416
- Chen, Y.¹ Chu, M.² Chang, E.³ Liu, J.⁴ Liu, R.⁵

11
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- Nov
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

12
- 78149260085
- Continuous stochastic feature mapping based on trajectory HMMs
- Feb
- H. Zen, Y. Nankaku, and K. Tokuda, "Continuous stochastic feature mapping based on trajectory HMMs," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 2, pp. 417-430, Feb. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.2 , pp. 417-430
- Zen, H.¹ Nankaku, Y.² Tokuda, K.³

13
- 34547496196
- Towards a voice conversion system based on frame selection
- T. Dutoit, A. Holzapfel, M. Jottrand, A. Moinet, J. Perez, and Y. Stylianou, "Towards a voice conversion system based on frame selection," in Proc. ICASSP, 2007, vol. 4, pp. IV-513-IV-516.
- (2007) Proc. ICASSP , vol.4 , pp. IV513-IV516
- Dutoit, T.¹ Holzapfel, A.² Jottrand, M.³ Moinet, A.⁴ Perez, J.⁵ Stylianou, Y.⁶

14
- 84906276055
- Exemplar-based unit selection for voice conversion utilizing temporal information
- Z. Wu, T. Virtanen, T. Kinnunen, E. Chng, and H. Li, "Exemplar-based unit selection for voice conversion utilizing temporal information," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Wu, Z.¹ Virtanen, T.² Kinnunen, T.³ Chng, E.⁴ Li, H.⁵

15
- 84856141218
- Voice conversion using dynamic kernel partial least squares regression
- Mar
- E. Helander, H. Silen, T. Virtanen, and M. Gabbouj, "Voice conversion using dynamic kernel partial least squares regression," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 806-817, Mar. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.3 , pp. 806-817
- Helander, E.¹ Silen, H.² Virtanen, T.³ Gabbouj, M.⁴

16
- 84865737668
- Gaussian process experts for voice conversion
- N. Pilkington, H. Zen, and M. J. F. Gales, "Gaussian process experts for voice conversion," in Proc. Interspeech, 2011, pp. 2761-2764.
- (2011) Proc. Interspeech , pp. 2761-2764
- Pilkington, N.¹ Zen, H.² Gales, M.J.F.³

17
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- Jul
- S. Desai, A. Black, B. Yegnanarayana, and K. Prahallad, "Spectral mapping using artificial neural networks for voice conversion," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 954-964, Jul. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.² Yegnanarayana, B.³ Prahallad, K.⁴

18
- 84906280857
- Voice conversion in high-order eigen space using deep belief nets
- T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," in Proc. Interspeech, 2013, pp. 369-372.
- (2013) Proc. Interspeech , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

19
- 84889579519
- Conditional restricted Boltzmann machine for voice conversion
- Z. Wu, E. S. Chng, and H. Li, "Conditional restricted Boltzmann machine for voice conversion," in Proc. IEEE China Summit Int. Conf. Signal Inf. Process. (ChinaSIP), 2013, pp. 104-108.
- (2013) Proc. IEEE China Summit Int. Conf. Signal Inf. Process. (ChinaSIP) , pp. 104-108
- Wu, Z.¹ Chng, E.S.² Li, H.³

20
- 84906225084
- Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
- L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Proc. Interspeech, 2013, pp. 3052-3056.
- (2013) Proc. Interspeech , pp. 3052-3056
- Chen, L.-H.¹ Ling, Z.-H.² Song, Y.³ Dai, L.-R.⁴

21
- 84905223323
- Using bidirectional associative memories for joint spectral envelope modeling in voice conversion
- L.-J. Liu, L.-H. Chen, Z.-H. Ling, and L.-R. Dai, "Using bidirectional associative memories for joint spectral envelope modeling in voice conversion," in Proc. ICASSP, 2014, pp. 7884-7888.
- (2014) Proc. ICASSP , pp. 7884-7888
- Liu, L.-J.¹ Chen, L.-H.² Ling, Z.-H.³ Dai, L.-R.⁴

22
- 84890447002
- Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis," in Proc. ICASSP, 2013, pp. 7825-7829.
- (2013) Proc. ICASSP , pp. 7825-7829
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

23
- 0013344078
- Training products of experts by minimizing contrastive divergence
- G. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Comput., vol. 14, no. 8, pp. 1771-1800, 2002.
- (2002) Neural Comput , vol.14 , Issue.8 , pp. 1771-1800
- Hinton, G.¹

24
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Oct
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

25
- 84898947294
- Learning stochastic feedforward neural networks
- Cambridge, MA, USA: MIT Press
- Y. Tang and R. Salakhutdinov, "Learning stochastic feedforward neural networks," in Advances in Neural Information Processing Systems 26. Cambridge, MA, USA: MIT Press, 2013, pp. 530-538.
- (2013) Advances in Neural Information Processing Systems 26 , pp. 530-538
- Tang, Y.¹ Salakhutdinov, R.²

26
- 84890527090
- Multi-distribution deep belief network for speech synthesis
- S. Kang, X. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in Proc. ICASSP, 2013, pp. 8012-8016.
- (2013) Proc. ICASSP , pp. 8012-8016
- Kang, S.¹ Qian, X.² Meng, H.³

27
- 0000329993
- Information processing in dynamical systems: Foundations of harmony theory
- D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, ch. 6
- P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," in Parallel distributed processing: explorations in the microstructure of cognition, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, 1986, vol. 1, ch. 6, pp. 194-281.
- (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition , vol.1 , pp. 194-281
- Smolensky, P.¹

28
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

29
- 78651276374
- Ph.D. dissertation, Univ. of Toronto, Toronto, ON, Canada
- R. Salakhutdinov, "Learning deep generative models," Ph.D. dissertation, Univ. of Toronto, Toronto, ON, Canada, 2009.
- (2009) Learning Deep Generative Models
- Salakhutdinov, R.¹

30
- 0023861743
- Bidirectional associative memories
- Jan
- B. Kosko, "Bidirectional associative memories," IEEE Trans. Systems, Man, Cybern., vol. 18, no. 1, pp. 49-60, Jan. 1988.
- (1988) IEEE Trans. Systems, Man, Cybern , vol.18 , Issue.1 , pp. 49-60
- Kosko, B.¹

31
- 0009361665
- A pseudo-relaxation learning algorithm for bidirectional associative memory
- H. Oh and S. C. Kothari, "A pseudo-relaxation learning algorithm for bidirectional associative memory," in Proc. Int. Joint Conf. Neural Networks (IJCNN'92), 1992, vol. 2, pp. 208-213.
- (1992) Proc. Int. Joint Conf. Neural Networks (IJCNN'92) , vol.2 , pp. 208-213
- Oh, H.¹ Kothari, S.C.²

32
- 0028410032
- Quick learning for bidirectional associative memory
- M. Hattori, M. Hagiwara, and M. Nakagawa, "Quick learning for bidirectional associative memory," IEICE Trans. Inf. Syst., vol. 77, no. 4, pp. 385-392, 1994.
- (1994) IEICE Trans. Inf. Syst , vol.77 , Issue.4 , pp. 385-392
- Hattori, M.¹ Hagiwara, M.² Nakagawa, M.³

33
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
- H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007.
- (2007) Comput. Speech Lang , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

34
- 84878387361
- PLDA using Gaussian restricted Boltzmann machines with application to speaker verification
- T. Stafylakis, P. Kenny, M. Senoussaoui, and P. Dumouchel, "PLDA using Gaussian restricted Boltzmann machines with application to speaker verification," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Stafylakis, T.¹ Kenny, P.² Senoussaoui, M.³ Dumouchel, P.⁴

35
- 69349090197
- Learning deep architectures for AI
- Jan
- Y. Bengio, "Learning deep architectures for AI," Foundat. Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, Jan. 2009.
- (2009) Foundat. Trends Mach. Learn , vol.2 , Issue.1 , pp. 1-127
- Bengio, Y.¹

36
- 84872506495
- A practical guide to training restricted Boltzmann machines
- New York, NY, USA: Springer, 2012
- G. E. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer, 2012, vol. 7700, pp. 599-619.
- Neural Networks: Tricks of the Trade , vol.7700 , pp. 599-619
- Hinton, G.E.¹

37
- 0000999440
- Distributed representations
- D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, ch. 6
- G. E. Hinton, J. L. McClelland, and D. E. Rumelhart, "Distributed representations," in Parallel distributed processing: explorations in the microstructure of cognition, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, 1986, vol. 1, ch. 6, pp. 77-109.
- (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition , vol.1 , pp. 77-109
- Hinton, G.E.¹ McClelland, J.L.² Rumelhart, D.E.³

38
- 84862286946
- Deep Boltzmann machines
- R. Salakhutdinov and G. E. Hinton, "Deep Boltzmann machines," in Proc. Int. Conf. Artif. Intell. Statist., 2009, pp. 448-455.
- (2009) Proc. Int. Conf. Artif. Intell. Statist , pp. 448-455
- Salakhutdinov, R.¹ Hinton, G.E.²

39
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3, pp. 187-208, 1999.
- (1999) Speech Commun , vol.27 , Issue.3 , pp. 187-208
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

40
- 0033350721
- Products of experts
- (Conf. Publ. No. 470), IET
- G. E. Hinton, "Products of experts," in Proc. 9th Int. Conf. Artif. Neural Netw. (ICANN '99) (Conf. Publ. No. 470), 1999, vol. 1, pp. 1-6, IET.
- (1999) Proc. 9th Int. Conf. Artif. Neural Netw. (ICANN '99) , vol.1 , pp. 1-6
- Hinton, G.E.¹

41
- 84867720412
- arXiv preprint arXiv:1207.0580
- G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. R. Salakhutdinov, "Improving neural networks by preventing co-adaptation of feature detectors," arXiv preprint arXiv:1207.0580, 2012.
- (2012) Improving Neural Networks by Preventing Co-adaptation of Feature Detectors
- Hinton, G.E.¹ Srivastava, N.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.R.⁵

42
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. 90, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst , vol.90 , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

43
- 84055222005
- Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
- Jan
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.