SCOPUS Information Retrieval Platform

Eurasip Journal on Audio, Speech, and Music Processing

Volume 2015, Issue 1, 2015, Pages

Voice conversion using speaker-dependent conditional restricted Boltzmann machine

Authors (3): Nakashika, Toru; Takiguchi, Tetsuya; Ariki, Yasuo

Author keywords

Conditional restricted Boltzmann machine; Deep learning; Recurrent neural network; Speaker specific features; Voice conversion

Indexed keywords

RECURRENT NEURAL NETWORKS;

CONDITIONAL RESTRICTED BOLTZMANN MACHINES; DEEP LEARNING; NEURAL NETWORK (NN); OBJECTIVE EVALUATION; RECURRENT NEURAL NETWORK (RNN); SPEAKER INDEPENDENTS; SPEAKER-SPECIFIC FEATURES; VOICE CONVERSION;

SPEECH PROCESSING;

EID: 84924309945 PISSN: 16874714 EISSN: 16874722 Source Type: Journal
DOI: 10.1186/s13636-014-0044-3 Document Type: Article

Times cited : (16)

References (44)

1
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A Kain, MW Macon, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Spectral voice conversion for text-to-speech synthesis, (1998), pp. 285–288.
- (1998) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Spectral voice conversion for text-to-speech synthesis , pp. 285-288

2
- 84924288901
- Intonation conversion from neutral to expressive speech
- C Veaux, X Rodet, in Proceedings of Interspeech. Intonation conversion from neutral to expressive speech, (2011), pp. 2765–2768.
- (2011) Intonation conversion from neutral to expressive speech , pp. 2765-2768
- Veaux, C.¹ Rodet, X.²

3
- 80052698826
- Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
- K Nakamura, T Toda, H Saruwatari, K Shikano, Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech. Speech Commun. 54(1), 134–146 (2012).
- (2012) Speech Commun , vol.54 , Issue.1 , pp. 134-146
- Nakamura, K.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

4
- 0034855352
- High-performance robust speech recognition using stereo training data
- L Deng, A Acero, L Jiang, J Droppo, X Huang, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). High-performance robust speech recognition using stereo training data, (2001), pp. 301–304.
- (2001) IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). High-performance robust speech recognition using stereo training data , pp. 301-304

5
- 70450192197
- Speech generation from hand gestures based on space mapping
- A Kunikoshi, Y Qiao, N Minematsu, K Hirose, in Proceedings of Interspeech. Speech generation from hand gestures based on space mapping, (2009), pp. 308–311.
- (2009) Speech generation from hand gestures based on space mapping , pp. 308-311
- Kunikoshi, A.¹ Qiao, Y.² Minematsu, N.³ Hirose, K.⁴

6
- 0021412027
- Vector quantization
- R Gray, Vector quantization. ASSP Mag. IEEE. 1(2), 4–29 (1984).
- (1984) ASSP Mag. IEEE , vol.1 , Issue.2 , pp. 4-29
- Gray, R.¹

7
- 0026880275
- Voice transformation using PSOLA technique
- H Valbret, E Moulines, J-P Tubach, Voice transformation using PSOLA technique. Speech Commun. 11(2), 175–187 (1992).
- (1992) Speech Commun , vol.11 , Issue.2 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.-P.³

8
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y Stylianou, O Cappé, E Moulines, Continuous probabilistic transform for voice conversion. IEEE Trans. Speech Audio Process. 6(2), 131–142 (1998).
- (1998) IEEE Trans. Speech Audio Process , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

9
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T Toda, AW Black, K Tokuda, Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory. IEEE Trans. Audio Speech Lang. Process. 15(8), 2222–2235 (2007).
- (2007) IEEE Trans. Audio Speech Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

10
- 77953712499
- Voice conversion using partial least squares regression
- E Helander, T Virtanen, J Nurminen, M Gabbouj, Voice conversion using partial least squares regression. IEEE Trans. Audio Speech Lang. Process. 18(5), 912–921 (2010).
- (2010) IEEE Trans. Audio Speech Lang. Process , vol.18 , Issue.5 , pp. 912-921
- Helander, E.¹ Virtanen, T.² Nurminen, J.³ Gabbouj, M.⁴

11
- 44949210554
- Map-based adaptation for speech conversion using adaptation data selection and non-parallel training
- C-H Lee, C-H Wu, in Proceedings of Interspeech. Map-based adaptation for speech conversion using adaptation data selection and non-parallel training, (2006), pp. 2254–2257.
- (2006) Map-based adaptation for speech conversion using adaptation data selection and non-parallel training , pp. 2254-2257
- Lee, C.-H.¹ Wu, C.-H.²

12
- 34547512822
- Eigenvoice conversion based on gaussian mixture model
- T Toda, Y Ohtani, K Shikano, in Proceedings of Interspeech. Eigenvoice conversion based on gaussian mixture model, (2006), pp. 2446–2449.
- (2006) Eigenvoice conversion based on gaussian mixture model , pp. 2446-2449
- Toda, T.¹ Ohtani, Y.² Shikano, K.³

13
- 84865798483
- One-to-many voice conversion based on tensor representation of speaker space
- D Saito, K Yamamoto, N Minematsu, K Hirose, in Proceedings of Interspeech. One-to-many voice conversion based on tensor representation of speaker space, (2011), pp. 653–656.
- (2011) One-to-many voice conversion based on tensor representation of speaker space , pp. 653-656
- Saito, D.¹ Yamamoto, K.² Minematsu, N.³ Hirose, K.⁴

14
- 79959834571
- Probabilistic integration of joint density model and speaker model for voice conversion
- D Saito, S Watanabe, A Nakamura, N Minematsu, in Proceedings of Interspeech. Probabilistic integration of joint density model and speaker model for voice conversion, (2010), pp. 1728–1731.
- (2010) Probabilistic integration of joint density model and speaker model for voice conversion , pp. 1728-1731
- Saito, D.¹ Watanabe, S.² Nakamura, A.³ Minematsu, N.⁴

15
- 35148852326
- Voice conversion using canonical correlation analysis based on Gaussian mixture model
- Z Jian, Z Yang, in Proceedings of International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing. Voice conversion using canonical correlation analysis based on Gaussian mixture model, (2007), pp. 210–215.
- (2007) Voice conversion using canonical correlation analysis based on Gaussian mixture model , pp. 210-215

16
- 84874248255
- Exemplar-based voice conversion in noisy environment
- R Takashima, T Takiguchi, Y Ariki, in IEEE Spoken Language Technology Workshop (SLT). Exemplar-based voice conversion in noisy environment, (2012), pp. 313–317.
- (2012) IEEE Spoken Language Technology Workshop (SLT). Exemplar-based voice conversion in noisy environment , pp. 313-317
- Takashima, R.¹ Takiguchi, T.² Ariki, Y.³

17
- 0029254176
- Transformation of formants for voice conversion using artificial neural networks
- M Narendranath, HA Murthy, S Rajendran, B Yegnanarayana, Transformation of formants for voice conversion using artificial neural networks. Speech Commun. 16(2), 207–216 (1995).
- (1995) Speech Commun , vol.16 , Issue.2 , pp. 207-216
- Narendranath, M.¹ Murthy, H.A.² Rajendran, S.³ Yegnanarayana, B.⁴

18
- 70349197691
- Voice conversion using artificial neural networks
- S Desai, EV Raghavendra, B Yegnanarayana, AW Black, K Prahallad, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Voice conversion using artificial neural networks, (2009), pp. 3893–3896.
- (2009) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Voice conversion using artificial neural networks , pp. 3893-3896

19
- 4544270860
- Minimum segmentation error based discriminative training for speech synthesis application
- Y-J Wu, H Kawai, J Ni, R-H Wang, in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Minimum segmentation error based discriminative training for speech synthesis application, (2004), p. 629.
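- (2004) IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). Minimum segmentation error based discriminative training for speech synthesis application , p. 629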

20
- 34547522070
- Discriminative training for large-vocabulary speech recognition using minimum classification error
- E McDermott, TJ Hazen, J Le Roux, A Nakamura, S Katagiri, Discriminative training for large-vocabulary speech recognition using minimum classification error. IEEE Trans. Audio Speech Lang. Process. 15(1), 203–223 (2007).
- (2007) IEEE Trans. Audio Speech Lang. Process , vol.15 , Issue.1 , pp. 203-223
- McDermott, E.¹ Hazen, T.J.² Le Roux, J.³ Nakamura, A.⁴ Katagiri, S.⁵

21
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T Toda, K Tokuda, A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Trans. Inform. Syst. 90(5), 816–824 (2007).
- (2007) IEICE Trans. Inform. Syst , vol.90 , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

22
- 84901793334
- Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis
- Z-H Ling, L-R Dai, Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis. IEEE Trans. Audio Speech Lang. Process. 20(5), 1492–1502 (2012).
- (2012) IEEE Trans. Audio Speech Lang. Process , vol.20 , Issue.5 , pp. 1492-1502
- Ling, Z.-H.¹ Dai, L.-R.²

23
- 84924311712
- USTC system for blizzard challenge 2006 an improved HMM-based speech synthesis method
- Z-H Ling, Y-J Wu, Y-P Wang, L Qin, R-H Wang, in Blizzard Challenge Workshop. USTC system for blizzard challenge 2006 an improved HMM-based speech synthesis method, (2006).
- (2006) USTC system for blizzard challenge 2006 an improved HMM-based speech synthesis method
- Ling, Z.-H.¹ Wu, Y.-J.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

24
- 84924336471
- Exemplar-based voice conversion using non-negative spectrogram deconvolution
- Z Wu, T Virtanen, T Kinnunen, ES Chng, H Li, in Proceedings of the 8th ISCA Speech Synthesis Workshop. Exemplar-based voice conversion using non-negative spectrogram deconvolution, (2013), pp. 221–226.
- (2013) Exemplar-based voice conversion using non-negative spectrogram deconvolution , pp. 221-226
- Wu, Z.¹ Virtanen, T.² Kinnunen, T.³ Chng, E.S.⁴ Li, H.⁵

25
- 84906280857
- Voice conversion in high-order eigen space using deep belief nets
- T Nakashika, R Takashima, T Takiguchi, Y Ariki, in Proceedings of Interspeech. Voice conversion in high-order eigen space using deep belief nets, (2013), pp. 369–372.
- (2013) Voice conversion in high-order eigen space using deep belief nets , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

26
- 0000329993
- Information processing in dynamical systems: foundations of harmony theory
- P Smolensky, Information processing in dynamical systems: foundations of harmony theory. Parallel Distributed Process. 1, 194–281 (1986).
- (1986) Parallel Distributed Process , vol.1 , pp. 194-281
- Smolensky, P.¹

27
- 33745805403
- A fast learning algorithm for deep belief nets
- GE Hinton, S Osindero, Y-W Teh, A fast learning algorithm for deep belief nets. Neural Comput. 18(7), 1527–1554 (2006).
- (2006) Neural Comput , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.-W.³

28
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z-H Ling, L Deng, D Yu, Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis. IEEE Trans. Audio Speech Lang. Process. 21(10), 2129–2139 (2013).
- (2013) IEEE Trans. Audio Speech Lang. Process , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

29
- 84055211743
- Acoustic modeling using deep belief networks
- A-R Mohamed, GE Dahl, G Hinton, Acoustic modeling using deep belief networks. IEEE Trans. Audio Speech Lang. Process. 20(1), 14–22 (2012).
- (2012) IEEE Trans. Audio Speech Lang. Process , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.-R.¹ Dahl, G.E.² Hinton, G.³

30
- 78149306047
- 3-D object recognition with deep belief nets
- V Nair, G Hinton, 3-D object recognition with deep belief nets. Adv. Neural Inform. Process. Syst. 22, 1339–1347 (2009).
- (2009) Adv. Neural Inform. Process. Syst , vol.22 , pp. 1339-1347
- Nair, V.¹ Hinton, G.²

31
- 84991233704
- A deep learning approach to machine transliteration
- T Deselaers, S Hasan, O Bender, H Ney, in Proceedings of the Fourth Workshop on Statistical Machine Translation. A deep learning approach to machine transliteration, (2009), pp. 233–241.
- (2009) A deep learning approach to machine transliteration , pp. 233-241
- Deselaers, T.¹ Hasan, S.² Bender, O.³ Ney, H.⁴

32
- 84924311711
- Conditional restricted Boltzmann machine for voice conversion
- Z Wu, ES Chng, H Li, in Proceedings of the IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP). Conditional restricted Boltzmann machine for voice conversion, (2013).
- (2013) Conditional restricted Boltzmann machine for voice conversion
- Wu, Z.¹ Chng, E.S.² Li, H.³

33
- 84924334174
- Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
- L-H Chen, Z-H Ling, Y Song, L-R Dai, in Proceedings of Interspeech. Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion, (2013), pp. 3052–3056.
- (2013) Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion , pp. 3052-3056
- Chen, L.-H.¹ Ling, Z.-H.² Song, Y.³ Dai, L.-R.⁴

34
- 0001578518
- A learning algorithm for Boltzmann machines
- DH Ackley, GE Hinton, TJ Sejnowski, A learning algorithm for Boltzmann machines. Cogn. Sci. 9(1), 147–169 (1985).
- (1985) Cogn. Sci , vol.9 , Issue.1 , pp. 147-169
- Ackley, D.H.¹ Hinton, G.E.² Sejnowski, T.J.³

35
- 0345368881
- Unsupervised learning of distributions of binary vectors using two layer networks
- Y Freund, D Haussler, Unsupervised learning of distributions of binary vectors using two layer networks. Adv. Neural Inform. Process. Syst. 4, 912–919 (1991).
- (1991) Adv. Neural Inform. Process. Syst , vol.4 , pp. 912-919
- Freund, Y.¹ Haussler, D.²

36
- 33746600649
- Reducing the dimensionality of data with neural networks
- GE Hinton, RR Salakhutdinov, Reducing the dimensionality of data with neural networks. Science. 313(5786), 504–507 (2006).
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

37
- 84924287911
- A practical guide to training restricted Boltzmann machines
- G Hinton, in Tech. Rep. Department of Computer Science. A practical guide to training restricted Boltzmann machines (University of Toronto, 2010).
- (2010) Tech. Rep., Department of Computer Science, University of Toronto
- Hinton, G.¹

38
- 77956002520
- Learning multiple layers of features from tiny images
- A Krizhevsky, G Hinton, Learning multiple layers of features from tiny images (Computer Science Department, University of Toronto, Tech. Rep, 2009).
- (2009) Tech. Rep., Computer Science Department, University of Toronto
- Krizhevsky, A.¹ Hinton, G.²

39
- 79959342724
- Improved learning of gaussian-bernoulli restricted Boltzmann machines
- K Cho, A Ilin, T Raiko, in Artificial Neural Networks and Machine Learning–ICANN 2011. Improved learning of gaussian-bernoulli restricted Boltzmann machines, (2011), pp. 10–17.
- (2011) Improved learning of gaussian-bernoulli restricted Boltzmann machines , pp. 10-17
- Cho, K.¹ Ilin, A.² Raiko, T.³

40
- 84864026688
- Modeling human motion using binary latent variables
- GW Taylor, GE Hinton, ST Roweis, in Advances in Neural Information Processing Systems. Modeling human motion using binary latent variables, (2006), pp. 1345–1352.
- (2006) Modeling human motion using binary latent variables , pp. 1345-1352
- Taylor, G.W.¹ Hinton, G.E.² Roweis, S.T.³

41
- 84883153580
- On the difficulty of training recurrent neural networks
- R Pascanu, T Mikolov, Y Bengio, On the difficulty of training recurrent neural networks. (2012).
- (2012) On the difficulty of training recurrent neural networks
- Pascanu, R.¹ Mikolov, T.² Bengio, Y.³

42
- 0025475528
- ATR japanese speech database as a tool of speech recognition and synthesis
- A Kurematsu, K Takeda, Y Sagisaka, S Katagiri, H Kuwabara, K Shikano, ATR japanese speech database as a tool of speech recognition and synthesis. Speech Communication. 9(4), 357–363 (1990).
- (1990) Speech Communication , vol.9 , Issue.4 , pp. 357-363
- Kurematsu, A.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Kuwabara, H.⁵ Shikano, K.⁶

43
- 51449108867
- Tandem-straight: a temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f0, and aperiodicity estimation
- H Kawahara, M Morise, T Takahashi, R Nisimura, T Irino, H Banno, in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Tandem-straight: a temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f0, and aperiodicity estimation, (2008), pp. 3933–3936.
- (2008) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Tandem-straight: a temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f0, and aperiodicity estimation , pp. 3933-3936

44
- 80052359758
- Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model
- B Milner, X Shao, in Proceedings of Interspeech. Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model, (2002), pp. 2421–2424.
- (2002) Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model , pp. 2421-2424
- Milner, B.¹ Shao, X.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.