SCOPUS Information Search Platform

IEICE Transactions on Information and Systems

Volume E97-D, Issue 6, 2014, Pages 1403-1410

Voice conversion based on speaker-dependent restricted Boltzmann machines

(3) Nakashika, Toru (a); Takiguchi, Tetsuya (a); Ariki, Yasuo (a)

(a) KOBE UNIVERSITY (Japan)

Author keywords

Deep learning; Restricted Boltzmann machine; Speaker individuality; Voice conversion

Indexed keywords

ABSTRACTING; SPEECH COMMUNICATION;

DEEP LEARNING; MODEL-BASED METHOD; NEURAL NETWORK (NN); OBJECTIVE CRITERIA; RESTRICTED BOLTZMANN MACHINE; SPEAKER INDIVIDUALITY; VOICE CONVERSION; VOICE CONVERSION TECHNIQUES;

SPEECH PROCESSING;

EID: 84901766069 PISSN: 09168532 EISSN: 17451361 Source Type: Journal
DOI: 10.1587/transinf.E97.D.1403 Document Type: Article

Times cited : (43)

References (39)

1
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M.W. Macon, "Spectral voice conversion for text-to-speech synthesis, " Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.285-288, 1998.
- (1998) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 285-288
- Kain, A.¹ Macon, M.W.²

2
- 84865747520
- Intonation conversion from neutral to expressive speech
- C. Veaux and X. Robet, "Intonation conversion from neutral to expressive speech, " Proc. Interspeech, pp.2765-2768, 2011.
- (2011) Proc. Interspeech , pp. 2765-2768
- Veaux, C.¹ Robet, X.²

3
- 80052698826
- Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
- K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech, " Speech Commun., vol.54, no.1, pp.134-146, 2012.
- (2012) Speech Commun. , vol.54 , Issue.1 , pp. 134-146
- Nakamura, K.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

4
- 0034855352
- High-performance robust speech recognition using stereo training data
- L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data, " Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.301-304, 2001.
- (2001) Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 301-304
- Deng, L.¹ Acero, A.² Jiang, L.³ Droppo, J.⁴ Huang, X.⁵

5
- 70450192197
- Speech generation from hand gestures based on space mapping
- A. Kunikoshi, Y. Qiao, N. Minematsu, and K. Hirose, "Speech generation from hand gestures based on space mapping, " Proc. Interspeech, pp.308-311, 2009.
- (2009) Proc. Interspeech , pp. 308-311
- Kunikoshi, A.¹ Qiao, Y.² Minematsu, N.³ Hirose, K.⁴

6
- 0021412027
- Vector quantization
- R. Gray, "Vector quantization, " IEEE ASSP Mag., vol.1, no.2, pp.4-29, 1984.
- (1984) IEEE ASSP Mag. , vol.1 , Issue.2 , pp. 4-29
- Gray, R.¹

7
- 0026880275
- Voice transformation using PSOLA technique
- H. Valbret, E. Moulines, and J.P. Tubach, "Voice transformation using PSOLA technique, " Speech Commun., vol.11, no.2, pp.175-187, 1992.
- (1992) Speech Commun. , vol.11 , Issue.2 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.P.³

8
- 0032026483
- Continuous probabilistic transform for voice conversion
- Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion, " IEEE Trans. Speech Audio Process., vol.6, no.2, pp.131-142, 1998.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

9
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A.W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory, " IEEE Trans. Audio Speech Language Process., vol.15, no.8, pp.2222-2235, 2007.
- (2007) IEEE Trans. Audio Speech Language Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

10
- 77953712499
- Voice conversion using partial least squares regression
- E. Helander, T. Virtanen, J. Nurminen, and M. Gabbouj, "Voice conversion using partial least squares regression, " IEEE Trans. Audio Speech Language Process., vol.18, no.5, pp.912-921, 2010.
- (2010) IEEE Trans. Audio Speech Language Process. , vol.18 , Issue.5 , pp. 912-921
- Helander, E.¹ Virtanen, T.² Nurminen, J.³ Gabbouj, M.⁴

11
- 44949210554
- MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
- C.H. Lee and C.H. Wu, "MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training, " Proc. Interspeech, pp.2254-2257, 2006.
- (2006) Proc. Interspeech , pp. 2254-2257
- Lee, C.H.¹ Wu, C.H.²

12
- 34547512822
- Eigenvoice conversion based on Gaussian mixture model
- T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model, " Proc. Interspeech, pp.2446-2449, 2006.
- (2006) Proc. Interspeech , pp. 2446-2449
- Toda, T.¹ Ohtani, Y.² Shikano, K.³

13
- 84865798483
- One-to-many voice conversion based on tensor representation of speaker space
- D. Saito, K. Yamamoto, N. Minematsu, and K. Hirose, "One-to-many voice conversion based on tensor representation of speaker space, " Proc. Interspeech, pp.653-656, 2011.
- (2011) Proc. Interspeech , pp. 653-656
- Saito, D.¹ Yamamoto, K.² Minematsu, N.³ Hirose, K.⁴

14
- 79959834571
- Probabilistic integration of joint density model and speaker model for voice conversion
- D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "Probabilistic integration of joint density model and speaker model for voice conversion, " Proc. Interspeech, pp.1728-1731, 2010.
- (2010) Proc. Interspeech , pp. 1728-1731
- Saito, D.¹ Watanabe, S.² Nakamura, A.³ Minematsu, N.⁴

15
- 35148852326
- Voice conversion using canonical correlation analysis based on Gaussian mixture model
- IEEE
- Z. Jian and Z. Yang, "Voice conversion using canonical correlation analysis based on Gaussian mixture model, " Proc. International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing, pp.210-215, IEEE, 2007.
- (2007) Proc. International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing , pp. 210-215
- Jian, Z.¹ Yang, Z.²

16
- 84874248255
- Exemplar-based voice conversion in noisy environment
- R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion in noisy environment, " IEEE Spoken Language Technology Workshop (SLT), pp.313-317, 2012.
- (2012) IEEE Spoken Language Technology Workshop (SLT) , pp. 313-317
- Takashima, R.¹ Takiguchi, T.² Ariki, Y.³

17
- 70349197691
- Voice conversion using artificial neural networks
- S. Desai, E.V. Raghavendra, B. Yegnanarayana, A.W. Black, and K. Prahallad, "Voice conversion using artificial neural networks, " Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.3893-3896, 2009.
- (2009) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 3893-3896
- Desai, S.¹ Raghavendra, E.V.² Yegnanarayana, B.³ Black, A.W.⁴ Prahallad, K.⁵

18
- 4544270860
- Minimum segmentation error based discriminative training for speech synthesis application
- Y.J. Wu, H. Kawai, J. Ni, and R.H. Wang, "Minimum segmentation error based discriminative training for speech synthesis application, " Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp.I-629, 2004.
- (2004) Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- Wu, Y.J.¹ Kawai, H.² Ni, J.³ Wang, R.H.⁴

19
- 34547522070
- Discriminative training for large-vocabulary speech recognition using minimum classification error
- E. McDermott, T.J. Hazen, J. Le Roux, A. Nakamura, and S. Katagiri, "Discriminative training for large-vocabulary speech recognition using minimum classification error, " IEEE Trans. Audio Speech Language Process., vol.15, no.1, pp.203-223, 2007.
- (2007) IEEE Trans. Audio Speech Language Process. , vol.15 , Issue.1 , pp. 203-223
- McDermott, E.¹ Hazen, T.J.² Le Roux, J.³ Nakamura, A.⁴ Katagiri, S.⁵

20
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- May
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis, " IEICE Trans. Inf. & Syst., vol.E90-D, no.5, pp.816-824, May 2007.
- (2007) IEICE Trans. Inf. & Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

21
- 84901793334
- Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis
- Z.H. Ling and L.R. Dai, "Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis, " IEEE Trans. Audio Speech Language Process., vol.20, no.5, pp.1492-1502, 2012.
- (2012) IEEE Trans. Audio Speech Language Process , vol.20 , Issue.5 , pp. 1492-1502
- Ling, Z.H.¹ Dai, L.R.²

22
- 67650851754
- USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method
- Z.H. Ling, Y.J. Wu, Y.P. Wang, L. Qin, and R.H. Wang, "USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method, " Blizzard Challenge Workshop, 2006.
- (2006) Blizzard Challenge Workshop
- Ling, Z.H.¹ Wu, Y.J.² Wang, Y.P.³ Qin, L.⁴ Wang, R.H.⁵

23
- 84901803470
- Exemplar-based voice conversion using non-negative spectrogram deconvolution
- Z. Wu, T. Virtanen, T. Kinnunen, E.S. Chng, and H. Li, "Exemplar-based voice conversion using non-negative spectrogram deconvolution, " Proc. 8th ISCA Speech Synthesis Workshop, 2013.
- (2013) Proc. 8th ISCA Speech Synthesis Workshop
- Wu, Z.¹ Virtanen, T.² Kinnunen, T.³ Chng, E.S.⁴ Li, H.⁵

24
- 0000329993
- P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory, " vol.1, pp.194-281, 1986.
- (1986) Information Processing in Dynamical Systems: Foundations of Harmony Theory , vol.1 , pp. 194-281
- Smolensky, P.¹

25
- 33745805403
- A fast learning algorithm for deep belief nets
- G.E. Hinton, S. Osindero, and Y.W. Teh, "A fast learning algorithm for deep belief nets, " Neural Computation, vol.18, no.7, pp.1527-1554, 2006.
- (2006) Neural Computation , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.W.³

26
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z.H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis, " IEEE Trans. Audio Speech Language Process., no.10, pp.2129-2139, 2013.
- (2013) IEEE Trans. Audio Speech Language Process. , Issue.10 , pp. 2129-2139
- Ling, Z.H.¹ Deng, L.² Yu, D.³

27
- 84055211743
- Acoustic modeling using deep belief networks
- A.R. Mohamed, G.E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks, " IEEE Trans. Audio Speech Language Process., vol.20, no.1, pp.14-22, 2012.
- (2012) IEEE Trans. Audio Speech Language Process. , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.R.¹ Dahl, G.E.² Hinton, G.³

28
- 78149306047
- 3-D object recognition with deep belief nets
- V. Nair and G. Hinton, "3-D object recognition with deep belief nets, " Advances in Neural Information Processing Systems, vol.22, pp.1339-1347, 2009.
- (2009) Advances in Neural Information Processing Systems , vol.22 , pp. 1339-1347
- Nair, V.¹ Hinton, G.²

29
- 84991233704
- A deep learning approach to machine transliteration
- Association for Computational Linguistics
- T. Deselaers, S. Hasan, O. Bender, and H. Ney, "A deep learning approach to machine transliteration, " Proc. Fourth Workshop on Statistical Machine Translation, pp.233-241, Association for Computational Linguistics, 2009.
- (2009) Proc. Fourth Workshop on Statistical Machine Translation , pp. 233-241
- Deselaers, T.¹ Hasan, S.² Bender, O.³ Ney, H.⁴

30
- 84906280857
- Voice conversion in high-order eigen space using deep belief nets
- T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets, " Proc. Interspeech, pp.369-372, 2013.
- (2013) Proc. Interspeech , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

31
- 84889579519
- Conditional restricted Boltzmann machine for voice conversion
- Z. Wu, E.S. Chng, and H. Li, "Conditional restricted Boltzmann machine for voice conversion, " Proc. IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP), 2013.
- (2013) Proc. IEEE China Summit and International Conference on Signal and Information Processing (ChinaSIP)
- Wu, Z.¹ Chng, E.S.² Li, H.³

32
- 84906225084
- Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
- C. Ling-Hui, L. Zhen-Hua, S. Yan, and D. Li-Rong, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion, " Proc. Interspeech, pp.3052-3056, 2013.
- (2013) Proc. Interspeech , pp. 3052-3056
- Ling-Hui, C.¹ Zhen-Hua, L.² Yan, S.³ Li-Rong, D.⁴

33
- 0025475528
- ATR Japanese speech database as a tool of speech recognition and synthesis
- A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis, " Speech Commun., no.4, pp.357-363, 1990.
- (1990) Speech Commun , Issue.4 , pp. 357-363
- Kurematsu, A.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Kuwabara, H.⁵ Shikano, K.⁶

34
- 51449108867
- TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation
- H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, and H. Banno, "TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, F0, and aperiodicity estimation, " Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.3933-3936, 2008.
- (2008) Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 3933-3936
- Kawahara, H.¹ Morise, M.² Takahashi, T.³ Nisimura, R.⁴ Irino, T.⁵ Banno, H.⁶

35
- 85039958911
- Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model
- B. Milner and X. Shao, "Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model, " Proc. Interspeech, 2002.
- (2002) Proc. Interspeech
- Milner, B.¹ Shao, X.²

36
- 78651276374
- PhD thesis, University of Toronto
- R. Salakhutdinov, Learning deep generative models, PhD thesis, University of Toronto, 2009.
- (2009) Learning Deep Generative Models
- Salakhutdinov, R.¹

37
- 78650474133
- A practical guide to training restricted Boltzmann machines
- University of Toronto
- G. Hinton, "A practical guide to training restricted Boltzmann machines, " Tech. Rep. Department of Computer Science, University of Toronto, 2010.
- (2010) Tech. Rep. Department of Computer Science
- Hinton, G.¹

38
- 78049409973
- Phone recognition using restricted Boltzmann machines
- A.R. Mohamed and G. Hinton, "Phone recognition using restricted Boltzmann machines, " Proc. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), pp.4354-4357, 2010.
- (2010) Proc. IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) , pp. 4354-4357
- Mohamed, A.R.¹ Hinton, G.²

39
- 84865581203
- An analysis of Gaussian-binary restricted Boltzmann machines for natural images
- N. Wang, J. Melchior, and L. Wiskott, "An analysis of Gaussian-binary restricted Boltzmann machines for natural images, " Proc. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN), pp.287-292, 2012.
- (2012) Proc. European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN) , pp. 287-292
- Wang, N.¹ Melchior, J.² Wiskott, L.³

* This information was extracted from Elsevier's SCOPUS database through analysis by KISTI.