SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 23, Issue 3, 2015, Pages 580-587

Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines

(3) Nakashika, Toru; Takiguchi, Tetsuya; Ariki, Yasuo

Author keywords

Deep learning; recurrent neural network; recurrent temporal restricted Boltzmann machine (RTRBM); speaker-specific features; voice conversion

Indexed keywords

RECURRENT NEURAL NETWORKS;

DEEP LEARNING; GAUSSIAN MIXTURE MODEL; NEURAL NETWORK (NN); OBJECTIVE CRITERIA; PROBABILISTIC MODELS; RESTRICTED BOLTZMANN MACHINE; SPEAKER-SPECIFIC FEATURES; VOICE CONVERSION;

SPEECH PROCESSING;

EID: 84923867813 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2014.2379589 Document Type: Article

Times cited: 83

References (42)

1
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M. W. Macon, "Spectral voice conversion for text-to-speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1998, pp. 285-288.
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1998 , pp. 285-288
- Kain, A.¹ Macon, M.W.²

2
- 84865747520
- Intonation conversion from neutral to expressive speech
- C. Veaux and X. Rodet, "Intonation conversion from neutral to expressive speech," in Proc. Interspeech, 2011, pp. 2765-2768.
- Proc. Interspeech, 2011 , pp. 2765-2768
- Veaux, C.¹ Rodet, X.²

3
- 80052698826
- Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
- K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech," Speech Commun., vol. 54, no. 1, pp. 134-146, 2012.
- (2012) Speech Commun. , vol.54 , Issue.1 , pp. 134-146
- Nakamura, K.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

4
- 0034855352
- High-performance robust speech recognition using stereo training data
- L. Deng, A. Acero, L. Jiang, J. Droppo, and X. Huang, "High-performance robust speech recognition using stereo training data," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2001, pp. 301-304.
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2001 , pp. 301-304
- Deng, L.¹ Acero, A.² Jiang, L.³ Droppo, J.⁴ Huang, X.⁵

5
- 70450192197
- Speech generation from hand gestures based on space mapping
- A. Kunikoshi, Y. Qiao, N. Minematsu, and K. Hirose, "Speech generation from hand gestures based on space mapping," in Proc. Interspeech, 2009, pp. 308-311.
- Proc. Interspeech, 2009 , pp. 308-311
- Kunikoshi, A.¹ Qiao, Y.² Minematsu, N.³ Hirose, K.⁴

6
- 0023739214
- Voice conversion through vector quantization
- M. Abe, S. Nakamura, K. Shikano, and H. Kuwabara, "Voice conversion through vector quantization," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1988, pp. 655-658.
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 1988 , pp. 655-658
- Abe, M.¹ Nakamura, S.² Shikano, K.³ Kuwabara, H.⁴

7
- 0026880275
- Voice transformation using PSOLA technique
- H. Valbret, E. Moulines, and J.-P. Tubach, "Voice transformation using PSOLA technique," Speech Commun., vol. 11, no. 2, pp. 175-187, 1992.
- (1992) Speech Commun. , vol.11 , Issue.2 , pp. 175-187
- Valbret, H.¹ Moulines, E.² Tubach, J.-P.³

8
- 0032026483
- Continuous probabilistic transform for voice conversion
- Mar.
- Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

9
- 44949210554
- MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
- C.-H. Lee and C.-H. Wu, "MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training," in Proc. Interspeech, 2006, pp. 2254-2257.
- Proc. Interspeech, 2006 , pp. 2254-2257
- Lee, C.-H.¹ Wu, C.-H.²

10
- 34547512822
- Eigenvoice conversion based on Gaussian mixture model
- T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model," in Proc. Interspeech, 2006, pp. 2446-2449.
- Proc. Interspeech, 2006 , pp. 2446-2449
- Toda, T.¹ Ohtani, Y.² Shikano, K.³

11
- 84865798483
- One-to-many voice conversion based on tensor representation of speaker space
- D. Saito, K. Yamamoto, N. Minematsu, and K. Hirose, "One-to-many voice conversion based on tensor representation of speaker space," in Proc. Interspeech, 2011, pp. 653-656.
- Proc. Interspeech, 2011 , pp. 653-656
- Saito, D.¹ Yamamoto, K.² Minematsu, N.³ Hirose, K.⁴

12
- 79959834571
- Probabilistic integration of joint density model and speaker model for voice conversion
- D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "Probabilistic integration of joint density model and speaker model for voice conversion," in Proc. Interspeech, 2010, pp. 1728-1731.
- Proc. Interspeech, 2010 , pp. 1728-1731
- Saito, D.¹ Watanabe, S.² Nakamura, A.³ Minematsu, N.⁴

13
- 70349197691
- Voice conversion using artificial neural networks
- S. Desai, E. V. Raghavendra, B. Yegnanarayana, A. W. Black, and K. Prahallad, "Voice conversion using artificial neural networks," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2009, pp. 3893-3896.
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2009 , pp. 3893-3896
- Desai, S.¹ Raghavendra, E.V.² Yegnanarayana, B.³ Black, A.W.⁴ Prahallad, K.⁵

14
- 4544270860
- Minimum segmentation error based discriminative training for speech synthesis application
- Y.-J. Wu, H. Kawai, J. Ni, and R.-H. Wang, "Minimum segmentation error based discriminative training for speech synthesis application," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2004, pp. 629-632.
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2004 , pp. 629-632
- Wu, Y.-J.¹ Kawai, H.² Ni, J.³ Wang, R.-H.⁴

15
- 34547522070
- Discriminative training for large-vocabulary speech recognition using minimum classification error
- Jan.
- E. McDermott, T. J. Hazen, J. Le Roux, A. Nakamura, and S. Katagiri, "Discriminative training for large-vocabulary speech recognition using minimum classification error," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 203-223, Jan. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 203-223
- McDermott, E.¹ Hazen, T.J.² Le Roux, J.³ Nakamura, A.⁴ Katagiri, S.⁵

16
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. 90, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.90 , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

17
- 84901793334
- Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis
- Jul.
- Z.-H. Ling and L.-R. Dai, "Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 5, pp. 1492-1502, Jul. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.5 , pp. 1492-1502
- Ling, Z.-H.¹ Dai, L.-R.²

18
- 67650851754
- USTC system for Blizzard Challenge 2006. An improved HMM-based speech synthesis method
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC system for Blizzard Challenge 2006. An improved HMM-based speech synthesis method," in Proc. Blizzard Challenge Workshop, 2006.
- Proc. Blizzard Challenge Workshop, 2006
- Ling, Z.-H.¹ Wu, Y.-J.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

19
- 35148852326
- Voice conversion using canonical correlation analysis based on Gaussian mixture model
- Z. Jian and Z. Yang, "Voice conversion using canonical correlation analysis based on Gaussian mixture model," in Proc. 8th ACIS Int. Conf. IEEE Software Eng., Artif. Intell., Netw., Parallel/Distrib. Comput. (SNPD '07), 2007, pp. 210-215.
- Proc. 8th ACIS Int. Conf. IEEE Software Eng., Artif. Intell., Netw., Parallel/Distrib. Comput. (SNPD '07), 2007 , pp. 210-215
- Jian, Z.¹ Yang, Z.²

20
- 84874248255
- Exemplar-based voice conversion in noisy environment
- R. Takashima, T. Takiguchi, and Y. Ariki, "Exemplar-based voice conversion in noisy environment," in Proc. IEEE Spoken Lang. Technol. Workshop (SLT), 2012, pp. 313-317.
- Proc. IEEE Spoken Lang. Technol. Workshop (SLT), 2012 , pp. 313-317
- Takashima, R.¹ Takiguchi, T.² Ariki, Y.³

21
- 84901803470
- Exemplar-based voice conversion using non-negative spectrogram deconvolution
- Z. Wu, T. Virtanen, T. Kinnunen, E. S. Chng, and H. Li, "Exemplar-based voice conversion using non-negative spectrogram deconvolution," in Proc. 8th ISCA Speech Synth. Workshop, 2013.
- Proc. 8th ISCA Speech Synth. Workshop, 2013
- Wu, Z.¹ Virtanen, T.² Kinnunen, T.³ Chng, E.S.⁴ Li, H.⁵

22
- 84906280857
- Voice conversion in high-order eigen space using deep belief nets
- T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order eigen space using deep belief nets," in Proc. Interspeech, 2013, pp. 369-372.
- Proc. Interspeech, 2013 , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

23
- 0000329993
- Information processing in dynamical systems: Foundations of harmony theory
- P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," Parallel Distrib. Process., vol. 1, 1986.
- (1986) Parallel Distrib. Process , vol.1
- Smolensky, P.¹

24
- 33745805403
- A fast learning algorithm for deep belief nets
- G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
- (2006) Neural Comput. , vol.18 , Issue.7 , pp. 1527-1554
- Hinton, G.E.¹ Osindero, S.² Teh, Y.-W.³

25
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Oct.
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

26
- 84055211743
- Acoustic modeling using deep belief networks
- Jan.
- A.-r. Mohamed, G. E. Dahl, and G. Hinton, "Acoustic modeling using deep belief networks," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 14-22, Jan. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 14-22
- Mohamed, A.-R.¹ Dahl, G.E.² Hinton, G.³

27
- 78149306047
- 3-D object recognition with deep belief nets
- V. Nair and G. Hinton, "3-D object recognition with deep belief nets," Adv. Neural Inf. Process. Syst., vol. 22, pp. 1339-1347, 2009.
- (2009) Adv. Neural Inf. Process. Syst. , vol.22 , pp. 1339-1347
- Nair, V.¹ Hinton, G.²

28
- 84991233704
- A deep learning approach to machine transliteration
- T. Deselaers, S. Hasan, O. Bender, and H. Ney, "A deep learning approach to machine transliteration," in Proc. 4th Workshop Statist. Mach. Translat. Assoc. Comput. Linguist., 2009, pp. 233-241.
- Proc. 4th Workshop Statist. Mach. Translat. Assoc. Comput. Linguist., 2009 , pp. 233-241
- Deselaers, T.¹ Hasan, S.² Bender, O.³ Ney, H.⁴

29
- 84858768256
- The recurrent temporal restricted Boltzmann machine
- I. Sutskever, G. Hinton, and G. Taylor, "The recurrent temporal restricted Boltzmann machine," NIPS, vol. 19, pp. 1601-1608, 2008.
- (2008) NIPS , vol.19 , pp. 1601-1608
- Sutskever, I.¹ Hinton, G.² Taylor, G.³

30
- 84867129058
- Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription
- N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription," in Proc. Int. Conf. Mach. Learn., 2012.
- Proc. Int. Conf. Mach. Learn., 2012
- Boulanger-Lewandowski, N.¹ Bengio, Y.² Vincent, P.³

31
- 84889579519
- Conditional restricted Boltzmann machine for voice conversion
- Z. Wu, E. S. Chng, and H. Li, "Conditional restricted Boltzmann machine for voice conversion," in Proc. IEEE China Summit and Int. Conf. Signal Inf. Process. (ChinaSIP), 2013, pp. 104-108.
- Proc. IEEE China Summit and Int. Conf. Signal Inf. Process. (ChinaSIP), 2013 , pp. 104-108
- Wu, Z.¹ Chng, E.S.² Li, H.³

32
- 84906225084
- Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
- L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Proc. Interspeech, 2013, pp. 3052-3056.
- Proc. Interspeech, 2013 , pp. 3052-3056
- Chen, L.-H.¹ Ling, Z.-H.² Song, Y.³ Dai, L.-R.⁴

33
- 84864026688
- Modeling human motion using binary latent variables
- G. W. Taylor, G. E. Hinton, and S. T. Roweis, "Modeling human motion using binary latent variables," Adv. Neural Inf. Process. Syst., pp. 1345-1352, 2006.
- (2006) Adv. Neural Inf. Process. Syst , pp. 1345-1352
- Taylor, G.W.¹ Hinton, G.E.² Roweis, S.T.³

34
- 56449085852
- Santa Cruz, CA, USA: Computer Research Laboratory
- Y. Freund and D. Haussler, Unsupervised Learning of Distributions of Binary Vectors Using Two Layer Networks. Santa Cruz, CA, USA: Computer Research Laboratory, 1994.
- (1994) Unsupervised Learning of Distributions of Binary Vectors Using Two Layer Networks
- Freund, Y.¹ Haussler, D.²

35
- 79959342724
- Improved learning of Gaussian-Bernoulli restricted Boltzmann machines
- K. Cho, A. Ilin, and T. Raiko, "Improved learning of Gaussian-Bernoulli restricted Boltzmann machines," Artif. Neur. Netw. Mach. Learn., pp. 10-17, 2011.
- (2011) Artif. Neur. Netw. Mach. Learn , pp. 10-17
- Cho, K.¹ Ilin, A.² Raiko, T.³

36
- 84883153580
- arXiv preprint arXiv:1211.5063
- R. Pascanu, T. Mikolov, and Y. Bengio, "On the difficulty of training recurrent neural networks," arXiv preprint arXiv:1211.5063, 2012.
- (2012) On the Difficulty of Training Recurrent Neural Networks
- Pascanu, R.¹ Mikolov, T.² Bengio, Y.³

37
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- Nov.
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

38
- 0025475528
- ATR Japanese speech database as a tool of speech recognition and synthesis
- A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Commun., vol. 9, no. 4, pp. 357-363, 1990.
- (1990) Speech Commun. , vol.9 , Issue.4 , pp. 357-363
- Kurematsu, A.¹ Takeda, K.² Sagisaka, Y.³ Katagiri, S.⁴ Kuwabara, H.⁵ Shikano, K.⁶

39
- 51449108867
- TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f0, and aperiodicity estimation
- H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, and H. Banno, "TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f0, and aperiodicity estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2008, pp. 3933-3936.
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2008 , pp. 3933-3936
- Kawahara, H.¹ Morise, M.² Takahashi, T.³ Nisimura, R.⁴ Irino, T.⁵ Banno, H.⁶

40
- 80052359758
- Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model
- B. Milner and X. Shao, "Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model," in Proc. Interspeech, 2002, pp. 2421-2424.
- Proc. Interspeech, 2002 , pp. 2421-2424
- Milner, B.¹ Shao, X.²

41
- 77956002520
- Comput. Sci. Dept., Univ. of Toronto, Toronto, ON, USA, Tech. Rep
- A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," Comput. Sci. Dept., Univ. of Toronto, Toronto, ON, USA, Tech. Rep., 2009.
- (2009) Learning Multiple Layers of Features from Tiny Images
- Krizhevsky, A.¹ Hinton, G.²

42
- 84905252390
- Voice conversion in time-invariant speaker-independent space
- T. Nakashika, T. Takiguchi, and Y. Ariki, "Voice conversion in time-invariant speaker-independent space," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014, pp. 7939-7943.
- Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2014 , pp. 7939-7943
- Nakashika, T.¹ Takiguchi, T.² Ariki, Y.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.