



Volume 23, Issue 3, 2015, Pages 580-587

Voice conversion using RNN pre-trained by recurrent temporal restricted Boltzmann machines

Author keywords

Deep Learning; recurrent neural network; recurrent temporal restricted Boltzmann machine (RTRBM); speaker specific features; voice conversion

Indexed keywords

Recurrent neural networks

EID: 84923867813     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2014.2379589     Document Type: Article
Times cited: 83

References (42)
  • 2
    • 84865747520
    • Intonation conversion from neutral to expressive speech
    • C. Veaux and X. Rodet, "Intonation conversion from neutral to expressive speech," in Proc. Interspeech, 2011, pp. 2765-2768.
    • Proc. Interspeech, 2011 , pp. 2765-2768
    • Veaux, C.1    Rodet, X.2
  • 3
    • 80052698826
    • Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech
    • K. Nakamura, T. Toda, H. Saruwatari, and K. Shikano, "Speaking-aid systems using GMM-based voice conversion for electrolaryngeal speech," Speech Commun., vol. 54, no. 1, pp. 134-146, 2012.
    • (2012) Speech Commun. , vol.54 , Issue.1 , pp. 134-146
    • Nakamura, K.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 7
    • 0026880275
    • Voice transformation using PSOLA technique
    • H. Valbret, E. Moulines, and J.-P. Tubach, "Voice transformation using PSOLA technique," Speech Commun., vol. 11, no. 2, pp. 175-187, 1992.
    • (1992) Speech Commun. , vol.11 , Issue.2 , pp. 175-187
    • Valbret, H.1    Moulines, E.2    Tubach, J.-P.3
  • 8
    • 0032026483
    • Continuous probabilistic transform for voice conversion
    • Y. Stylianou, O. Cappé, and E. Moulines, "Continuous probabilistic transform for voice conversion," IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
    • Stylianou, Y.1    Cappé, O.2    Moulines, E.3
  • 9
    • 44949210554
    • MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training
    • C.-H. Lee and C.-H. Wu, "MAP-based adaptation for speech conversion using adaptation data selection and non-parallel training," in Proc. Interspeech, 2006, pp. 2254-2257.
    • Proc. Interspeech, 2006 , pp. 2254-2257
    • Lee, C.-H.1    Wu, C.-H.2
  • 10
    • 34547512822
    • Eigenvoice conversion based on Gaussian mixture model
    • T. Toda, Y. Ohtani, and K. Shikano, "Eigenvoice conversion based on Gaussian mixture model," in Proc. Interspeech, 2006, pp. 2446-2449.
    • Proc. Interspeech, 2006 , pp. 2446-2449
    • Toda, T.1    Ohtani, Y.2    Shikano, K.3
  • 11
    • 84865798483
    • One-to-many voice conversion based on tensor representation of speaker space
    • D. Saito, K. Yamamoto, N. Minematsu, and K. Hirose, "One-to-many voice conversion based on tensor representation of speaker space," in Proc. Interspeech, 2011, pp. 653-656.
    • Proc. Interspeech, 2011 , pp. 653-656
    • Saito, D.1    Yamamoto, K.2    Minematsu, N.3    Hirose, K.4
  • 12
    • 79959834571
    • Probabilistic integration of joint density model and speaker model for voice conversion
    • D. Saito, S. Watanabe, A. Nakamura, and N. Minematsu, "Probabilistic integration of joint density model and speaker model for voice conversion," in Proc. Interspeech, 2010, pp. 1728-1731.
    • Proc. Interspeech, 2010 , pp. 1728-1731
    • Saito, D.1    Watanabe, S.2    Nakamura, A.3    Minematsu, N.4
  • 15
    • 34547522070
    • Discriminative training for large-vocabulary speech recognition using minimum classification error
    • E. McDermott, T. J. Hazen, J. Le Roux, A. Nakamura, and S. Katagiri, "Discriminative training for large-vocabulary speech recognition using minimum classification error," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 203-223, Jan. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 203-223
    • McDermott, E.1    Hazen, T.J.2    Le Roux, J.3    Nakamura, A.4    Katagiri, S.5
  • 16
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. 90, no. 5, pp. 816-824, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.90 , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 17
    • 84901793334
    • Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis
    • Z.-H. Ling and L.-R. Dai, "Minimum Kullback-Leibler divergence parameter generation for HMM-based speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 5, pp. 1492-1502, Jul. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.5 , pp. 1492-1502
    • Ling, Z.-H.1    Dai, L.-R.2
  • 23
    • 0000329993
    • Information processing in dynamical systems: Foundations of harmony theory
    • P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," Parallel Distrib. Process., vol. 1, 1986.
    • (1986) Parallel Distrib. Process , vol.1
    • Smolensky, P.1
  • 24
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • G. E. Hinton, S. Osindero, and Y.-W. Teh, "A fast learning algorithm for deep belief nets," Neural Comput., vol. 18, no. 7, pp. 1527-1554, 2006.
    • (2006) Neural Comput. , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.-W.3
  • 25
    • 84901237776
    • Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
    • Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.10 , pp. 2129-2139
    • Ling, Z.-H.1    Deng, L.2    Yu, D.3
  • 27
    • 78149306047
    • 3-D object recognition with deep belief nets
    • V. Nair and G. Hinton, "3-D object recognition with deep belief nets," Adv. Neural Inf. Process. Syst., vol. 22, pp. 1339-1347, 2009.
    • (2009) Adv. Neural Inf. Process. Syst. , vol.22 , pp. 1339-1347
    • Nair, V.1    Hinton, G.2
  • 29
    • 84858768256
    • The recurrent temporal restricted Boltzmann machine
    • I. Sutskever, G. Hinton, and G. Taylor, "The recurrent temporal restricted Boltzmann machine," NIPS, vol. 21, pp. 1601-1608, 2008.
    • (2008) NIPS , vol.21 , pp. 1601-1608
    • Sutskever, I.1    Hinton, G.2    Taylor, G.3
  • 30
    • 84867129058
    • Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription
    • N. Boulanger-Lewandowski, Y. Bengio, and P. Vincent, "Modeling temporal dependencies in high-dimensional sequences: Application to polyphonic music generation and transcription," in Proc. Int. Conf. Mach. Learn., 2012.
    • Proc. Int. Conf. Mach. Learn., 2012
    • Boulanger-Lewandowski, N.1    Bengio, Y.2    Vincent, P.3
  • 32
    • 84906225084
    • Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion
    • L.-H. Chen, Z.-H. Ling, Y. Song, and L.-R. Dai, "Joint spectral distribution modeling using restricted Boltzmann machines for voice conversion," in Proc. Interspeech, 2013, pp. 3052-3056.
    • Proc. Interspeech, 2013 , pp. 3052-3056
    • Chen, L.-H.1    Ling, Z.-H.2    Song, Y.3    Dai, L.-R.4
  • 35
    • 79959342724
    • Improved learning of Gaussian-Bernoulli restricted Boltzmann machines
    • K. Cho, A. Ilin, and T. Raiko, "Improved learning of Gaussian-Bernoulli restricted Boltzmann machines," Artif. Neur. Netw. Mach. Learn., pp. 10-17, 2011.
    • (2011) Artif. Neur. Netw. Mach. Learn , pp. 10-17
    • Cho, K.1    Ilin, A.2    Raiko, T.3
  • 37
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 38
    • 0025475528
    • ATR Japanese speech database as a tool of speech recognition and synthesis
    • A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Commun., vol. 9, no. 4, pp. 357-363, 1990.
    • (1990) Speech Commun. , vol.9 , Issue.4 , pp. 357-363
    • Kurematsu, A.1    Takeda, K.2    Sagisaka, Y.3    Katagiri, S.4    Kuwabara, H.5    Shikano, K.6
  • 39
    • 51449108867
    • TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f0, and aperiodicity estimation
    • H. Kawahara, M. Morise, T. Takahashi, R. Nisimura, T. Irino, and H. Banno, "TANDEM-STRAIGHT: A temporally stable power spectral representation for periodic signals and applications to interference-free spectrum, f0, and aperiodicity estimation," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2008, pp. 3933-3936.
    • Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2008 , pp. 3933-3936
    • Kawahara, H.1    Morise, M.2    Takahashi, T.3    Nisimura, R.4    Irino, T.5    Banno, H.6
  • 40
    • 80052359758
    • Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model
    • B. Milner and X. Shao, "Speech reconstruction from mel-frequency cepstral coefficients using a source-filter model," in Proc. Interspeech, 2002, pp. 2421-2424.
    • Proc. Interspeech, 2002 , pp. 2421-2424
    • Milner, B.1    Shao, X.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.