SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2018-September, 2018, Pages 2833-2837

Investigation of using disentangled and interpretable representations for one-shot cross-lingual voice conversion

Authors (2): Mohammadi, Seyed Hamidreza (a); Kim, Taehwan (a)

(a) ObEN (United States)

Author keywords

Cross lingual; One shot learning; Variational autoencoder; Voice conversion

Indexed keywords

LEARNING SYSTEMS; SPEECH COMMUNICATION

AUTO ENCODERS; CROSS-LINGUAL; INTERPRETABLE REPRESENTATION; ONE-SHOT LEARNING; SHORT TIME FOURIER TRANSFORMS; SPEECH QUALITY; TARGET SPEAKER; VOICE CONVERSION

SPEECH PROCESSING

EID: 85054986055
PISSN: 2308457X
EISSN: 19909772
Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2018-2525
Document Type: Conference Paper

Times cited: 12

References (33)

1
- 85046996130
- Unsupervised learning of disentangled and interpretable representations from sequential data
- W.-N. Hsu, Y. Zhang, and J. Glass, “Unsupervised learning of disentangled and interpretable representations from sequential data,” in Advances in neural information processing systems, 2017, pp. 1876-1887.
- (2017) Advances in Neural Information Processing Systems , pp. 1876-1887
- Hsu, W.-N.¹ Zhang, Y.² Glass, J.³

2
- 70349197715
- Voice transformation: A survey
- IEEE
- Y. Stylianou, “Voice transformation: a survey,” in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on. IEEE, 2009, pp. 3585-3588.
- (2009) Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on. , pp. 3585-3588
- Stylianou, Y.¹

3
- 85010399617
- An overview of voice conversion systems
- S. H. Mohammadi and A. Kain, “An overview of voice conversion systems,” Speech Communication, vol. 88, pp. 65-82, 2017.
- (2017) Speech Communication , vol.88 , pp. 65-82
- Mohammadi, S.H.¹ Kain, A.²

4
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) IEEE Transactions on Audio, Speech, and Language Processing , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

5
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- S. Desai, A. W. Black, B. Yegnanarayana, and K. Prahallad, “Spectral mapping using artificial neural networks for voice conversion,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 954-964, 2010.
- (2010) IEEE Transactions on Audio, Speech, and Language Processing , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.W.² Yegnanarayana, B.³ Prahallad, K.⁴

6
- 84869384026
- Mixture of factor analyzers using priors from non-parallel speech for voice conversion
- Z. Wu, T. Kinnunen, E. S. Chng, and H. Li, “Mixture of factor analyzers using priors from non-parallel speech for voice conversion,” IEEE Signal Processing Letters, vol. 19, no. 12, pp. 914-917, 2012.
- (2012) IEEE Signal Processing Letters , vol.19 , Issue.12 , pp. 914-917
- Wu, Z.¹ Kinnunen, T.² Chng, E.S.³ Li, H.⁴

7
- 85013754728
- Voice conversion from non-parallel corpora using variational auto-encoder
- IEEE
- C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, “Voice conversion from non-parallel corpora using variational auto-encoder,” in Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016 Asia-Pacific. IEEE, 2016, pp. 1-6.
- (2016) Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016 Asia-Pacific , pp. 1-6
- Hsu, C.-C.¹ Hwang, H.-T.² Wu, Y.-C.³ Tsao, Y.⁴ Wang, H.-M.⁵

8
- 84890484652
- Non-parallel training for voice conversion based on adaptation method
- IEEE
- P. Song, W. Zheng, and L. Zhao, “Non-parallel training for voice conversion based on adaptation method,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6905-6909.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on , pp. 6905-6909
- Song, P.¹ Zheng, W.² Zhao, L.³

9
- 85047009420
- arXiv preprint
- C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, “Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks,” arXiv preprint arXiv:1704.00849, 2017.
- (2017) Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks
- Hsu, C.-C.¹ Hwang, H.-T.² Wu, Y.-C.³ Tsao, Y.⁴ Wang, H.-M.⁵

10
- 85023740493
- Non-parallel voice conversion using i-vector plda: Towards unifying speaker verification and transformation
- IEEE
- T. Kinnunen, L. Juvela, P. Alku, and J. Yamagishi, “Non-parallel voice conversion using i-vector plda: Towards unifying speaker verification and transformation,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 5535-5539.
- (2017) Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on , pp. 5535-5539
- Kinnunen, T.¹ Juvela, L.² Alku, P.³ Yamagishi, J.⁴

11
- 85039165729
- Siamese autoencoders for speech style extraction and switching applied to voice identification and conversion
- S. H. Mohammadi and A. Kain, “Siamese autoencoders for speech style extraction and switching applied to voice identification and conversion,” Proceedings of Interspeech, pp. 1293-1297, 2017.
- (2017) Proceedings of Interspeech , pp. 1293-1297
- Mohammadi, S.H.¹ Kain, A.²

12
- 84959297010
- A multi-level gmm-based cross-lingual voice conversion using language-specific mixture weights for polyglot synthesis
- B. Ramani, M. A. Jeeva, P. Vijayalakshmi, and T. Nagarajan, “A multi-level gmm-based cross-lingual voice conversion using language-specific mixture weights for polyglot synthesis,” Circuits, Systems, and Signal Processing, vol. 35, no. 4, pp. 1283-1311, 2016.
- (2016) Circuits, Systems, and Signal Processing , vol.35 , Issue.4 , pp. 1283-1311
- Ramani, B.¹ Jeeva, M.A.² Vijayalakshmi, P.³ Nagarajan, T.⁴

13
- 70450205902
- Cross-language voice conversion based on eigenvoices
- M. Charlier, Y. Ohtani, T. Toda, A. Moinet, and T. Dutoit, “Cross-language voice conversion based on eigenvoices,” Proceedings of Interspeech, 2009.
- (2009) Proceedings of Interspeech
- Charlier, M.¹ Ohtani, Y.² Toda, T.³ Moinet, A.⁴ Dutoit, T.⁵

14
- 84919810317
- arXiv preprint
- D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” arXiv preprint arXiv:1312.6114, 2013.
- (2013) Auto-Encoding Variational Bayes
- Kingma, D.P.¹ Welling, M.²

15
- 84937849144
- Generative adversarial nets
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672-2680.
- (2014) Advances in Neural Information Processing Systems , pp. 2672-2680
- Goodfellow, I.¹ Pouget-Abadie, J.² Mirza, M.³ Xu, B.⁴ Warde-Farley, D.⁵ Ozair, S.⁶ Courville, A.⁷ Bengio, Y.⁸

16
- 84989286061
- arXiv preprint
- A. v. d. Oord, N. Kalchbrenner, and K. Kavukcuoglu, “Pixel recurrent neural networks,” arXiv preprint arXiv:1601.06759, 2016.
- (2016) Pixel Recurrent Neural Networks
- Oord, A.¹ Kalchbrenner, N.² Kavukcuoglu, K.³

17
- 85011070895
- arXiv preprint
- A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.
- (2016) Wavenet: A Generative Model for Raw Audio
- Van Den Oord, A.¹ Dieleman, S.² Zen, H.³ Simonyan, K.⁴ Vinyals, O.⁵ Graves, A.⁶ Kalchbrenner, N.⁷ Senior, A.⁸ Kavukcuoglu, K.⁹

18
- 85047569371
- arXiv preprint
- T. Kaneko and H. Kameoka, “Parallel-data-free voice conversion using cycle-consistent adversarial networks,” arXiv preprint arXiv:1711.11293, 2017.
- (2017) Parallel-Data-Free Voice Conversion Using Cycle-Consistent Adversarial Networks
- Kaneko, T.¹ Kameoka, H.²

19
- 85028596902
- arXiv preprint
- J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint arXiv:1703.10593, 2017.
- (2017) Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
- Zhu, J.-Y.¹ Park, T.² Isola, P.³ Efros, A.A.⁴

20
- 84965156877
- Deep convolutional inverse graphics network
- T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum, “Deep convolutional inverse graphics network,” in Advances in Neural Information Processing Systems, 2015, pp. 2539-2547.
- (2015) Advances in Neural Information Processing Systems , pp. 2539-2547
- Kulkarni, T.D.¹ Whitney, W.F.² Kohli, P.³ Tenenbaum, J.⁴

21
- 85019228440
- Infogan: Interpretable representation learning by information maximizing generative adversarial nets
- X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” in Advances in Neural Information Processing Systems, 2016, pp. 2172-2180.
- (2016) Advances in Neural Information Processing Systems , pp. 2172-2180
- Chen, X.¹ Duan, Y.² Houthooft, R.³ Schulman, J.⁴ Sutskever, I.⁵ Abbeel, P.⁶

22
- 85047021413
- I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “beta-vae: Learning basic visual concepts with a constrained variational framework,” 2016.
- (2016) Beta-Vae: Learning Basic Visual Concepts with A Constrained Variational Framework
- Higgins, I.¹ Matthey, L.² Pal, A.³ Burgess, C.⁴ Glorot, X.⁵ Botvinick, M.⁶ Mohamed, S.⁷ Lerchner, A.⁸

23
- 84928547704
- Sequence to sequence learning with neural networks
- I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104-3112.
- (2014) Advances in Neural Information Processing Systems , pp. 3104-3112
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

24
- 0003548585
- DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1
- J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, “DARPA TIMIT acoustic-phonetic continuous speech corpus CD-ROM. NIST speech disc 1-1.1,” NASA STI/Recon technical report n, vol. 93, 1993.
- (1993) NASA STI/Recon Technical Report N , vol.93
- Garofolo, J.S.¹ Lamel, L.F.² Fisher, W.M.³ Fiscus, J.G.⁴ Pallett, D.S.⁵

25
- 85020204432
- arXiv preprint
- D. Wang and X. Zhang, “THCHS-30: A free Chinese speech corpus,” arXiv preprint arXiv:1512.01882, 2015.
- (2015) THCHS-30: A Free Chinese Speech Corpus
- Wang, D.¹ Zhang, X.²

26
- 85054236992
- C. Veaux, J. Yamagishi, K. MacDonald et al., “CSTR VCTK corpus: English multi-speaker corpus for CSTR voice cloning toolkit,” 2017.
- (2017) CSTR VCTK Corpus: English Multi-Speaker Corpus for CSTR Voice Cloning Toolkit
- Veaux, C.¹ Yamagishi, J.² MacDonald, K.³

27
- 85090475413
- The cmu arctic speech databases
- J. Kominek and A. W. Black, “The cmu arctic speech databases,” in Fifth ISCA Workshop on Speech Synthesis, 2004.
- (2004) Fifth ISCA Workshop on Speech Synthesis
- Kominek, J.¹ Black, A.W.²

28
- 84976902575
- World: A vocoder-based high-quality speech synthesis system for real-time applications
- M. Morise, F. Yokomori, and K. Ozawa, “World: a vocoder-based high-quality speech synthesis system for real-time applications,” IEICE TRANSACTIONS on Information and Systems, vol. 99, no. 7, pp. 1877-1884, 2016.
- (2016) IEICE TRANSACTIONS on Information and Systems , vol.99 , Issue.7 , pp. 1877-1884
- Morise, M.¹ Yokomori, F.² Ozawa, K.³

29
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

30
- 84941620184
- arXiv preprint
- D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.P.¹ Ba, J.²

31
- 4444285698
- A. B. Kain, “High resolution voice transformation,” 2001.
- (2001) High Resolution Voice Transformation
- Kain, A.B.¹

32
- 85039150586
- Speaker-dependent wavenet vocoder
- A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, and T. Toda, “Speaker-dependent wavenet vocoder,” in Proceedings of Interspeech, 2017, pp. 1118-1122.
- (2017) Proceedings of Interspeech , pp. 1118-1122
- Tamamori, A.¹ Hayashi, T.² Kobayashi, K.³ Takeda, K.⁴ Toda, T.⁵

33
- 85050516344
- An investigation of multi-speaker training for wavenet vocoder
- IEEE
- T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, and T. Toda, “An investigation of multi-speaker training for wavenet vocoder,” in Automatic Speech Recognition and Understanding Workshop (ASRU), 2017 IEEE. IEEE, 2017, pp. 712-718.
- (2017) Automatic Speech Recognition and Understanding Workshop (ASRU), 2017 IEEE , pp. 712-718
- Hayashi, T.¹ Tamamori, A.² Kobayashi, K.³ Takeda, K.⁴ Toda, T.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.