-
1
-
-
85046996130
-
Unsupervised learning of disentangled and interpretable representations from sequential data
-
W.-N. Hsu, Y. Zhang, and J. Glass, “Unsupervised learning of disentangled and interpretable representations from sequential data,” in Advances in neural information processing systems, 2017, pp. 1876-1887.
-
(2017)
Advances in Neural Information Processing Systems
, pp. 1876-1887
-
-
Hsu, W.-N.1
Zhang, Y.2
Glass, J.3
-
2
-
-
70349197715
-
Voice transformation: A survey
-
IEEE
-
Y. Stylianou, “Voice transformation: a survey,” in Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on. IEEE, 2009, pp. 3585-3588.
-
(2009)
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on.
, pp. 3585-3588
-
-
Stylianou, Y.1
-
3
-
-
85010399617
-
An overview of voice conversion systems
-
S. H. Mohammadi and A. Kain, “An overview of voice conversion systems,” Speech Communication, vol. 88, pp. 65-82, 2017.
-
(2017)
Speech Communication
, vol.88
, pp. 65-82
-
-
Mohammadi, S.H.1
Kain, A.2
-
4
-
-
57749193836
-
Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
-
T. Toda, A. W. Black, and K. Tokuda, “Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
-
(2007)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.15
, Issue.8
, pp. 2222-2235
-
-
Toda, T.1
Black, A.W.2
Tokuda, K.3
-
5
-
-
77953707533
-
Spectral mapping using artificial neural networks for voice conversion
-
S. Desai, A. W. Black, B. Yegnanarayana, and K. Prahallad, “Spectral mapping using artificial neural networks for voice conversion,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 18, no. 5, pp. 954-964, 2010.
-
(2010)
IEEE Transactions on Audio, Speech, and Language Processing
, vol.18
, Issue.5
, pp. 954-964
-
-
Desai, S.1
Black, A.W.2
Yegnanarayana, B.3
Prahallad, K.4
-
6
-
-
84869384026
-
Mixture of factor analyzers using priors from non-parallel speech for voice conversion
-
Z. Wu, T. Kinnunen, E. S. Chng, and H. Li, “Mixture of factor analyzers using priors from non-parallel speech for voice conversion,” IEEE Signal Processing Letters, vol. 19, no. 12, pp. 914-917, 2012.
-
(2012)
IEEE Signal Processing Letters
, vol.19
, Issue.12
, pp. 914-917
-
-
Wu, Z.1
Kinnunen, T.2
Chng, E.S.3
Li, H.4
-
7
-
-
85013754728
-
Voice conversion from non-parallel corpora using variational auto-encoder
-
IEEE
-
C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, “Voice conversion from non-parallel corpora using variational auto-encoder,” in Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016 Asia-Pacific. IEEE, 2016, pp. 1-6.
-
(2016)
Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016 Asia-Pacific
, pp. 1-6
-
-
Hsu, C.-C.1
Hwang, H.-T.2
Wu, Y.-C.3
Tsao, Y.4
Wang, H.-M.5
-
8
-
-
84890484652
-
Non-parallel training for voice conversion based on adaptation method
-
IEEE
-
P. Song, W. Zheng, and L. Zhao, “Non-parallel training for voice conversion based on adaptation method,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 6905-6909.
-
(2013)
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
, pp. 6905-6909
-
-
Song, P.1
Zheng, W.2
Zhao, L.3
-
9
-
-
85047009420
-
-
arXiv preprint
-
C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang, “Voice conversion from unaligned corpora using variational autoencoding wasserstein generative adversarial networks,” arXiv preprint arXiv:1704.00849, 2017.
-
(2017)
Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks
-
-
Hsu, C.-C.1
Hwang, H.-T.2
Wu, Y.-C.3
Tsao, Y.4
Wang, H.-M.5
-
10
-
-
85023740493
-
Non-parallel voice conversion using i-vector PLDA: Towards unifying speaker verification and transformation
-
IEEE
-
T. Kinnunen, L. Juvela, P. Alku, and J. Yamagishi, “Non-parallel voice conversion using i-vector PLDA: Towards unifying speaker verification and transformation,” in Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on. IEEE, 2017, pp. 5535-5539.
-
(2017)
Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on
, pp. 5535-5539
-
-
Kinnunen, T.1
Juvela, L.2
Alku, P.3
Yamagishi, J.4
-
11
-
-
85039165729
-
Siamese autoencoders for speech style extraction and switching applied to voice identification and conversion
-
S. H. Mohammadi and A. Kain, “Siamese autoencoders for speech style extraction and switching applied to voice identification and conversion,” Proceedings of Interspeech, pp. 1293-1297, 2017.
-
(2017)
Proceedings of Interspeech
, pp. 1293-1297
-
-
Mohammadi, S.H.1
Kain, A.2
-
12
-
-
84959297010
-
A multi-level GMM-based cross-lingual voice conversion using language-specific mixture weights for polyglot synthesis
-
B. Ramani, M. A. Jeeva, P. Vijayalakshmi, and T. Nagarajan, “A multi-level GMM-based cross-lingual voice conversion using language-specific mixture weights for polyglot synthesis,” Circuits, Systems, and Signal Processing, vol. 35, no. 4, pp. 1283-1311, 2016.
-
(2016)
Circuits, Systems, and Signal Processing
, vol.35
, Issue.4
, pp. 1283-1311
-
-
Ramani, B.1
Jeeva, M.A.2
Vijayalakshmi, P.3
Nagarajan, T.4
-
13
-
-
70450205902
-
Cross-language voice conversion based on eigenvoices
-
M. Charlier, Y. Ohtani, T. Toda, A. Moinet, and T. Dutoit, “Cross-language voice conversion based on eigenvoices,” Proceedings of Interspeech, 2009.
-
(2009)
Proceedings of Interspeech
-
-
Charlier, M.1
Ohtani, Y.2
Toda, T.3
Moinet, A.4
Dutoit, T.5
-
15
-
-
84937849144
-
Generative adversarial nets
-
I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in neural information processing systems, 2014, pp. 2672-2680.
-
(2014)
Advances in Neural Information Processing Systems
, pp. 2672-2680
-
-
Goodfellow, I.1
Pouget-Abadie, J.2
Mirza, M.3
Xu, B.4
Warde-Farley, D.5
Ozair, S.6
Courville, A.7
Bengio, Y.8
-
17
-
-
85011070895
-
-
arXiv preprint
-
A. Van Den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, “Wavenet: A generative model for raw audio,” arXiv preprint arXiv:1609.03499, 2016.
-
(2016)
Wavenet: A Generative Model for Raw Audio
-
-
Van Den Oord, A.1
Dieleman, S.2
Zen, H.3
Simonyan, K.4
Vinyals, O.5
Graves, A.6
Kalchbrenner, N.7
Senior, A.8
Kavukcuoglu, K.9
-
19
-
-
85028596902
-
-
arXiv preprint
-
J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” arXiv preprint arXiv:1703.10593, 2017.
-
(2017)
Unpaired Image-to-Image Translation Using Cycle-Consistent Adversarial Networks
-
-
Zhu, J.-Y.1
Park, T.2
Isola, P.3
Efros, A.A.4
-
20
-
-
84965156877
-
Deep convolutional inverse graphics network
-
T. D. Kulkarni, W. F. Whitney, P. Kohli, and J. Tenenbaum, “Deep convolutional inverse graphics network,” in Advances in Neural Information Processing Systems, 2015, pp. 2539-2547.
-
(2015)
Advances in Neural Information Processing Systems
, pp. 2539-2547
-
-
Kulkarni, T.D.1
Whitney, W.F.2
Kohli, P.3
Tenenbaum, J.4
-
21
-
-
85019228440
-
Infogan: Interpretable representation learning by information maximizing generative adversarial nets
-
X. Chen, Y. Duan, R. Houthooft, J. Schulman, I. Sutskever, and P. Abbeel, “Infogan: Interpretable representation learning by information maximizing generative adversarial nets,” in Advances in Neural Information Processing Systems, 2016, pp. 2172-2180.
-
(2016)
Advances in Neural Information Processing Systems
, pp. 2172-2180
-
-
Chen, X.1
Duan, Y.2
Houthooft, R.3
Schulman, J.4
Sutskever, I.5
Abbeel, P.6
-
22
-
-
85047021413
-
-
I. Higgins, L. Matthey, A. Pal, C. Burgess, X. Glorot, M. Botvinick, S. Mohamed, and A. Lerchner, “beta-vae: Learning basic visual concepts with a constrained variational framework,” 2016.
-
(2016)
Beta-Vae: Learning Basic Visual Concepts with A Constrained Variational Framework
-
-
Higgins, I.1
Matthey, L.2
Pal, A.3
Burgess, C.4
Glorot, X.5
Botvinick, M.6
Mohamed, S.7
Lerchner, A.8
-
23
-
-
84928547704
-
Sequence to sequence learning with neural networks
-
I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural networks,” in Advances in neural information processing systems, 2014, pp. 3104-3112.
-
(2014)
Advances in Neural Information Processing Systems
, pp. 3104-3112
-
-
Sutskever, I.1
Vinyals, O.2
Le, Q.V.3
-
24
-
-
0003548585
-
DARPA timit acoustic-phonetic continous speech corpus cd-rom. Nist speech disc 1-1.1
-
J. S. Garofolo, L. F. Lamel, W. M. Fisher, J. G. Fiscus, and D. S. Pallett, “Darpa timit acoustic-phonetic continous speech corpus cd-rom. nist speech disc 1-1.1,” NASA STI/Recon technical report n, vol. 93, 1993.
-
(1993)
NASA STI/Recon Technical Report N
, vol.93
-
-
Garofolo, J.S.1
Lamel, L.F.2
Fisher, W.M.3
Fiscus, J.G.4
Pallett, D.S.5
-
28
-
-
84976902575
-
World: A vocoder-based high-quality speech synthesis system for real-time applications
-
M. Morise, F. Yokomori, and K. Ozawa, “World: a vocoder-based high-quality speech synthesis system for real-time applications,” IEICE TRANSACTIONS on Information and Systems, vol. 99, no. 7, pp. 1877-1884, 2016.
-
(2016)
IEICE TRANSACTIONS on Information and Systems
, vol.99
, Issue.7
, pp. 1877-1884
-
-
Morise, M.1
Yokomori, F.2
Ozawa, K.3
-
29
-
-
0031573117
-
Long short-term memory
-
S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735-1780, 1997.
-
(1997)
Neural Computation
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
32
-
-
85039150586
-
Speaker-dependent wavenet vocoder
-
A. Tamamori, T. Hayashi, K. Kobayashi, K. Takeda, and T. Toda, “Speaker-dependent wavenet vocoder,” in Proceedings of Interspeech, 2017, pp. 1118-1122.
-
(2017)
Proceedings of Interspeech
, pp. 1118-1122
-
-
Tamamori, A.1
Hayashi, T.2
Kobayashi, K.3
Takeda, K.4
Toda, T.5
-
33
-
-
85050516344
-
An investigation of multi-speaker training for wavenet vocoder
-
IEEE
-
T. Hayashi, A. Tamamori, K. Kobayashi, K. Takeda, and T. Toda, “An investigation of multi-speaker training for wavenet vocoder,” in Automatic Speech Recognition and Understanding Workshop (ASRU), 2017 IEEE. IEEE, 2017, pp. 712-718.
-
(2017)
Automatic Speech Recognition and Understanding Workshop (ASRU), 2017 IEEE
, pp. 712-718
-
-
Hayashi, T.1
Tamamori, A.2
Kobayashi, K.3
Takeda, K.4
Toda, T.5