-
1
-
-
84890452886
-
Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
-
O. Abdel-Hamid and H. Jiang. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code. In ICASSP, 2013.
-
(2013)
ICASSP
-
-
Abdel-Hamid, O.1
Jiang, H.2
-
2
-
-
85039156048
-
Deep voice: Real-time neural text-to-speech
-
S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, J. Raiman, S. Sengupta, and M. Shoeybi. Deep voice: Real-time neural text-to-speech. In ICML, 2017.
-
(2017)
ICML
-
-
Arik, S.O.1
Chrzanowski, M.2
Coates, A.3
Diamos, G.4
Gibiansky, A.5
Kang, Y.6
Li, X.7
Miller, J.8
Raiman, J.9
Sengupta, S.10
Shoeybi, M.11
-
4
-
-
84919728106
-
-
K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078, 2014.
-
(2014)
Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation
-
-
Cho, K.1
Van Merriënboer, B.2
Gulcehre, C.3
Bahdanau, D.4
Bougares, F.5
Schwenk, H.6
Bengio, Y.7
-
5
-
-
84946051934
-
Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis
-
Y. Fan, Y. Qian, F. K. Soong, and L. He. Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis. In IEEE ICASSP, 2015.
-
(2015)
IEEE ICASSP
-
-
Fan, Y.1
Qian, Y.2
Soong, F.K.3
He, L.4
-
6
-
-
33749259827
-
Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
-
A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML, 2006.
-
(2006)
ICML
-
-
Graves, A.1
Fernández, S.2
Gomez, F.3
Schmidhuber, J.4
-
7
-
-
85047009420
-
-
C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang. Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks. arXiv:1704.00849, 2017.
-
(2017)
Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks
-
-
Hsu, C.-C.1
Hwang, H.-T.2
Wu, Y.-C.3
Tsao, Y.4
Wang, H.-M.5
-
10
-
-
84994130883
-
Neural architectures for named entity recognition
-
G. Lample, M. Ballesteros, K. Kawakami, S. Subramanian, and C. Dyer. Neural architectures for named entity recognition. In Proc. NAACL-HLT, 2016.
-
(2016)
Proc. NAACL-HLT
-
-
Lample, G.1
Ballesteros, M.2
Kawakami, K.3
Subramanian, S.4
Dyer, C.5
-
11
-
-
85047019895
-
-
C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, X. Liu, Y. Cao, A. Kannan, and Z. Zhu. Deep speaker: an end-to-end neural speaker embedding system. arXiv preprint arXiv:1705.02304, 2017.
-
(2017)
Deep Speaker: An End-to-end Neural Speaker Embedding System
-
-
Li, C.1
Ma, X.2
Jiang, B.3
Li, X.4
Zhang, X.5
Liu, X.6
Cao, Y.7
Kannan, A.8
Zhu, Z.9
-
12
-
-
85039166060
-
-
S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, and Y. Bengio. SampleRNN: An unconditional end-to-end neural audio generation model. arXiv:1612.07837, 2016.
-
(2016)
SampleRNN: An Unconditional End-to-end Neural Audio Generation Model
-
-
Mehri, S.1
Kumar, K.2
Gulrajani, I.3
Kumar, R.4
Jain, S.5
Sotelo, J.6
Courville, A.7
Bengio, Y.8
-
13
-
-
85011070895
-
-
A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv:1609.03499, 2016.
-
(2016)
WaveNet: A Generative Model for Raw Audio
-
-
Van Den Oord, A.1
Dieleman, S.2
Zen, H.3
Simonyan, K.4
Vinyals, O.5
Graves, A.6
Kalchbrenner, N.7
Senior, A.8
Kavukcuoglu, K.9
-
14
-
-
0033884858
-
Speaker verification using adapted Gaussian mixture models
-
D. A. Reynolds, T. F. Quatieri, and R. B. Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1-3):19-41, 2000.
-
(2000)
Digital Signal Processing
, vol.10
, Issue.1-3
, pp. 19-41
-
-
Reynolds, D.A.1
Quatieri, T.F.2
Dunn, R.B.3
-
17
-
-
85018875486
-
Improved techniques for training GANs
-
T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In NIPS, 2016.
-
(2016)
NIPS
-
-
Salimans, T.1
Goodfellow, I.2
Zaremba, W.3
Cheung, V.4
Radford, A.5
Chen, X.6
-
18
-
-
85122685393
-
Char2wav: End-to-end speech synthesis
-
J. Sotelo, S. Mehri, K. Kumar, J. F. Santos, K. Kastner, A. Courville, and Y. Bengio. Char2wav: End-to-end speech synthesis. In ICLR2017 workshop submission, 2017.
-
(2017)
ICLR2017 Workshop Submission
-
-
Sotelo, J.1
Mehri, S.2
Kumar, K.3
Santos, J.F.4
Kastner, K.5
Courville, A.6
Bengio, Y.7
-
19
-
-
85038442478
-
Tacotron: Towards end-to-end speech synthesis
-
Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, et al. Tacotron: Towards end-to-end speech synthesis. In Interspeech, 2017.
-
(2017)
Interspeech
-
-
Wang, Y.1
Skerry-Ryan, R.2
Stanton, D.3
Wu, Y.4
Weiss, R.J.5
Jaitly, N.6
Yang, Z.7
Xiao, Y.8
Chen, Z.9
Bengio, S.10
-
20
-
-
84959112868
-
A study of speaker adaptation for DNN-based speech synthesis
-
Z. Wu, P. Swietojanski, C. Veaux, S. Renals, and S. King. A study of speaker adaptation for DNN-based speech synthesis. In Interspeech, 2015.
-
(2015)
Interspeech
-
-
Wu, Z.1
Swietojanski, P.2
Veaux, C.3
Renals, S.4
King, S.5
-
21
-
-
85008006694
-
Robust speaker-adaptive HMM-based text-to-speech synthesis
-
J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals. Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 2009.
-
(2009)
IEEE Transactions on Audio, Speech, and Language Processing
-
-
Yamagishi, J.1
Nose, T.2
Zen, H.3
Ling, Z.-H.4
Toda, T.5
Tokuda, K.6
King, S.7
Renals, S.8
-
23
-
-
85047016414
-
Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
-
H. Zen and H. Sak. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In IEEE ICASSP, 2015.
-
(2015)
IEEE ICASSP
-
-
Zen, H.1
Sak, H.2
-
24
-
-
85047009186
-
-
H. Zen, Y. Agiomyrgiannakis, N. Egberts, F. Henderson, and P. Szczepaniak. Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices. arXiv:1606.06061, 2016.
-
(2016)
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
-
-
Zen, H.1
Agiomyrgiannakis, Y.2
Egberts, N.3
Henderson, F.4
Szczepaniak, P.5