1. O. Abdel-Hamid and H. Jiang. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code. In IEEE ICASSP, 2013.
2. Y. Agiomyrgiannakis and Z. Roupakia. Voice morphing that improves TTS quality using an optimal dynamic frequency warping-and-weighting transform. In IEEE ICASSP, 2016.
3. D. Amodei, S. Ananthanarayanan, R. Anubhai, J. Bai, E. Battenberg, C. Case, J. Casper, B. Catanzaro, Q. Cheng, G. Chen, et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning, pages 173-182, 2016.
4. S. Ö. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, J. Raiman, S. Sengupta, and M. Shoeybi. Deep Voice: Real-time neural text-to-speech. In ICML, 2017a.
5. S. Ö. Arik, G. F. Diamos, A. Gibiansky, J. Miller, K. Peng, W. Ping, J. Raiman, and Y. Zhou. Deep Voice 2: Multi-speaker neural text-to-speech. In NIPS, pages 2966-2974, 2017b.
6. S. Azadi, M. Fisher, V. Kim, Z. Wang, E. Shechtman, and T. Darrell. Multi-content GAN for few-shot font style transfer. CoRR, abs/1708.02182, 2017.
7. L. H. Chen, Z. H. Ling, L. J. Liu, and L. R. Dai. Voice conversion using deep neural networks with layer-wise generative training. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.
9. S. Desai, A. W. Black, B. Yegnanarayana, and K. Prahallad. Spectral mapping using artificial neural networks for voice conversion. IEEE Transactions on Audio, Speech, and Language Processing, 2010.
10. H. T. Hwang, Y. Tsao, H. M. Wang, Y. R. Wang, and S. H. Chen. A probabilistic interpretation for artificial neural network-based voice conversion. In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2015.
11. R. Jozefowicz, O. Vinyals, M. Schuster, N. Shazeer, and Y. Wu. Exploring the limits of language modeling. arXiv preprint arXiv:1602.02410, 2016.
12. T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. CoRR, abs/1710.10196, 2017.
13. B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. One-shot learning by inverting a compositional causal process. In NIPS, 2013.
15. B. M. Lake, R. Salakhutdinov, and J. B. Tenenbaum. Human-level concept learning through probabilistic program induction. Science, 2015.
16. X. Li and X. Wu. Modeling speaker variability using long short-term memory networks for speech recognition. In INTERSPEECH, 2015.
17. S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, and Y. Bengio. SampleRNN: An unconditional end-to-end neural audio generation model. arXiv preprint arXiv:1612.07837, 2016.
19. Y. Miao, H. Zhang, and F. Metze. Speaker adaptive training of deep neural network acoustic models using i-vectors. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2015.
20. A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016a.
21. A. van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al. Conditional image generation with PixelCNN decoders. In Advances in Neural Information Processing Systems, 2016b.
23. W. Ping, K. Peng, A. Gibiansky, S. Arik, A. Kannan, S. Narang, J. Raiman, and J. Miller. Deep Voice 3: Scaling text-to-speech with convolutional sequence learning. In ICLR, 2018.
24. S. Prince and J. Elder. Probabilistic linear discriminant analysis for inferences about identity. In ICCV, 2007.
25. S. E. Reed, Y. Chen, T. Paine, A. van den Oord, S. M. A. Eslami, D. J. Rezende, O. Vinyals, and N. de Freitas. Few-shot autoregressive density estimation: Towards learning to learn distributions. CoRR, 2017.
27. J. Shen, R. Pang, R. J. Weiss, M. Schuster, N. Jaitly, Z. Yang, Z. Chen, Y. Zhang, Y. Wang, R. Skerry-Ryan, et al. Natural TTS synthesis by conditioning WaveNet on mel spectrogram predictions. arXiv preprint arXiv:1712.05884, 2017.
28. D. Snyder, P. Ghahremani, D. Povey, D. Garcia-Romero, Y. Carmiel, and S. Khudanpur. Deep neural network-based speaker embeddings for end-to-end speaker verification. In IEEE Spoken Language Technology Workshop (SLT), pages 165-170, 2016.
29. J. Sotelo, S. Mehri, K. Kumar, J. F. Santos, K. Kastner, A. Courville, and Y. Bengio. Char2Wav: End-to-end speech synthesis. 2017.
30. Y. Taigman, L. Wolf, A. Polyak, and E. Nachmani. VoiceLoop: Voice fitting and synthesis via a phonological loop. In ICLR, 2018.
31. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. In NIPS, 2017.
33. Y. Wang, R. J. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, Q. V. Le, Y. Agiomyrgiannakis, R. Clark, and R. A. Saurous. Tacotron: A fully end-to-end text-to-speech synthesis model. CoRR, abs/1703.10135, 2017.
34. M. Wester, Z. Wu, and J. Yamagishi. Analysis of the voice conversion challenge 2016 evaluation results. In INTERSPEECH, pages 1637-1641, 2016.
35. Y.-C. Wu, H.-T. Hwang, C.-C. Hsu, Y. Tsao, and H.-M. Wang. Locally linear embedding for exemplar-based spectral conversion. In INTERSPEECH, pages 1652-1656, 2016.
36. Z. Wu, P. Swietojanski, C. Veaux, S. Renals, and S. King. A study of speaker adaptation for DNN-based speech synthesis. In INTERSPEECH, 2015.
37. S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q. Liu. Fast adaptation of deep neural network based on discriminant codes for speech recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2014.
38. J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai. Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm. IEEE Transactions on Audio, Speech, and Language Processing, 2009.
39. D. Yu, K. Yao, H. Su, G. Li, and F. Seide. KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition. In IEEE ICASSP, 2013.