-
1
-
-
84930664922
-
Vocaine the vocoder and applications in speech synthesis
-
Yannis Agiomyrgiannakis. Vocaine the vocoder and applications in speech synthesis. In ICASSP, 2015.
-
(2015)
ICASSP
-
-
Agiomyrgiannakis, Y.1
-
2
-
-
85039156048
-
Deep voice: Real-time neural text-to-speech
-
Sercan Ö. Arık, Mike Chrzanowski, Adam Coates, Gregory Diamos, Andrew Gibiansky, Yongguo Kang, Xian Li, John Miller, Jonathan Raiman, Shubho Sengupta, and Mohammad Shoeybi. Deep Voice: Real-time neural text-to-speech. In ICML, 2017.
-
(2017)
ICML
-
-
Arık, S.Ö.1
Chrzanowski, M.2
Coates, A.3
Diamos, G.4
Gibiansky, A.5
Kang, Y.6
Li, X.7
Miller, J.8
Raiman, J.9
Sengupta, S.10
Shoeybi, M.11
-
3
-
-
85046637415
-
Deep Voice 2: Multi-speaker neural text-to-speech
-
Sercan Ö. Arık, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, and Yanqi Zhou. Deep Voice 2: Multi-speaker neural text-to-speech. In NIPS, 2017.
-
(2017)
NIPS
-
-
Arık, S.Ö.1
Diamos, G.2
Gibiansky, A.3
Miller, J.4
Peng, K.5
Ping, W.6
Raiman, J.7
Zhou, Y.8
-
5
-
-
85083953689
-
Neural machine translation by jointly learning to align and translate
-
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
-
(2015)
ICLR
-
-
Bahdanau, D.1
Cho, K.2
Bengio, Y.3
-
6
-
-
85039170210
-
Siri on-device deep learning-guided unit selection text-to-speech system
-
Tim Capes, Paul Coles, Alistair Conkie, Ladan Golipour, Abie Hadjitarkhani, Qiong Hu, Nancy Huddleston, Melvyn Hunt, Jiangchuan Li, Matthias Neeracher, et al. Siri on-device deep learning-guided unit selection text-to-speech system. In Interspeech, 2017.
-
(2017)
Interspeech
-
-
Capes, T.1
Coles, P.2
Conkie, A.3
Golipour, L.4
Hadjitarkhani, A.5
Hu, Q.6
Huddleston, N.7
Hunt, M.8
Li, J.9
Neeracher, M.10
-
7
-
-
84961291190
-
Learning phrase representations using RNN encoder-decoder for statistical machine translation
-
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.
-
(2014)
EMNLP
-
-
Cho, K.1
Van Merriënboer, B.2
Gulcehre, C.3
Bahdanau, D.4
Bougares, F.5
Schwenk, H.6
Bengio, Y.7
-
8
-
-
84965139600
-
Attention-based models for speech recognition
-
Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In NIPS, 2015.
-
(2015)
NIPS
-
-
Chorowski, J.K.1
Bahdanau, D.2
Serdyuk, D.3
Cho, K.4
Bengio, Y.5
-
9
-
-
85048443641
-
Language modeling with gated convolutional networks
-
Yann Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. In ICML, 2017.
-
(2017)
ICML
-
-
Dauphin, Y.1
Fan, A.2
Auli, M.3
Grangier, D.4
-
10
-
-
85046994169
-
Convolutional sequence to sequence learning
-
Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann Dauphin. Convolutional sequence to sequence learning. In ICML, 2017.
-
(2017)
ICML
-
-
Gehring, J.1
Auli, M.2
Grangier, D.3
Yarats, D.4
Dauphin, Y.5
-
11
-
-
84994309294
-
Recent advances in Google real-time HMM-driven unit selection synthesizer
-
Xavi Gonzalvo, Siamak Tazari, Chun-an Chan, Markus Becker, Alexander Gutkin, and Hanna Silen. Recent advances in Google real-time HMM-driven unit selection synthesizer. In Interspeech, 2016.
-
(2016)
Interspeech
-
-
Gonzalvo, X.1
Tazari, S.2
Chan, C.-A.3
Becker, M.4
Gutkin, A.5
Silen, H.6
-
13
-
-
0032673049
-
Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds
-
Hideki Kawahara, Ikuyo Masuda-Katsuse, and Alain De Cheveigne. Restructuring speech representations using a pitch-adaptive time–frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 1999.
-
(1999)
Speech Communication
-
-
Kawahara, H.1
Masuda-Katsuse, I.2
De Cheveigne, A.3
-
14
-
-
85088227413
-
SampleRNN: An unconditional end-to-end neural audio generation model
-
Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. SampleRNN: An unconditional end-to-end neural audio generation model. In ICLR, 2017.
-
(2017)
ICLR
-
-
Mehri, S.1
Kumar, K.2
Gulrajani, I.3
Kumar, R.4
Jain, S.5
Sotelo, J.6
Courville, A.7
Bengio, Y.8
-
17
-
-
85011070895
-
WaveNet: A generative model for raw audio
-
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. WaveNet: A generative model for raw audio. arXiv:1609.03499, 2016.
-
(2016)
WaveNet: A Generative Model for Raw Audio
-
-
Van den Oord, A.1
Dieleman, S.2
Zen, H.3
Simonyan, K.4
Vinyals, O.5
Graves, A.6
Kalchbrenner, N.7
Senior, A.8
Kavukcuoglu, K.9
-
18
-
-
84946015916
-
Librispeech: An ASR corpus based on public domain audio books
-
IEEE
-
Vassil Panayotov, Guoguo Chen, Daniel Povey, and Sanjeev Khudanpur. Librispeech: an ASR corpus based on public domain audio books. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 5206–5210. IEEE, 2015.
-
(2015)
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 5206–5210
-
-
Panayotov, V.1
Chen, G.2
Povey, D.3
Khudanpur, S.4
-
19
-
-
85048524283
-
Online and linear-time attention by enforcing monotonic alignments
-
Colin Raffel, Thang Luong, Peter J Liu, Ron J Weiss, and Douglas Eck. Online and linear-time attention by enforcing monotonic alignments. In ICML, 2017.
-
(2017)
ICML
-
-
Raffel, C.1
Luong, T.2
Liu, P.J.3
Weiss, R.J.4
Eck, D.5
-
20
-
-
85047003030
-
CrowdMOS: An approach for crowdsourcing mean opinion score studies
-
Flávio Ribeiro, Dinei Florêncio, Cha Zhang, and Michael Seltzer. CrowdMOS: An approach for crowdsourcing mean opinion score studies. In IEEE ICASSP, 2011.
-
(2011)
IEEE ICASSP
-
-
Ribeiro, F.1
Florêncio, D.2
Zhang, C.3
Seltzer, M.4
-
21
-
-
85011836388
-
A neural attention model for abstractive sentence summarization
-
Alexander M Rush, Sumit Chopra, and Jason Weston. A neural attention model for abstractive sentence summarization. In EMNLP, 2015.
-
(2015)
EMNLP
-
-
Rush, A.M.1
Chopra, S.2
Weston, J.3
-
22
-
-
85017457992
-
Weight normalization: A simple reparameterization to accelerate training of deep neural networks
-
Tim Salimans and Diederik P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. In NIPS, 2016.
-
(2016)
NIPS
-
-
Salimans, T.1
Kingma, D.P.2
-
23
-
-
85122685393
-
Char2Wav: End-to-end speech synthesis
-
Jose Sotelo, Soroush Mehri, Kundan Kumar, Joao Felipe Santos, Kyle Kastner, Aaron Courville, and Yoshua Bengio. Char2Wav: End-to-end speech synthesis. In ICLR Workshop, 2017.
-
(2017)
ICLR Workshop
-
-
Sotelo, J.1
Mehri, S.2
Kumar, K.3
Santos, J.F.4
Kastner, K.5
Courville, A.6
Bengio, Y.7
-
24
-
-
84928547704
-
Sequence to sequence learning with neural networks
-
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
-
(2014)
NIPS
-
-
Sutskever, I.1
Vinyals, O.2
Le, Q.V.3
-
26
-
-
84925160976
-
Text-to-Speech Synthesis
-
Cambridge University Press, New York, NY, USA, 1st edition
-
Paul Taylor. Text-to-Speech Synthesis. Cambridge University Press, New York, NY, USA, 1st edition, 2009. ISBN 0521899273, 9780521899277.
-
(2009)
Text-to-Speech Synthesis
-
-
Taylor, P.1
-
27
-
-
85038368581
-
Attention is all you need
-
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. arXiv:1706.03762, 2017.
-
(2017)
Attention Is All You Need
-
-
Vaswani, A.1
Shazeer, N.2
Parmar, N.3
Uszkoreit, J.4
Jones, L.5
Gomez, A.N.6
Kaiser, L.7
Polosukhin, I.8
-
28
-
-
85038442478
-
Tacotron: Towards end-to-end speech synthesis
-
Yuxuan Wang, RJ Skerry-Ryan, Daisy Stanton, Yonghui Wu, Ron Weiss, Navdeep Jaitly, Zongheng Yang, Ying Xiao, Zhifeng Chen, Samy Bengio, Quoc Le, Yannis Agiomyrgiannakis, Rob Clark, and Rif A. Saurous. Tacotron: Towards end-to-end speech synthesis. In Interspeech, 2017.
-
(2017)
Interspeech
-
-
Wang, Y.1
Skerry-Ryan, R.J.2
Stanton, D.3
Wu, Y.4
Weiss, R.5
Jaitly, N.6
Yang, Z.7
Xiao, Y.8
Chen, Z.9
Bengio, S.10
Le, Q.11
Agiomyrgiannakis, Y.12
Clark, R.13
Saurous, R.A.14
-
29
-
-
85008006694
-
Robust speaker-adaptive HMM-based text-to-speech synthesis
-
Junichi Yamagishi, Takashi Nose, Heiga Zen, Zhen-Hua Ling, Tomoki Toda, Keiichi Tokuda, Simon King, and Steve Renals. Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 2009.
-
(2009)
IEEE Transactions on Audio, Speech, and Language Processing
-
-
Yamagishi, J.1
Nose, T.2
Zen, H.3
Ling, Z.-H.4
Toda, T.5
Tokuda, K.6
King, S.7
Renals, S.8
-
30
-
-
77953708096
-
Thousands of voices for HMM-based speech synthesis–analysis and application of TTS systems built on various ASR corpora
-
Junichi Yamagishi, Bela Usabaev, Simon King, Oliver Watts, John Dines, Jilei Tian, Yong Guan, Rile Hu, Keiichiro Oura, Yi-Jian Wu, et al. Thousands of voices for HMM-based speech synthesis–analysis and application of TTS systems built on various ASR corpora. IEEE Transactions on Audio, Speech, and Language Processing, 2010.
-
(2010)
IEEE Transactions on Audio, Speech, and Language Processing
-
-
Yamagishi, J.1
Usabaev, B.2
King, S.3
Watts, O.4
Dines, J.5
Tian, J.6
Guan, Y.7
Hu, R.8
Oura, K.9
Wu, Y.-J.10