-
3
-
-
84965179228
-
Scheduled sampling for sequence prediction with recurrent neural networks
-
C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett eds, Curran Associates, Inc
-
Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems 28, pp. 1171-1179. Curran Associates, Inc., 2015.
-
(2015)
Advances in Neural Information Processing Systems
, vol.28
, pp. 1171-1179
-
-
Bengio, S.1
Vinyals, O.2
Jaitly, N.3
Shazeer, N.4
-
4
-
-
0142166851
-
A neural probabilistic language model
-
March ISSN
-
Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137-1155, March 2003. ISSN 1532-4435.
-
(2003)
J. Mach. Learn. Res.
, vol.3
, pp. 1137-1155
-
-
Bengio, Y.1
Ducharme, R.2
Vincent, P.3
Janvin, C.4
-
5
-
-
84973351869
-
Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
-
March
-
William Chan, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960-4964, March 2016.
-
(2016)
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, pp. 4960-4964
-
-
Chan, W.1
Jaitly, N.2
Le, Q.3
Vinyals, O.4
-
6
-
-
84961291190
-
Learning phrase representations using rnn encoder-decoder for statistical machine translation
-
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
-
(2014)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
-
-
Cho, K.1
Van Merriënboer, B.2
Gulcehre, C.3
Bahdanau, D.4
Bougares, F.5
Schwenk, H.6
Bengio, Y.7
-
7
-
-
84965139600
-
Attention-based models for speech recognition
-
C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett eds, Curran Associates, Inc
-
Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems 28, pp. 577-585. Curran Associates, Inc., 2015.
-
(2015)
Advances in Neural Information Processing Systems
, vol.28
, pp. 577-585
-
-
Chorowski, J.K.1
Bahdanau, D.2
Serdyuk, D.3
Cho, K.4
Bengio, Y.5
-
8
-
-
84946051934
-
Multi-speaker modeling and speaker adaptation for dnn-based tts synthesis
-
April
-
Yuchen Fan, Yao Qian, Frank K. Soong, and Lei He. Multi-speaker modeling and speaker adaptation for dnn-based tts synthesis. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4475-4479, April 2015.
-
(2015)
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, pp. 4475-4479
-
-
Fan, Y.1
Qian, Y.2
Soong, F.K.3
He, L.4
-
9
-
-
85162557101
-
Practical variational inference for neural networks
-
J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger eds
-
Alex Graves. Practical variational inference for neural networks. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 24, pp. 2348-2356. 2011.
-
(2011)
Advances in Neural Information Processing Systems
, vol.24
, pp. 2348-2356
-
-
Graves, A.1
-
12
-
-
84919832465
-
Towards end-to-end speech recognition with recurrent neural networks
-
Tony Jebara and Eric Xing eds, JMLR Workshop and Conference Proceedings
-
Alex Graves and Navdeep Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In Tony Jebara and Eric P. Xing (eds.), Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764-1772. JMLR Workshop and Conference Proceedings, 2014.
-
(2014)
Proceedings of the 31st International Conference on Machine Learning (ICML-14)
, pp. 1764-1772
-
-
Graves, A.1
Jaitly, N.2
-
14
-
-
0029765811
-
Unit selection in a concatenative speech synthesis system using a large speech database
-
Andrew J Hunt and Alan W Black. Unit selection in a concatenative speech synthesis system using a large speech database. In Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on, volume 1, pp. 373-376. IEEE, 1996.
-
(1996)
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
, vol.1
, pp. 373-376
-
-
Hunt, A.J.1
Black, A.W.2
-
15
-
-
84973370372
-
Cute: A concatenative method for voice conversion using exemplar-based unit selection
-
Zeyu Jin, Adam Finkelstein, Stephen DiVerdi, Jingwan Lu, and Gautham J Mysore. Cute: A concatenative method for voice conversion using exemplar-based unit selection. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pp. 5660-5664. IEEE, 2016.
-
(2016)
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on
, pp. 5660-5664
-
-
Jin, Z.1
Finkelstein, A.2
DiVerdi, S.3
Lu, J.4
Mysore, G.J.5
-
17
-
-
84959120024
-
Sequence-to-sequence neural net models for grapheme-to-phoneme conversion
-
May
-
Kaisheng Yao and Geoffrey Zweig. Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. ISCA - International Speech Communication Association, May 2015.
-
(2015)
ISCA - International Speech Communication Association
-
-
Yao, K.1
Zweig, G.2
-
18
-
-
84910105608
-
Measuring a decade of progress in text-to-speech
-
Simon King. Measuring a decade of progress in text-to-speech. Loquens, 1(1), 1 2014. ISSN 2386-2637.
-
(2014)
Loquens
, vol.1
, Issue.1
, pp. 1
-
-
King, S.1
-
21
-
-
85032750981
-
Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
-
Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, and Li Deng. Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends. IEEE Signal Processing Magazine, 32:35-52, 2015.
-
(2015)
IEEE Signal Processing Magazine
, vol.32
, pp. 35-52
-
-
Ling, Z.-H.1
Kang, S.2
Zen, H.3
Senior, A.4
Schuster, M.5
Qian, X.-J.6
Meng, H.7
Deng, L.8
-
22
-
-
85039166060
-
-
12
-
Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. Samplernn: An unconditional end-to-end neural audio generation model. 12 2016. URL https://arxiv.org/abs/1612.07837.
-
(2016)
Samplernn: An Unconditional End-to-End Neural Audio Generation Model
-
-
Mehri, S.1
Kumar, K.2
Gulrajani, I.3
Kumar, R.4
Jain, S.5
Sotelo, J.6
Courville, A.7
Bengio, Y.8
-
23
-
-
84937959846
-
Recurrent models of visual attention
-
Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger eds, Curran Associates, Inc
-
Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. Recurrent models of visual attention. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 27, pp. 2204-2212. Curran Associates, Inc., 2014.
-
(2014)
Advances in Neural Information Processing Systems
, vol.27
, pp. 2204-2212
-
-
Mnih, V.1
Heess, N.2
Graves, A.3
Kavukcuoglu, K.4
-
24
-
-
84976902575
-
World: A vocoder-based high-quality speech synthesis system for real-time applications
-
Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. World: A vocoder-based high-quality speech synthesis system for real-time applications. IEICE Transactions on Information and Systems, E99.D(7):1877-1884, 2016.
-
(2016)
IEICE Transactions on Information and Systems
, vol.E99
, Issue.7
, pp. 1877-1884
-
-
Morise, M.1
Yokomori, F.2
Ozawa, K.3
-
25
-
-
78149406722
-
The corpus dimex100: Transcription and evaluation
-
December
-
Luis A. Pineda, Hayde Castellanos, Javier Cuétara, Lucian Galescu, Janet Juárez, Joaquim Llisterri, Patricia Pérez, and Luis Villaseñor. The corpus dimex100: Transcription and evaluation. Lang. Resour. Eval., 44(4):347-370, December 2010.
-
(2010)
Lang. Resour. Eval.
, vol.44
, Issue.4
, pp. 347-370
-
-
Pineda, L.A.1
Castellanos, H.2
Cuétara, J.3
Galescu, L.4
Juárez, J.5
Llisterri, J.6
Pérez, P.7
Villaseñor, L.8
-
26
-
-
84946032010
-
Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks
-
April
-
Kanishka Rao, Fuchun Peng, Hasim Sak, and Francoise Beaufays. Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4225-4229, April 2015. doi: 10.1109/ICASSP.2015.7178767.
-
(2015)
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, pp. 4225-4229
-
-
Rao, K.1
Peng, F.2
Sak, H.3
Beaufays, F.4
-
27
-
-
84928547704
-
Sequence to sequence learning with neural networks
-
Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger eds, Curran Associates, Inc
-
Ilya Sutskever, Oriol Vinyals, and Quoc Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 27, pp. 3104-3112. Curran Associates, Inc., 2014.
-
(2014)
Advances in Neural Information Processing Systems
, vol.27
, pp. 3104-3112
-
-
Sutskever, I.1
Vinyals, O.2
Le, Q.3
-
28
-
-
84925160976
-
-
Cambridge University Press, Cambridge
-
Paul Taylor. Text-to-Speech Synthesis. Cambridge University Press, Cambridge, 2009.
-
(2009)
Text-to-Speech Synthesis
-
-
Taylor, P.1
-
32
-
-
84876687945
-
Speech synthesis based on hidden markov models
-
May ISSN
-
Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura. Speech synthesis based on hidden markov models. Proceedings of the IEEE, 101(5):1234-1252, May 2013. ISSN 0018-9219.
-
(2013)
Proceedings of the IEEE
, vol.101
, Issue.5
, pp. 1234-1252
-
-
Tokuda, K.1
Nankaku, Y.2
Toda, T.3
Zen, H.4
Yamagishi, J.5
Oura, K.6
-
33
-
-
85011070895
-
-
09
-
Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. 09 2016. URL https://arxiv.org/abs/1609.03499.
-
(2016)
Wavenet: A Generative Model for Raw Audio
-
-
Van Den Oord, A.1
Dieleman, S.2
Zen, H.3
Simonyan, K.4
Vinyals, O.5
Graves, A.6
Kalchbrenner, N.7
Senior, A.8
Kavukcuoglu, K.9
-
34
-
-
85029039500
-
Blocks and fuel: Frameworks for deep learning
-
abs/1506.00619
-
Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, and Yoshua Bengio. Blocks and fuel: Frameworks for deep learning. CoRR, abs/1506.00619, 2015. URL http://arxiv.org/abs/1506.00619.
-
(2015)
CoRR
-
-
Van Merriënboer, B.1
Bahdanau, D.2
Dumoulin, V.3
Serdyuk, D.4
Warde-Farley, D.5
Chorowski, J.6
Bengio, Y.7
-
35
-
-
84959112868
-
A study of speaker adaptation for dnn-based speech synthesis
-
Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, and Simon King. A study of speaker adaptation for dnn-based speech synthesis. In INTERSPEECH, pp. 879-883. ISCA, 2015.
-
(2015)
INTERSPEECH
, pp. 879-883
-
-
Wu, Z.1
Swietojanski, P.2
Veaux, C.3
Renals, S.4
King, S.5
-
37
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
David Blei and Francis Bach eds, JMLR Workshop and Conference Proceedings
-
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In David Blei and Francis Bach (eds.), Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 2048-2057. JMLR Workshop and Conference Proceedings, 2015.
-
(2015)
Proceedings of the 32nd International Conference on Machine Learning (ICML-15)
, pp. 2048-2057
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhutdinov, R.6
Zemel, R.7
Bengio, Y.8
-
40
-
-
84973282956
-
Acoustic modeling in statistical parametric speech synthesis - From hmm to lstm-rnn
-
Invited paper
-
Heiga Zen. Acoustic modeling in statistical parametric speech synthesis - from hmm to lstm-rnn. In Proc. MLSLP, 2015. Invited paper.
-
(2015)
Proc. MLSLP
-
-
Zen, H.1
-
41
-
-
67651002140
-
Statistical parametric speech synthesis
-
Heiga Zen, Keiichi Tokuda, and Alan W Black. Statistical parametric speech synthesis. Speech Communication, 51(11):1039-1064, 2009.
-
(2009)
Speech Communication
, vol.51
, Issue.11
, pp. 1039-1064
-
-
Zen, H.1
Tokuda, K.2
Black, A.W.3
-
42
-
-
84890490547
-
Statistical parametric speech synthesis using deep neural networks
-
Heiga Zen, Andrew Senior, and Mike Schuster. Statistical parametric speech synthesis using deep neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7962-7966, 2013.
-
(2013)
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
, pp. 7962-7966
-
-
Zen, H.1
Senior, A.2
Schuster, M.3
-
43
-
-
84994314564
-
Fast, compact, and high quality lstm-rnn based statistical parametric speech synthesizers for mobile devices
-
San Francisco, CA, USA
-
Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, and Przemysław Szczepaniak. Fast, compact, and high quality lstm-rnn based statistical parametric speech synthesizers for mobile devices. In Proc. Interspeech, San Francisco, CA, USA, 2016.
-
(2016)
Proc. Interspeech
-
-
Zen, H.1
Agiomyrgiannakis, Y.2
Egberts, N.3
Henderson, F.4
Szczepaniak, P.5