SCOPUS Information Retrieval Platform

5th International Conference on Learning Representations, ICLR 2017 - Workshop Track Proceedings

Volume , Issue , 2017, Pages

Char2Wav: End-to-end speech synthesis

(7) Sotelo, Jose a; Mehri, Soroush a; Kumar, Kundan a,b; Santos, João Felipe c; Kastner, Kyle a; Courville, Aaron d; Bengio, Yoshua e

a UNIVERSITÉ DE MONTRÉAL (Canada)

b INDIAN INSTITUTE OF TECHNOLOGY KANPUR (India)

c INRS EMT (Canada)

d CIFAR Fellow ^*

e CIFAR fellow (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

DECODING; RECURRENT NEURAL NETWORKS; SIGNAL ENCODING; VOCODERS;

ACOUSTIC FEATURES; BIDIRECTIONAL RECURRENT NEURAL NETWORKS; ENCODER-DECODER; END TO END; END-TO-END MODELS; INTERMEDIATE REPRESENTATIONS; LEARN+; TRADITIONAL MODELS; TWO-COMPONENT; WAVEFORMS;

SPEECH SYNTHESIS;

EID: 85122685393 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (361)

References (43)

1
- 85083953689
- Neural machine translation by jointly learning to align and translate
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- (2015) Proceedings of the International Conference on Learning Representations (ICLR)
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

2
- 67650565075
- Springer Science & Business Media
- Jacob Benesty, M Mohan Sondhi, and Yiteng Huang. Springer handbook of speech processing. Springer Science & Business Media, 2007.
- (2007) Springer Handbook of Speech Processing
- Benesty, J.¹ Mohan Sondhi, M.² Huang, Y.³

3
- 84965179228
- Scheduled sampling for sequence prediction with recurrent neural networks
- C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett eds, Curran Associates, Inc
- Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems 28, pp. 1171-1179. Curran Associates, Inc., 2015.
- (2015) Advances in Neural Information Processing Systems , vol.28 , pp. 1171-1179
- Bengio, S.¹ Vinyals, O.² Jaitly, N.³ Shazeer, N.⁴

4
- 0142166851
- A neural probabilistic language model
- Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. J. Mach. Learn. Res., 3:1137-1155, March 2003. ISSN 1532-4435.
- (2003) J. Mach. Learn. Res. , vol.3 , pp. 1137-1155
- Bengio, Y.¹ Ducharme, R.² Vincent, P.³ Janvin, C.⁴

5
- 84973351869
- Listen, attend and spell: A neural network for large vocabulary conversational speech recognition
- William Chan, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. Listen, attend and spell: A neural network for large vocabulary conversational speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4960-4964, March 2016.
- (2016) 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 4960-4964
- Chan, W.¹ Jaitly, N.² Le, Q.³ Vinyals, O.⁴

6
- 84961291190
- Learning phrase representations using rnn encoder-decoder for statistical machine translation
- Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
- (2014) Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Cho, K.¹ Van Merriënboer, B.² Gulcehre, C.³ Bahdanau, D.⁴ Bougares, F.⁵ Schwenk, H.⁶ Bengio, Y.⁷

7
- 84965139600
- Attention-based models for speech recognition
- C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett eds, Curran Associates, Inc
- Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (eds.), Advances in Neural Information Processing Systems 28, pp. 577-585. Curran Associates, Inc., 2015.
- (2015) Advances in Neural Information Processing Systems , vol.28 , pp. 577-585
- Chorowski, J.K.¹ Bahdanau, D.² Serdyuk, D.³ Cho, K.⁴ Bengio, Y.⁵

8
- 84946051934
- Multi-speaker modeling and speaker adaptation for dnn-based tts synthesis
- Yuchen Fan, Yao Qian, Frank K. Soong, and Lei He. Multi-speaker modeling and speaker adaptation for dnn-based tts synthesis. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4475-4479, April 2015.
- (2015) 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 4475-4479
- Fan, Y.¹ Qian, Y.² Soong, F.K.³ He, L.⁴

9
- 85162557101
- Practical variational inference for neural networks
- J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger eds
- Alex Graves. Practical variational inference for neural networks. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 24, pp. 2348-2356. 2011.
- (2011) Advances in Neural Information Processing Systems , vol.24 , pp. 2348-2356
- Graves, A.¹

10
- 84906979661
- Alex Graves. Generating sequences with recurrent neural networks. 08 2013. URL https://arxiv.org/abs/1308.0850.
- (2013) Generating Sequences with Recurrent Neural Networks
- Graves, A.¹

11
- 85064811436
- Alex Graves. Hallucination with recurrent neural networks, 2015. URL https://www.youtube.com/watch?v=-yX1SYeDHbg.
- (2015) Hallucination with Recurrent Neural Networks
- Graves, A.¹

12
- 84919832465
- Towards end-to-end speech recognition with recurrent neural networks
- Tony Jebara and Eric Xing eds, JMLR Workshop and Conference Proceedings
- Alex Graves and Navdeep Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In Tony Jebara and Eric P. Xing (eds.), Proceedings of the 31st International Conference on Machine Learning (ICML-14), pp. 1764-1772. JMLR Workshop and Conference Proceedings, 2014.
- (2014) Proceedings of the 31st International Conference on Machine Learning (ICML-14) , pp. 1764-1772
- Graves, A.¹ Jaitly, N.²

13
- 84930616355
- Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. 10 2014. URL https://arxiv.org/abs/1410.5401.
- (2014) Neural Turing Machines
- Graves, A.¹ Wayne, G.² Danihelka, I.³

14
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- Andrew J Hunt and Alan W Black. Unit selection in a concatenative speech synthesis system using a large speech database. In Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on, volume 1, pp. 373-376. IEEE, 1996.
- (1996) Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on , vol.1 , pp. 373-376
- Hunt, A.J.¹ Black, A.W.²

15
- 84973370372
- Cute: A concatenative method for voice conversion using exemplar-based unit selection
- Zeyu Jin, Adam Finkelstein, Stephen DiVerdi, Jingwan Lu, and Gautham J Mysore. Cute: A concatenative method for voice conversion using exemplar-based unit selection. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pp. 5660-5664. IEEE, 2016.
- (2016) Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on , pp. 5660-5664
- Jin, Z.¹ Finkelstein, A.² DiVerdi, S.³ Lu, J.⁴ Mysore, G.J.⁵

16
- 85019766609
- Alexander Rosenberg Johansen, Jonas Meinertz Hansen, Elias Khazen Obeid, Casper Kaae Sønderby, and Ole Winther. Neural machine translation with characters and hierarchical encoding. 10 2016. URL https://arxiv.org/abs/1610.06550.
- (2016) Neural Machine Translation with Characters and Hierarchical Encoding
- Johansen, A.R.¹ Hansen, J.M.² Obeid, E.K.³ Sønderby, C.K.⁴ Winther, O.⁵

17
- 84959120024
- Sequence-to-sequence neural net models for grapheme-to-phoneme conversion
- Kaisheng Yao and Geoffrey Zweig. Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. ISCA - International Speech Communication Association, May 2015.
- (2015) ISCA - International Speech Communication Association
- Yao, K.¹ Zweig, G.²

18
- 84910105608
- Measuring a decade of progress in text-to-speech
- Simon King. Measuring a decade of progress in text-to-speech. Loquens, 1(1), 1 2014. ISSN 2386-2637.
- (2014) Loquens , vol.1 , Issue.1 , pp. 1
- King, S.¹

19
- 85083951076
- Adam: A method for stochastic optimization
- Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
- (2015) Proceedings of the International Conference on Learning Representations (ICLR)
- Kingma, D.P.¹ Ba, J.²

20
- 85039152072
- Bo Li and Heiga Zen. Multi-language multi-speaker acoustic modeling for lstm-rnn based statistical parametric speech synthesis. 2016.
- (2016) Multi-Language Multi-Speaker Acoustic Modeling for Lstm-Rnn Based Statistical Parametric Speech Synthesis
- Li, B.¹ Zen, H.²

21
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- Zhen-Hua Ling, Shiyin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, and Li Deng. Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends. IEEE Signal Processing Magazine, 32:35-52, 2015.
- (2015) IEEE Signal Processing Magazine , vol.32 , pp. 35-52
- Ling, Z.-H.¹ Kang, S.² Zen, H.³ Senior, A.⁴ Schuster, M.⁵ Qian, X.-J.⁶ Meng, H.⁷ Deng, L.⁸

22
- 85039166060
- Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. Samplernn: An unconditional end-to-end neural audio generation model. 12 2016. URL https://arxiv.org/abs/1612.07837.
- (2016) Samplernn: An Unconditional End-to-End Neural Audio Generation Model
- Mehri, S.¹ Kumar, K.² Gulrajani, I.³ Kumar, R.⁴ Jain, S.⁵ Sotelo, J.⁶ Courville, A.⁷ Bengio, Y.⁸

23
- 84937959846
- Recurrent models of visual attention
- Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger eds, Curran Associates, Inc
- Volodymyr Mnih, Nicolas Heess, Alex Graves, and Koray Kavukcuoglu. Recurrent models of visual attention. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 27, pp. 2204-2212. Curran Associates, Inc., 2014.
- (2014) Advances in Neural Information Processing Systems , vol.27 , pp. 2204-2212
- Mnih, V.¹ Heess, N.² Graves, A.³ Kavukcuoglu, K.⁴

24
- 84976902575
- World: A vocoder-based high-quality speech synthesis system for real-time applications
- Masanori Morise, Fumiya Yokomori, and Kenji Ozawa. World: A vocoder-based high-quality speech synthesis system for real-time applications. IEICE Transactions on Information and Systems, E99.D(7):1877-1884, 2016.
- (2016) IEICE Transactions on Information and Systems , vol.E99 , Issue.7 , pp. 1877-1884
- Morise, M.¹ Yokomori, F.² Ozawa, K.³

25
- 78149406722
- The corpus dimex100: Transcription and evaluation
- Luis A. Pineda, Hayde Castellanos, Javier Cuétara, Lucian Galescu, Janet Juárez, Joaquim Llisterri, Patricia Pérez, and Luis Villaseñor. The corpus dimex100: Transcription and evaluation. Lang. Resour. Eval., 44(4):347-370, December 2010.
- (2010) Lang. Resour. Eval. , vol.44 , Issue.4 , pp. 347-370
- Pineda, L.A.¹ Castellanos, H.² Cuétara, J.³ Galescu, L.⁴ Juárez, J.⁵ Llisterri, J.⁶ Pérez, P.⁷ Villaseñor, L.⁸

26
- 84946032010
- Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks
- Kanishka Rao, Fuchun Peng, Hasim Sak, and Francoise Beaufays. Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4225-4229, April 2015. doi: 10.1109/ICASSP.2015.7178767.
- (2015) 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , pp. 4225-4229
- Rao, K.¹ Peng, F.² Sak, H.³ Beaufays, F.⁴

27
- 84928547704
- Sequence to sequence learning with neural networks
- Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger eds, Curran Associates, Inc
- Ilya Sutskever, Oriol Vinyals, and Quoc Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger (eds.), Advances in Neural Information Processing Systems 27, pp. 3104-3112. Curran Associates, Inc., 2014.
- (2014) Advances in Neural Information Processing Systems , vol.27 , pp. 3104-3112
- Sutskever, I.¹ Vinyals, O.² Le, Q.³

28
- 84925160976
- Cambridge University Press, Cambridge
- Paul Taylor. Text-to-Speech Synthesis. Cambridge University Press, Cambridge, 2009.
- (2009) Text-to-Speech Synthesis
- Taylor, P.¹

29
- 84979557463
- arXiv e-prints, abs/1605.02688, May
- Theano Development Team. Theano: A Python framework for fast computation of mathematical expressions. arXiv e-prints, abs/1605.02688, May 2016. URL http://arxiv.org/abs/1605.02688.
- (2016) Theano: A Python Framework for Fast Computation of Mathematical Expressions

30
- 84946077883
- Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis
- Keiichi Tokuda and Heiga Zen. Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 4215-4219, 2015.
- (2015) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 4215-4219
- Tokuda, K.¹ Zen, H.²

31
- 84973307947
- Directly modeling voiced and unvoiced components in speech waveforms by neural networks
- Keiichi Tokuda and Heiga Zen. Directly modeling voiced and unvoiced components in speech waveforms by neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 5640-5644, 2016.
- (2016) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 5640-5644
- Tokuda, K.¹ Zen, H.²

32
- 84876687945
- Speech synthesis based on hidden markov models
- Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura. Speech synthesis based on hidden markov models. Proceedings of the IEEE, 101(5): 1234-1252, May 2013. ISSN 0018-9219.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

33
- 85011070895
- Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. Wavenet: A generative model for raw audio. 09 2016. URL https://arxiv.org/abs/1609.03499.
- (2016) Wavenet: A Generative Model for Raw Audio
- Van Den Oord, A.¹ Dieleman, S.² Zen, H.³ Simonyan, K.⁴ Vinyals, O.⁵ Graves, A.⁶ Kalchbrenner, N.⁷ Senior, A.⁸ Kavukcuoglu, K.⁹

34
- 85029039500
- Blocks and fuel: Frameworks for deep learning
- abs/1506.00619
- Bart van Merriënboer, Dzmitry Bahdanau, Vincent Dumoulin, Dmitriy Serdyuk, David Warde-Farley, Jan Chorowski, and Yoshua Bengio. Blocks and fuel: Frameworks for deep learning. CoRR, abs/1506.00619, 2015. URL http://arxiv.org/abs/1506.00619.
- (2015) CoRR
- Van Merriënboer, B.¹ Bahdanau, D.² Dumoulin, V.³ Serdyuk, D.⁴ Warde-Farley, D.⁵ Chorowski, J.⁶ Bengio, Y.⁷

35
- 84959112868
- A study of speaker adaptation for dnn-based speech synthesis
- Zhizheng Wu, Pawel Swietojanski, Christophe Veaux, Steve Renals, and Simon King. A study of speaker adaptation for dnn-based speech synthesis. In INTERSPEECH, pp. 879-883. ISCA, 2015.
- (2015) INTERSPEECH , pp. 879-883
- Wu, Z.¹ Swietojanski, P.² Veaux, C.³ Renals, S.⁴ King, S.⁵

36
- 85013792491
- Zhizheng Wu, Oliver Watts, and Simon King. Merlin: An Open Source Neural Network Speech Synthesis System. 7 2016.
- (2016) Merlin: An Open Source Neural Network Speech Synthesis System
- Wu, Z.¹ Watts, O.² King, S.³

37
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- David Blei and Francis Bach eds, JMLR Workshop and Conference Proceedings
- Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. Show, attend and tell: Neural image caption generation with visual attention. In David Blei and Francis Bach (eds.), Proceedings of the 32nd International Conference on Machine Learning (ICML-15), pp. 2048-2057. JMLR Workshop and Conference Proceedings, 2015.
- (2015) Proceedings of the 32nd International Conference on Machine Learning (ICML-15) , pp. 2048-2057
- Xu, K.¹ Ba, J.² Kiros, R.³ Cho, K.⁴ Courville, A.⁵ Salakhudinov, R.⁶ Zemel, R.⁷ Bengio, Y.⁸

38
- 85047003120
- Junichi Yamagishi. English multi-speaker corpus for cstr voice cloning toolkit, 2012. URL http://homepages.inf.ed.ac.uk/jyamagis/page3/page58/page58.html.
- (2012) English Multi-Speaker Corpus for Cstr Voice Cloning Toolkit
- Yamagishi, J.¹

39
- 79551555531
- Heiga Zen. An example of context-dependent label format for hmm-based speech synthesis in english, 2006. URL http://hts.sp.nitech.ac.jp/?Download.
- (2006) An Example of Context-Dependent Label Format for Hmm-Based Speech Synthesis in English
- Zen, H.¹

40
- 84973282956
- Acoustic modeling in statistical parametric speech synthesis - From hmm to lstm-rnn
- Invited paper
- Heiga Zen. Acoustic modeling in statistical parametric speech synthesis - from hmm to lstm-rnn. In Proc. MLSLP, 2015. Invited paper.
- (2015) Proc. MLSLP
- Zen, H.¹

41
- 67651002140
- Statistical parametric speech synthesis
- Heiga Zen, Keiichi Tokuda, and Alan W Black. Statistical parametric speech synthesis. Speech Communication, 51(11):1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

42
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- Heiga Zen, Andrew Senior, and Mike Schuster. Statistical parametric speech synthesis using deep neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7962-7966, 2013.
- (2013) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

43
- 84994314564
- Fast, compact, and high quality lstm-rnn based statistical parametric speech synthesizers for mobile devices
- San Francisco, CA, USA
- Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, and Przemysław Szczepaniak. Fast, compact, and high quality lstm-rnn based statistical parametric speech synthesizers for mobile devices. In Proc. Interspeech, San Francisco, CA, USA, 2016.
- (2016) Proc. Interspeech
- Zen, H.¹ Agiomyrgiannakis, Y.² Egberts, N.³ Henderson, F.⁴ Szczepaniak, P.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.