SCOPUS Information Search Platform

34th International Conference on Machine Learning, ICML 2017

Volume 1, 2017, Pages 264-273

Deep voice: Real-time neural text-to-speech

(12) Arik, Sercan Ö.ᵃ; Chrzanowski, Mikeᵃ; Coates, Adamᵃ; Diamos, Gregoryᵃ; Gibiansky, Andrewᵃ; Kang, Yongguoᵃ; Li, Xianᵃ; Miller, Johnᵃ; Ng, Andrewᵃ; Raiman, Jonathanᵃ; Sengupta, Shubhoᵃ; Shoeybi, Mohammadᵃ

ᵃ BAIDU INC (China)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; SPEECH SYNTHESIS;

BOUNDARY DETECTION; DURATION PREDICTIONS; FEATURE ENGINEERINGS; FUNDAMENTAL FREQUENCY PREDICTIONS; PRODUCTION QUALITY; SEGMENTATION MODELS; TEMPORAL CLASSIFICATION; TEXT-TO-SPEECH SYSTEM;

DEEP NEURAL NETWORKS;

EID: 85039156048
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 247

References (27)

1
- 84958264664
- Abadi, Martín, Agarwal, Ashish, Barham, Paul, Brevdo, Eugene, Chen, Zhifeng, Citro, Craig, Corrado, Greg S., Davis, Andy, Dean, Jeffrey, Devin, Matthieu, Ghemawat, Sanjay, Goodfellow, Ian, Harp, Andrew, Irving, Geoffrey, Isard, Michael, Jia, Yangqing, Jozefowicz, Rafal, Kaiser, Lukasz, Kudlur, Manjunath, Levenberg, Josh, Mané, Dan, Monga, Rajat, Moore, Sherry, Murray, Derek, Olah, Chris, Schuster, Mike, Shlens, Jonathon, Steiner, Benoit, Sutskever, Ilya, Talwar, Kunal, Tucker, Paul, Vanhoucke, Vincent, Vasudevan, Vijay, Viégas, Fernanda, Vinyals, Oriol, Warden, Pete, Wattenberg, Martin, Wicke, Martin, Yu, Yuan, and Zheng, Xiaoqiang. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL http://tensorflow.org/. Software available from tensorflow.org.
- (2015) TensorFlow: Large-scale Machine Learning on Heterogeneous Systems
- Abadi, M.¹ Agarwal, A.² Barham, P.³ Brevdo, E.⁴ Chen, Z.⁵ Citro, C.⁶ Corrado, G.S.⁷ Davis, A.⁸ Dean, J.⁹ Devin, M.¹⁰ Ghemawat, S.¹¹ Goodfellow, I.¹² Harp, A.¹³ Irving, G.¹⁴ Isard, M.¹⁵ Jia, Y.¹⁶ Jozefowicz, R.¹⁷ Kaiser, L.¹⁸ Kudlur, M.¹⁹ Levenberg, J.²⁰ et al.

2
- 84971463350
- Amodei, Dario, Anubhai, Rishita, Battenberg, Eric, Case, Carl, Casper, Jared, Catanzaro, Bryan, Chen, Jingdong, Chrzanowski, Mike, Coates, Adam, Diamos, Greg, et al. Deep speech 2: End-to-end speech recognition in English and Mandarin. arXiv preprint arXiv:1512.02595, 2015.
- (2015) Deep Speech 2: End-to-end Speech Recognition in English and Mandarin
- Amodei, D.¹ Anubhai, R.² Battenberg, E.³ Case, C.⁴ Casper, J.⁵ Catanzaro, B.⁶ Chen, J.⁷ Chrzanowski, M.⁸ Coates, A.⁹ Diamos, G.¹⁰

3
- 4444257069
- Praat, a system for doing phonetics by computer
- Boersma, Paulus Petrus Gerardus et al. Praat, a system for doing phonetics by computer. Glot international, 5, 2002.
- (2002) Glot International , pp. 5
- Boersma, P.P.G.¹

4
- 85037362563
- Bradbury, James, Merity, Stephen, Xiong, Caiming, and Socher, Richard. Quasi-recurrent neural networks. arXiv preprint arXiv:1611.01576, 2016.
- (2016) Quasi-recurrent Neural Networks
- Bradbury, J.¹ Merity, S.² Xiong, C.³ Socher, R.⁴

5
- 84939821078
- Chung, Junyoung, Gulcehre, Caglar, Cho, KyungHyun, and Bengio, Yoshua. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
- (2014) Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Chung, J.¹ Gulcehre, C.² Cho, K.³ Bengio, Y.⁴

6
- 85048676855
- Persistent RNNs: Stashing recurrent weights on-chip
- Diamos, Greg, Sengupta, Shubho, Catanzaro, Bryan, Chrzanowski, Mike, Coates, Adam, Elsen, Erich, Engel, Jesse, Hannun, Awni, and Satheesh, Sanjeev. Persistent RNNs: Stashing recurrent weights on-chip. In Proceedings of The 33rd International Conference on Machine Learning, pp. 2024-2033, 2016.
- (2016) Proceedings of the 33rd International Conference on Machine Learning , pp. 2024-2033
- Diamos, G.¹ Sengupta, S.² Catanzaro, B.³ Chrzanowski, M.⁴ Coates, A.⁵ Elsen, E.⁶ Engel, J.⁷ Hannun, A.⁸ Satheesh, S.⁹

7
- 84975744527
- Peachpy meets opcodes: Direct machine code generation from python
- ACM
- Dukhan, Marat. Peachpy meets opcodes: direct machine code generation from python. In Proceedings of the 5th Workshop on Python for High-Performance and Scientific Computing, pp. 3. ACM, 2015.
- (2015) Proceedings of the 5th Workshop on Python for High-performance and Scientific Computing , pp. 3
- Dukhan, M.¹

8
- 34250704813
- Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
- New York, NY, USA, ACM
- Graves, Alex, Fernández, Santiago, Gomez, Faustino, and Schmidhuber, Jürgen. Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, ICML'06, pp. 369-376, New York, NY, USA, 2006. ACM.
- (2006) Proceedings of the 23rd International Conference on Machine Learning, ICML'06 , pp. 369-376
- Graves, A.¹ Fernández, S.² Gomez, F.³ Schmidhuber, J.⁴

9
- 84941620184
- Kingma, D. and Ba, J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

10
- 85039166060
- Mehri, Soroush, Kumar, Kundan, Gulrajani, Ishaan, Kumar, Rithesh, Jain, Shubham, Sotelo, Jose, Courville, Aaron, and Bengio, Yoshua. Samplernn: An unconditional end-to-end neural audio generation model. arXiv preprint arXiv:1612.07837, 2016.
- (2016) Samplernn: An Unconditional End-to-end Neural Audio Generation Model
- Mehri, S.¹ Kumar, K.² Gulrajani, I.³ Kumar, R.⁴ Jain, S.⁵ Sotelo, J.⁶ Courville, A.⁷ Bengio, Y.⁸

11
- 84976902575
- World: A vocoder-based high-quality speech synthesis system for real-time applications
- Morise, Masanori, Yokomori, Fumiya, and Ozawa, Kenji. World: a vocoder-based high-quality speech synthesis system for real-time applications. IEICE TRANSACTIONS on Information and Systems, 99(7):1877-1884, 2016.
- (2016) IEICE TRANSACTIONS on Information and Systems , vol.99 , Issue.7 , pp. 1877-1884
- Morise, M.¹ Yokomori, F.² Ozawa, K.³

12
- 84989286061
- Oord, Aaron van den, Kalchbrenner, Nal, and Kavukcuoglu, Koray. Pixel recurrent neural networks. arXiv preprint arXiv:1601.06759, 2016.
- (2016) Pixel Recurrent Neural Networks
- Van Den Oord, A.¹ Kalchbrenner, N.² Kavukcuoglu, K.³

13
- 85039156182
- Paine, Tom Le, Khorrami, Pooya, Chang, Shiyu, Zhang, Yang, Ramachandran, Prajit, Hasegawa-Johnson, Mark A, and Huang, Thomas S. Fast wavenet generation algorithm. arXiv preprint arXiv:1611.09482, 2016.
- (2016) Fast Wavenet Generation Algorithm
- Paine, T.L.¹ Khorrami, P.² Chang, S.³ Zhang, Y.⁴ Ramachandran, P.⁵ Hasegawa-Johnson, M.A.⁶ Huang, T.S.⁷

14
- 85048678744
- Multi-output rnn-lstm for multiple speaker speech synthesis with α-interpolation model
- Pascual, Santiago and Bonafonte, Antonio. Multi-output rnn-lstm for multiple speaker speech synthesis with α-interpolation model. way, 1000:2, 2016.
- (2016) Way , vol.1000 , pp. 2
- Pascual, S.¹ Bonafonte, A.²

15
- 84984782848
- The blizzard challenge 2013 - Indian language task
- Prahallad, Kishore, Vadapalli, Anandaswarup, Elluru, Naresh, et al. The blizzard challenge 2013 - Indian language task. In Blizzard Challenge Workshop 2013, 2013.
- (2013) Blizzard Challenge Workshop 2013
- Prahallad, K.¹ Vadapalli, A.² Elluru, N.³

16
- 84946032010
- Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks
- IEEE
- Rao, Kanishka, Peng, Fuchun, Sak, Hasim, and Beaufays, Françoise. Grapheme-to-phoneme conversion using long short-term memory recurrent neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 4225-4229. IEEE, 2015.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 4225-4229
- Rao, K.¹ Peng, F.² Sak, H.³ Beaufays, F.⁴

17
- 80051607565
- Crowdmos: An approach for crowdsourcing mean opinion score studies
- IEEE
- Ribeiro, Flávio, Florencio, Dinei, Zhang, Cha, and Seltzer, Michael. Crowdmos: An approach for crowdsourcing mean opinion score studies. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 2416-2419. IEEE, 2011.
- (2011) Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on , pp. 2416-2419
- Ribeiro, F.¹ Florencio, D.² Zhang, C.³ Seltzer, M.⁴

18
- 84994213378
- A template-based approach for speech synthesis intonation generation using lstms
- Ronanki, Srikanth, Henter, Gustav Eje, Wu, Zhizheng, and King, Simon. A template-based approach for speech synthesis intonation generation using lstms. Interspeech 2016, pp. 2463-2467, 2016.
- (2016) Interspeech 2016 , pp. 2463-2467
- Ronanki, S.¹ Henter, G.E.² Wu, Z.³ King, S.⁴

19
- 85122685393
- Char2wav: End-to-end speech synthesis
- Sotelo, Jose, Mehri, Soroush, Kumar, Kundan, Santos, Joao Felipe, Kastner, Kyle, Courville, Aaron, and Bengio, Yoshua. Char2wav: End-to-end speech synthesis. In ICLR 2017 workshop submission, 2017. URL https://openreview.net/forum?id=B1VWyySKx.
- (2017) ICLR 2017 Workshop Submission
- Sotelo, J.¹ Mehri, S.² Kumar, K.³ Santos, J.F.⁴ Kastner, K.⁵ Courville, A.⁶ Bengio, Y.⁷

20
- 84892324142
- Springer
- Stephenson, Ian. Production Rendering, Design and Implementation. Springer, 2005.
- (2005) Production Rendering, Design and Implementation
- Stephenson, I.¹

21
- 84925160976
- Cambridge University Press, New York, NY, USA, 1st edition, 9780521899277
- Taylor, Paul. Text-to-Speech Synthesis. Cambridge University Press, New York, NY, USA, 1st edition, 2009. ISBN 0521899273, 9780521899277.
- (2009) Text-to-speech Synthesis
- Taylor, P.¹

22
- 84998678663
- Theis, Lucas, Oord, Aäron van den, and Bethge, Matthias. A note on the evaluation of generative models. arXiv preprint arXiv:1511.01844, 2015.
- (2015) A Note on the Evaluation of Generative Models
- Theis, L.¹ Van Den Oord, A.² Bethge, M.³

23
- 85017259342
- Wavenet: A generative model for raw audio
- 1609.03499
- van den Oord, Aäron, Dieleman, Sander, Zen, Heiga, Simonyan, Karen, Vinyals, Oriol, Graves, Alex, Kalchbrenner, Nal, Senior, Andrew, and Kavukcuoglu, Koray. Wavenet: A generative model for raw audio. CoRR abs/1609.03499, 2016.
- (2016) CoRR
- Van Den Oord, A.¹ Dieleman, S.² Zen, H.³ Simonyan, K.⁴ Vinyals, O.⁵ Graves, A.⁶ Kalchbrenner, N.⁷ Senior, A.⁸ Kavukcuoglu, K.⁹

24
- 51449095035
- Carnegie Mellon University
- Weide, R. The CMU pronunciation dictionary 0.7. Carnegie Mellon University, 2008.
- (2008) The CMU Pronunciation Dictionary 0.7
- Weide, R.¹

25
- 85006506329
- Yao, Kaisheng and Zweig, Geoffrey. Sequence-to-sequence neural net models for grapheme-to-phoneme conversion. arXiv preprint arXiv:1506.00196, 2015.
- (2015) Sequence-to-sequence Neural Net Models for Grapheme-to-phoneme Conversion
- Yao, K.¹ Zweig, G.²

26
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- IEEE
- Zen, Heiga and Sak, Hasim. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 4470-4474. IEEE, 2015.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 4470-4474
- Zen, H.¹ Sak, H.²

27
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- Zen, Heiga, Senior, Andrew, and Schuster, Mike. Statistical parametric speech synthesis using deep neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 7962-7966, 2013.
- (2013) Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.