SCOPUS Information Search Platform

Advances in Neural Information Processing Systems

Volume 2017-December, 2017, Pages 2963-2971

Deep voice 2: Multi-speaker neural text-to-speech

(8) Arik, Sercan Öᵃ Diamos, Gregoryᵃ Gibiansky, Andrewᵃ Miller, Johnᵃ Peng, Kainanᵃ Ping, Weiᵃ Raiman, Jonathanᵃ Zhou, Yanqiᵃ

ᵃ BAIDU INC (China)

Author keywords

[No Author keywords available]

Indexed keywords

SPEECH SYNTHESIS;

AUDIO QUALITY; BUILDING BLOCKS; LOW DIMENSIONAL; POST PROCESSING; SINGLE MODELS; SINGLE NEURAL; TEXT TO SPEECH; TTS SYSTEMS;

SOUND REPRODUCTION;

EID: 85046637415
PISSN: 10495258
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 411

References (24)

1
- 84890452886
- Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
- O. Abdel-Hamid and H. Jiang. Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code. In ICASSP, 2013.
- (2013) ICASSP
- Abdel-Hamid, O.¹ Jiang, H.²

2
- 85039156048
- Deep voice: Real-time neural text-to-speech
- S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, J. Raiman, S. Sengupta, and M. Shoeybi. Deep voice: Real-time neural text-to-speech. In ICML, 2017.
- (2017) ICML
- Arik, S.O.¹ Chrzanowski, M.² Coates, A.³ Diamos, G.⁴ Gibiansky, A.⁵ Kang, Y.⁶ Li, X.⁷ Miller, J.⁸ Raiman, J.⁹ Sengupta, S.¹⁰ Shoeybi, M.¹¹

3
- 85088231695
- Quasi-recurrent neural networks
- J. Bradbury, S. Merity, C. Xiong, and R. Socher. Quasi-recurrent neural networks. In ICLR, 2017.
- (2017) ICLR
- Bradbury, J.¹ Merity, S.² Xiong, C.³ Socher, R.⁴

4
- 84919728106
- K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv:1406.1078, 2014.
- (2014) Learning Phrase Representations Using RNN Encoder-decoder for Statistical Machine Translation
- Cho, K.¹ Van Merriënboer, B.² Gulcehre, C.³ Bahdanau, D.⁴ Bougares, F.⁵ Schwenk, H.⁶ Bengio, Y.⁷

5
- 84946051934
- Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis
- Y. Fan, Y. Qian, F. K. Soong, and L. He. Multi-speaker modeling and speaker adaptation for DNN-based TTS synthesis. In IEEE ICASSP, 2015.
- (2015) IEEE ICASSP
- Fan, Y.¹ Qian, Y.² Soong, F.K.³ He, L.⁴

6
- 33749259827
- Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
- A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML, 2006.
- (2006) ICML
- Graves, A.¹ Fernández, S.² Gomez, F.³ Schmidhuber, J.⁴

7
- 85047009420
- C.-C. Hsu, H.-T. Hwang, Y.-C. Wu, Y. Tsao, and H.-M. Wang. Voice conversion from unaligned corpora using variational autoencoding Wasserstein generative adversarial networks. arXiv:1704.00849, 2017.
- (2017) Voice Conversion from Unaligned Corpora Using Variational Autoencoding Wasserstein Generative Adversarial Networks
- Hsu, C.-C.¹ Hwang, H.-T.² Wu, Y.-C.³ Tsao, Y.⁴ Wang, H.-M.⁵

8
- 84964923476
- S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167, 2015.
- (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Ioffe, S.¹ Szegedy, C.²

9
- 84941620184
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv:1412.6980, 2014.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

10
- 84994130883
- Neural architectures for named entity recognition
- G. Lample, M. Ballesteros, K. Kawakami, S. Subramanian, and C. Dyer. Neural architectures for named entity recognition. In Proc. NAACL-HLT, 2016.
- (2016) Proc. NAACL-HLT
- Lample, G.¹ Ballesteros, M.² Kawakami, K.³ Subramanian, S.⁴ Dyer, C.⁵

11
- 85047019895
- C. Li, X. Ma, B. Jiang, X. Li, X. Zhang, X. Liu, Y. Cao, A. Kannan, and Z. Zhu. Deep speaker: an end-to-end neural speaker embedding system. arXiv:1705.02304, 2017.
- (2017) Deep Speaker: An End-to-end Neural Speaker Embedding System
- Li, C.¹ Ma, X.² Jiang, B.³ Li, X.⁴ Zhang, X.⁵ Liu, X.⁶ Cao, Y.⁷ Kannan, A.⁸ Zhu, Z.⁹

12
- 85039166060
- S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. Courville, and Y. Bengio. SampleRNN: An unconditional end-to-end neural audio generation model. arXiv:1612.07837, 2016.
- (2016) SampleRNN: An Unconditional End-to-end Neural Audio Generation Model
- Mehri, S.¹ Kumar, K.² Gulrajani, I.³ Kumar, R.⁴ Jain, S.⁵ Sotelo, J.⁶ Courville, A.⁷ Bengio, Y.⁸

13
- 85011070895
- A. v. d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu. Wavenet: A generative model for raw audio. arXiv:1609.03499, 2016.
- (2016) Wavenet: A Generative Model for Raw Audio
- Van Den Oord, A.¹ Dieleman, S.² Zen, H.³ Simonyan, K.⁴ Vinyals, O.⁵ Graves, A.⁶ Kalchbrenner, N.⁷ Senior, A.⁸ Kavukcuoglu, K.⁹

14
- 0033884858
- Speaker verification using adapted Gaussian mixture models
- D. A. Reynolds, T. F. Quatieri, and R. B. Dunn. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing, 10(1-3):19-41, 2000.
- (2000) Digital Signal Processing , vol.10 , Issue.1-3 , pp. 19-41
- Reynolds, D.A.¹ Quatieri, T.F.² Dunn, R.B.³

15
- 85047003030
- Crowdmos: An approach for crowdsourcing mean opinion score studies
- F. Ribeiro, D. Florêncio, C. Zhang, and M. Seltzer. Crowdmos: An approach for crowdsourcing mean opinion score studies. In IEEE ICASSP, 2011.
- (2011) IEEE ICASSP
- Ribeiro, F.¹ Florêncio, D.² Zhang, C.³ Seltzer, M.⁴

16
- 85039173327
- S. Ronanki, O. Watts, S. King, and G. E. Henter. Median-based generation of synthetic speech durations using a non-parametric approach. arXiv:1608.06134, 2016.
- (2016) Median-based Generation of Synthetic Speech Durations Using a Non-parametric Approach
- Ronanki, S.¹ Watts, O.² King, S.³ Henter, G.E.⁴

17
- 85018875486
- Improved techniques for training GANs
- T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training GANs. In NIPS, 2016.
- (2016) NIPS
- Salimans, T.¹ Goodfellow, I.² Zaremba, W.³ Cheung, V.⁴ Radford, A.⁵ Chen, X.⁶

18
- 85122685393
- Char2wav: End-to-end speech synthesis
- J. Sotelo, S. Mehri, K. Kumar, J. F. Santos, K. Kastner, A. Courville, and Y. Bengio. Char2wav: End-to-end speech synthesis. In ICLR2017 workshop submission, 2017.
- (2017) ICLR2017 Workshop Submission
- Sotelo, J.¹ Mehri, S.² Kumar, K.³ Santos, J.F.⁴ Kastner, K.⁵ Courville, A.⁶ Bengio, Y.⁷

19
- 85038442478
- Tacotron: Towards end-to-end speech synthesis
- Y. Wang, R. Skerry-Ryan, D. Stanton, Y. Wu, R. J. Weiss, N. Jaitly, Z. Yang, Y. Xiao, Z. Chen, S. Bengio, et al. Tacotron: Towards end-to-end speech synthesis. In Interspeech, 2017.
- (2017) Interspeech
- Wang, Y.¹ Skerry-Ryan, R.² Stanton, D.³ Wu, Y.⁴ Weiss, R.J.⁵ Jaitly, N.⁶ Yang, Z.⁷ Xiao, Y.⁸ Chen, Z.⁹ Bengio, S.¹⁰

20
- 84959112868
- A study of speaker adaptation for DNN-based speech synthesis
- Z. Wu, P. Swietojanski, C. Veaux, S. Renals, and S. King. A study of speaker adaptation for DNN-based speech synthesis. In Interspeech, 2015.
- (2015) Interspeech
- Wu, Z.¹ Swietojanski, P.² Veaux, C.³ Renals, S.⁴ King, S.⁵

21
- 85008006694
- Robust speaker-adaptive HMM-based text-to-speech synthesis
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals. Robust speaker-adaptive HMM-based text-to-speech synthesis. IEEE Transactions on Audio, Speech, and Language Processing, 2009.
- (2009) IEEE Transactions on Audio, Speech, and Language Processing
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

22
- 85013762788
- On the training of DNN-based average voice model for speech synthesis
- S. Yang, Z. Wu, and L. Xie. On the training of DNN-based average voice model for speech synthesis. In Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), 2016.
- (2016) Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)
- Yang, S.¹ Wu, Z.² Xie, L.³

23
- 85047016414
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- H. Zen and H. Sak. Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis. In IEEE ICASSP, 2015.
- (2015) IEEE ICASSP
- Zen, H.¹ Sak, H.²

24
- 85047009186
- H. Zen, Y. Agiomyrgiannakis, N. Egberts, F. Henderson, and P. Szczepaniak. Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices. arXiv:1606.06061, 2016.
- (2016) Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
- Zen, H.¹ Agiomyrgiannakis, Y.² Egberts, N.³ Henderson, F.⁴ Szczepaniak, P.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.