SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 864-868

Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis

(4) Fan, Yuchen a; Qian, Yao a; Soong, Frank K a; He, Lei a

a MICROSOFT RESEARCH ASIA (China)

Author keywords

Deep neural networks (DNNs); Sequence generation error (SGE) minimization training; Speech synthesis

Indexed keywords

DECISION TREES; ERRORS; SPEECH; SPEECH RECOGNITION; SPEECH SYNTHESIS;

BASE-LINE PERFORMANCE; CONTEXTUAL INFORMATION; DEEP NEURAL NETWORKS; MINIMIZATION CRITERIONS; MULTI-LAYERED STRUCTURE; SEQUENCE GENERATION; STATISTICAL CORRELATION; SUBJECTIVE LISTENING TEST;

SPEECH COMMUNICATION;

EID: 84959172579 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (13)

References (18)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis", Speech Communication, Volume 51, Issue 11, pp. 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," Signal Processing Magazine, IEEE, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) Signal Processing Magazine, IEEE , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

3
- 84055222005
- Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
- G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. on Audio, Speech, and Language Processing, vol. 20, no. 1, pp. 30-42, 2012.
- (2012) IEEE Trans. on Audio, Speech, and Language Processing , vol.20 , Issue.1 , pp. 30-42
- Dahl, G.E.¹ Yu, D.² Deng, L.³ Acero, A.⁴

4
- 84865801985
- Conversational speech transcription using context-dependent deep neural networks
- F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. InterSpeech, pp. 437-440, 2011.
- (2011) Proc. InterSpeech , pp. 437-440
- Seide, F.¹ Li, G.² Yu, D.³

5
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior and M. Schuster, "Statistical Parametric Speech Synthesis Using Deep Neural Networks", in Proc. ICASSP, pp. 8012-8016, 2013.
- (2013) Proc. ICASSP , pp. 8012-8016
- Zen, H.¹ Senior, A.² Schuster, M.³

6
- 84905251808
- On the training aspects of deep neural network (DNN) for parametric TTS synthesis
- Y. Qian, Y.-C. Fan, W.-P. Hu and F. K. Soong, "On the training aspects of deep neural network (DNN) for parametric TTS synthesis", in Proc. ICASSP, pp. 3829-3833, 2014.
- (2014) Proc. ICASSP , pp. 3829-3833
- Qian, Y.¹ Fan, Y.-C.² Hu, W.-P.³ Soong, F.K.⁴

7
- 84929157442
- Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
- H. Lu, S. King, and O. Watts, "Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis", in 8th ISCA Workshop on Speech Synthesis, pp. 281-285, 2013.
- (2013) 8th ISCA Workshop on Speech Synthesis , pp. 281-285
- Lu, H.¹ King, S.² Watts, O.³

8
- 84890527090
- Multi-distribution deep belief network for speech synthesis
- S. Kang, X. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis", in Proc. ICASSP, pp. 7962-7966, 2013.
- (2013) Proc. ICASSP , pp. 7962-7966
- Kang, S.¹ Qian, X.² Meng, H.³

9
- 84890447002
- Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines for statistical parametric speech synthesis", in Proc. ICASSP, pp. 7825-7829, 2013.
- (2013) Proc. ICASSP , pp. 7825-7829
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

10
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis", in Proc. ICASSP, pp. 1315-1318, 2000.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

11
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis", in Proc. ICASSP, pp. I, I, 2006.
- (2006) Proc. ICASSP , pp. I-I
- Wu, Y.-J.¹ Wang, R.-H.²

12
- 79959840616
- Investigation of full-sequence training of deep belief networks for speech recognition
- A.-R. Mohamed, D. Yu and L. Deng, "Investigation of Full-Sequence Training of Deep Belief Networks for Speech Recognition", in Proc. Interspeech, pp. 2846-2849, 2010.
- (2010) Proc. Interspeech , pp. 2846-2849
- Mohamed, A.-R.¹ Yu, D.² Deng, L.³

13
- 84890543852
- Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription
- H. Su, G. Li, D. Yu, and F. Seide, "Error back propagation for sequence training of context-dependent deep networks for conversational speech transcription", in Proc. ICASSP, pp. 6664-6668, 2013.
- (2013) Proc. ICASSP , pp. 6664-6668
- Su, H.¹ Li, G.² Yu, D.³ Seide, F.⁴

14
- 0022471098
- Learning representations by back-propagating errors
- D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 9, pp. 533-536, 1986.
- (1986) Nature , vol.323 , Issue.9 , pp. 533-536
- Rumelhart, D.E.¹ Hinton, G.E.² Williams, R.J.³

15
- 84905283451
- New methods in continuous Mandarin speech recognition
- C. Julian Chen, Ramesh A. Gopinath, Michael D. Monkowski, Michael A. Picheny, and Katherine Shen, "New methods in continuous Mandarin speech recognition," in Proc. EUROSPEECH, pp. 1543-1546, 1997.
- (1997) Proc. EUROSPEECH , pp. 1543-1546
- Julian Chen, C.¹ Gopinath, R.A.² Monkowski, M.D.³ Picheny, M.A.⁴ Shen, K.⁵

16
- 67650851754
- USTC System for Blizzard Challenge 2006 an Improved HMM-based Speech Synthesis Method
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC System for Blizzard Challenge 2006 an Improved HMM-based Speech Synthesis Method," in Proc. Blizzard Challenge 2006 Workshop, 2006.
- (2006) Proc. Blizzard Challenge 2006 Workshop
- Ling, Z.-H.¹ Wu, Y.-J.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

17
- 84959149034
- Amazon Mechanical Turk
- Amazon Mechanical Turk, Available: https://www.mturk.com/mturk/welcome

18
- 84910087395
- Sequence error (SE) minimization training of neural network for voice conversion
- F.-L. Xie, Y. Qian, Y.-C. Fan, F. K. Soong, "Sequence Error (SE) Minimization Training of Neural Network for Voice Conversion", in Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Xie, F.-L.¹ Qian, Y.² Fan, Y.-C.³ Soong, F.K.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.