SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 854-858

Fusion of multiple parameterisations for DNN-based sinusoidal speech synthesis with multi-task learning

(6) Hu, Qiong a; Wu, Zhizheng a; Richmond, Korin a; Yamagishi, Junichi a; Stylianou, Yannis b; Maia, Ranniery b

a UNIVERSITY OF EDINBURGH (United Kingdom)

b TOSHIBA CORPORATION (Japan)

Author keywords

Deep neural network; Fusion vocoder; Sinusoidal model; Statistical speech synthesis

Indexed keywords

DECISION TREES; LEARNING SYSTEMS; PARAMETERIZATION; SPEECH; SPEECH SYNTHESIS; VOCODERS;

DEEP NEURAL NETWORKS; HARMONIC AMPLITUDE; MULTITASK LEARNING; SECONDARY TASKS; SINUSOIDAL MODEL; SOURCE FILTERS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; WAVEFORM GENERATION;

SPEECH COMMUNICATION;

EID: 84959144342 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (32)

References (29)

1
- 84930664922
- Vocaine the vocoder and applications in speech synthesis
- Y. Agiomyrgiannakis. Vocaine the vocoder and applications in speech synthesis. In Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Agiomyrgiannakis, Y.¹

2
- 0041714576
- Multitask learning: A knowledge-based source of inductive bias
- R. Caruana. Multitask learning: A knowledge-based source of inductive bias. In Machine Learning: Proceedings of the Tenth International Conference, 1993.
- (1993) Machine Learning: Proceedings of the Tenth International Conference
- Caruana, R.¹

3
- 56449095373
- A unified architecture for natural language processing: Deep neural networks with multitask learning
- ACM
- R. Collobert and J. Weston. A unified architecture for natural language processing: Deep neural networks with multitask learning. In Proceedings of the 25th International Conference on Machine Learning, pages 160-167. ACM, 2008.
- (2008) Proceedings of the 25th International Conference on Machine Learning , pp. 160-167
- Collobert, R.¹ Weston, J.²

4
- 84881041616
- Analysis and synthesis of speech using an adaptive full-band harmonic model
- G. Degottex and Y. Stylianou. Analysis and synthesis of speech using an adaptive full-band harmonic model. IEEE Transactions on Audio, Speech and Language Processing, 21 (10): 2085-2095, 2013.
- (2013) IEEE Transactions on Audio, Speech and Language Processing , vol.21 , Issue.10 , pp. 2085-2095
- Degottex, G.¹ Stylianou, Y.²

5
- 84856248602
- The deterministic plus stochastic model of the residual signal and its applications
- T. Drugman and T. Dutoit. The deterministic plus stochastic model of the residual signal and its applications. IEEE Transactions on Audio, Speech and Language Processing, 20 (3): 968-981, 2012.
- (2012) IEEE Transactions on Audio, Speech and Language Processing , vol.20 , Issue.3 , pp. 968-981
- Drugman, T.¹ Dutoit, T.²

6
- 84897865577
- Harmonics plus noise model based vocoder for statistical parametric speech synthesis
- D. Erro, I. Sainz, E. Navas, and I. Hernaez. Harmonics plus noise model based vocoder for statistical parametric speech synthesis. IEEE Journal of Selected Topics in Signal Processing, 8 (2): 184-194, 2014.
- (2014) IEEE Journal of Selected Topics in Signal Processing , vol.8 , Issue.2 , pp. 184-194
- Erro, D.¹ Sainz, I.² Navas, E.³ Hernaez, I.⁴

7
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, and T. Sainath. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. Signal Processing Magazine, IEEE, 29 (6): 82-97, 2012.
- (2012) Signal Processing Magazine, IEEE , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰

8
- 85133196551
- An experimental comparison of multiple vocoder types
- Q. Hu, K. Richmond, J. Yamagishi, and J. Latorre. An experimental comparison of multiple vocoder types. In Proc. 8th SSW, 2013.
- (2013) Proc. 8th SSW
- Hu, Q.¹ Richmond, K.² Yamagishi, J.³ Latorre, J.⁴

9
- 84946025802
- Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis
- Q. Hu, Y. Stylianou, R. Maia, K. Richmond, and J. Yamagishi. Methods for applying dynamic sinusoidal models to statistical parametric speech synthesis. In Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Hu, Q.¹ Stylianou, Y.² Maia, R.³ Richmond, K.⁴ Yamagishi, J.⁵

10
- 84910049275
- An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis
- Q. Hu, Y. Stylianou, R. Maia, K. Richmond, J. Yamagishi, and J. Latorre. An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis. In Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Hu, Q.¹ Stylianou, Y.² Maia, R.³ Richmond, K.⁴ Yamagishi, J.⁵ Latorre, J.⁶

11
- 84905280900
- A fixed dimension and perceptually based dynamic sinusoidal model of speech
- Q. Hu, Y. Stylianou, K. Richmond, R. Maia, J. Yamagishi, and J. Latorre. A fixed dimension and perceptually based dynamic sinusoidal model of speech. In Proc. ICASSP, 2014.
- (2014) Proc. ICASSP
- Hu, Q.¹ Stylianou, Y.² Richmond, K.³ Maia, R.⁴ Yamagishi, J.⁵ Latorre, J.⁶

12
- 84976212707
- Sinusoidal speech synthesis using deep neural networks
- Q. Hu, Z. Wu, K. Richmond, J. Yamagishi, Y. Stylianou, R. Maia, S. King, and M. Akamine. Sinusoidal speech synthesis using deep neural networks. Manuscript, 2015.
- (2015) Manuscript
- Hu, Q.¹ Wu, Z.² Richmond, K.³ Yamagishi, J.⁴ Stylianou, Y.⁵ Maia, R.⁶ King, S.⁷ Akamine, M.⁸

13
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds. Speech Communication, 27 (3): 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

14
- 78649238036
- Synthesizer voice quality of new languages calibrated with mean mel cepstral distortion
- J. Kominek, T. Schultz, and A. Black. Synthesizer voice quality of new languages calibrated with mean mel cepstral distortion. In Proc. SLTU, 2008.
- (2008) Proc. SLTU
- Kominek, J.¹ Schultz, T.² Black, A.³

15
- 84890458846
- Multitask learning in connectionist speech recognition
- Y. Lu, F. Lu, S. Sehgal, S. Gupta, J. Du, C. Tham, P. Green, and V. Wan. Multitask learning in connectionist speech recognition. In Proceedings of the Australian International Conference on Speech Science and Technology, 2004.
- (2004) Proceedings of the Australian International Conference on Speech Science and Technology
- Lu, Y.¹ Lu, F.² Sehgal, S.³ Gupta, S.⁴ Du, J.⁵ Tham, C.⁶ Green, P.⁷ Wan, V.⁸

16
- 85009167968
- Multitask learning in connectionist robust ASR using recurrent neural networks
- S. Parveen and P. Green. Multitask learning in connectionist robust ASR using recurrent neural networks. In INTERSPEECH, 2003.
- (2003) INTERSPEECH
- Parveen, S.¹ Green, P.²

17
- 84905251808
- On the training aspects of deep neural network for parametric TTS synthesis
- Y. Qian, Y. Fan, W. Hu, and F. Soong. On the training aspects of deep neural network for parametric TTS synthesis. In Proc. ICASSP, 2014.
- (2014) Proc. ICASSP
- Qian, Y.¹ Fan, Y.² Hu, W.³ Soong, F.⁴

18
- 77957744515
- HMM-based speech synthesis utilizing glottal inverse filtering
- T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku. HMM-based speech synthesis utilizing glottal inverse filtering. IEEE Transactions on Audio, Speech, and Language Processing, 19 (1): 153-165, 2011.
- (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.1 , pp. 153-165
- Raitio, T.¹ Suni, A.² Yamagishi, J.³ Pulakka, H.⁴ Nurminen, J.⁵ Vainio, M.⁶ Alku, P.⁷

19
- 84976247315
- K. Richmond, P. Hoole, and S. King. Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus, 2011. http://dx.doi.org/10.7488/ds/140
- (2011) Announcing the Electromagnetic Articulography (Day 1) Subset of the mngu0 Articulatory Corpus
- Richmond, K.¹ Hoole, P.² King, S.³

20
- 79959858197
- Sinusoidal model parameterization for HMM-based TTS system
- S. Shechtman and A. Sorin. Sinusoidal model parameterization for HMM-based TTS system. In Proc. Interspeech, 2010.
- (2010) Proc. Interspeech
- Shechtman, S.¹ Sorin, A.²

21
- 0003447548
- PhD thesis, Ecole Nationale Supérieure des Télécommunications
- Y. Stylianou. Harmonic plus noise models for speech, combined with statistical methods, for speech and speaker modification. PhD thesis, Ecole Nationale Supérieure des Télécommunications, 1996.
- (1996) Harmonic Plus Noise Models for Speech, Combined with Statistical Methods, for Speech and Speaker Modification
- Stylianou, Y.¹

22
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Transactions on Information and Systems, 90 (5): 816-824, 2007.
- (2007) IEICE Transactions on Information and Systems , vol.90 , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

23
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura. Speech parameter generation algorithms for HMM-based speech synthesis. In Proc. ICASSP, 2000.
- (2000) Proc. ICASSP
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

24
- 33947651202
- Multitask learning for spoken language understanding
- G. Tur. Multitask learning for spoken language understanding. In ICASSP, 2006.
- (2006) ICASSP
- Tur, G.¹

25
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- Z. Wu, C. Botinhao, O. Watts, and S. King. Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis. In Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Wu, Z.¹ Botinhao, C.² Watts, O.³ King, S.⁴

26
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In Proc. Eurospeech, 1999.
- (1999) Proc. Eurospeech
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

27
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- H. Zen and A. Senior. Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis. In Proc. ICASSP, 2014.
- (2014) Proc. ICASSP
- Zen, H.¹ Senior, A.²

28
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster. Statistical parametric speech synthesis using deep neural networks. In Proc. ICASSP, 2013.
- (2013) Proc. ICASSP
- Zen, H.¹ Senior, A.² Schuster, M.³

29
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black. Statistical parametric speech synthesis. Speech Communication, 51 (11): 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.