SCOPUS Information Retrieval Platform

2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA 2016

Volume , Issue , 2017, Pages

On the training of DNN-based average voice model for speech synthesis

(3) Yang, Shan (a); Wu, Zhizheng (b); Xie, Lei (a)

a NORTHWESTERN POLYTECHNICAL UNIVERSITY (China)

b UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

[No Author keywords available]

Indexed keywords

DEEP NEURAL NETWORKS; LINGUISTICS; SPEECH PROCESSING;

AVERAGE VOICE MODELS; LINGUISTIC FEATURES; SPEAKER DEPENDENTS; SPEAKER NORMALISATION; SPEAKER SPECIFIC INFORMATIONS; SPEECH SYNTHESIS SYSTEM; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; SYSTEMATIC ANALYSIS;

SPEECH SYNTHESIS;

EID: 85013762788; PISSN: None; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/APSIPA.2016.7820818; Document Type: Conference Paper

Times cited : (19)

References (28)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 84876687945
- Speech synthesis based on hidden markov models
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden markov models," Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, 2013.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

3
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- IEEE
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on. IEEE, 2013, pp. 7962-7966.
- (2013) Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

4
- 84973359646
- From hmms to dnns: Where do the improvements come from?
- O. Watts, G. E. Henter, T. Merritt, Z. Wu, and S. King, "From hmms to dnns: where do the improvements come from?" in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2016.
- (2016) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- Watts, O.¹ Henter, G.E.² Merritt, T.³ Wu, Z.⁴ King, S.⁵

5
- 84905251808
- On the training aspects of deep neural network (dnn) for parametric tts synthesis
- IEEE
- Y. Qian, Y. Fan, W. Hu, and F. K. Soong, "On the training aspects of deep neural network (dnn) for parametric tts synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 3829-3833.
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on , pp. 3829-3833
- Qian, Y.¹ Fan, Y.² Hu, W.³ Soong, F.K.⁴

6
- 84901237776
- Modeling spectral envelopes using restricted boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted boltzmann machines and deep belief networks for statistical parametric speech synthesis," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 10, pp. 2129-2139, 2013.
- (2013) Audio, Speech, and Language Processing, IEEE Transactions on , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

7
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- IEEE
- H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on. IEEE, 2014, pp. 3844-3848.
- (2014) Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on , pp. 3844-3848
- Zen, H.¹ Senior, A.²

8
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- IEEE
- Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4460-4464.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 4460-4464
- Wu, Z.¹ Valentini-Botinhao, C.² Watts, O.³ King, S.⁴

9
- 84946074523
- The effect of neural networks in statistical parametric speech synthesis
- IEEE
- K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "The effect of neural networks in statistical parametric speech synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4455-4459.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 4455-4459
- Hashimoto, K.¹ Oura, K.² Nankaku, Y.³ Tokuda, K.⁴

10
- 0007985533
- Speaker adaptation for hmm-based speech synthesis system using mllr
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Speaker adaptation for hmm-based speech synthesis system using mllr," in The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis, 1998.
- (1998) The Third ESCA/COCOSDA Workshop (ETRW) on Speech Synthesis
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

11
- 85008066911
- Speaker adaptation of pitch and spectrum for hmm-based speech synthesis
- M. TAMURA, T. MASUKO, K. TOKUDA, and T. KOBAYASHI, "Speaker adaptation of pitch and spectrum for hmm-based speech synthesis," IEICE transactions on information and systems, vol. 85, no. 4, p. 793, 2002.
- (2002) IEICE Transactions on Information and Systems , vol.85 , Issue.4 , pp. 793
- Tamura, M.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴

12
- 33847129573
- Average-voice-based speech synthesis using hsmm-based speaker adaptation and adaptive training
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using hsmm-based speaker adaptation and adaptive training," IEICE TRANSACTIONS on Information and Systems, vol. 90, no. 2, pp. 533-543, 2007.
- (2007) IEICE TRANSACTIONS on Information and Systems , vol.90 , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

13
- 67650854725
- Analysis of speaker adaptation algorithms for hmm-based speech synthesis and a constrained smaplr adaptation algorithm
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for hmm-based speech synthesis and a constrained smaplr adaptation algorithm," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 17, no. 1, pp. 66-83, 2009.
- (2009) Audio, Speech, and Language Processing, IEEE Transactions on , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

14
- 85008006694
- Robust speaker-adaptive hmm-based text-to-speech synthesis
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "Robust speaker-adaptive hmm-based text-to-speech synthesis," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 17, no. 6, pp. 1208-1230, 2009.
- (2009) Audio, Speech, and Language Processing, IEEE Transactions on , vol.17 , Issue.6 , pp. 1208-1230
- Yamagishi, J.¹ Nose, T.² Zen, H.³ Ling, Z.-H.⁴ Toda, T.⁵ Tokuda, K.⁶ King, S.⁷ Renals, S.⁸

15
- 84946051934
- Multi-speaker modeling and speaker adaptation for dnn-based tts synthesis
- IEEE
- Y. Fan, Y. Qian, F. K. Soong, and L. He, "Multi-speaker modeling and speaker adaptation for dnn-based tts synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4475-4479.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 4475-4479
- Fan, Y.¹ Qian, Y.² Soong, F.K.³ He, L.⁴

16
- 84973333167
- Speaker and language factorization in dnn-based tts synthesis
- IEEE
- -, "Speaker and language factorization in dnn-based tts synthesis," in IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2016, pp. 5540-5544.
- (2016) IEEE International Conference on Acoustics, Speech and Signal Processing , pp. 5540-5544
- Fan, Y.¹ Qian, Y.² Soong, F.K.³ He, L.⁴

17
- 84959112868
- A study of speaker adaptation for dnn-based speech synthesis
- Z. Wu, P. Swietojanski, C. Veaux, S. Renals, and S. King, "A study of speaker adaptation for dnn-based speech synthesis," in Proceedings interspeech, 2015.
- (2015) Proceedings Interspeech
- Wu, Z.¹ Swietojanski, P.² Veaux, C.³ Renals, S.⁴ King, S.⁵

18
- 84959106025
- Sentence-level control vectors for deep neural network speech synthesis
- O. Watts, Z. Wu, and S. King, "Sentence-level control vectors for deep neural network speech synthesis," in Interspeech, 2015.
- (2015) Interspeech
- Watts, O.¹ Wu, Z.² King, S.³

19
- 84865733857
- Analysis of i-vector length normalization in speaker recognition systems
- D. Garcia-Romero and C. Y. Espy-Wilson, "Analysis of i-vector length normalization in speaker recognition systems." in Interspeech, 2011, pp. 249-252.
- (2011) Interspeech , pp. 249-252
- Garcia-Romero, D.¹ Espy-Wilson, C.Y.²

20
- 33644965031
- Fisher discriminant analysis with kernels
- B. Scholkopf and K.-R. Muller, "Fisher discriminant analysis with kernels," Neural networks for signal processing IX, vol. 1, no. 1, p. 1, 1999.
- (1999) Neural Networks for Signal Processing IX , vol.1 , Issue.1 , pp. 1
- Scholkopf, B.¹ Muller, K.-R.²

21
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 19, no. 4, pp. 788-798, 2011.
- (2011) Audio, Speech, and Language Processing, IEEE Transactions on , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

22
- 0000764772
- The use of multiple measurements in taxonomic problems
- R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of eugenics, vol. 7, no. 2, pp. 179-188, 1936.
- (1936) Annals of Eugenics , vol.7 , Issue.2 , pp. 179-188
- Fisher, R.A.¹

23
- 0001565436
- The utilization of multiple measurements in problems of biological classification
- C. R. Rao, "The utilization of multiple measurements in problems of biological classification," Journal of the Royal Statistical Society. Series B (Methodological), vol. 10, no. 2, pp. 159-203, 1948.
- (1948) Journal of the Royal Statistical Society. Series B (Methodological) , vol.10 , Issue.2 , pp. 159-203
- Rao, C.R.¹

24
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. De Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based f0 extraction: Possible role of a repetitive structure in sounds," Speech communication, vol. 27, no. 3, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigne, A.³

25
- 85013748715
- The festival speech synthesis system version 1.4.2
- Jun
- A. Black, P. Taylor, R. Caley, R. Clark, K. Richmond, S. King, V. Strom, and H. Zen, "The festival speech synthesis system version 1.4.2," Software, Jun 2001. [Online]. Available: http://www.cstr.ed.ac.uk/projects/festival/
- (2001) Software
- Black, A.¹ Taylor, P.² Caley, R.³ Clark, R.⁴ Richmond, K.⁵ King, S.⁶ Strom, V.⁷ Zen, H.⁸

26
- 85111073935
- Merlin: An open source neural network speech synthesis system
- Sunnyvale, CA, USA, September
- Z. Wu, O. Watts, and S. King, "Merlin: An open source neural network speech synthesis system," in 9th ISCA Speech Synthesis Workshop (SSW9), Sunnyvale, CA, USA, September 2016.
- (2016) 9th ISCA Speech Synthesis Workshop (SSW9)
- Wu, Z.¹ Watts, O.² King, S.³

27
- 84910024698
- Msr identity toolbox v1. 0: A matlab toolbox for speaker-recognition research
- S. O. Sadjadi, M. Slaney, and L. Heck, "Msr identity toolbox v1. 0: A matlab toolbox for speaker-recognition research," Speech and Language Processing Technical Committee Newsletter, 2013.
- (2013) Speech and Language Processing Technical Committee Newsletter
- Sadjadi, S.O.¹ Slaney, M.² Heck, L.³

28
- 57249084011
- Visualizing data using t-sne
- L. Van der Maaten and G. Hinton, "Visualizing data using t-sne," Journal of Machine Learning Research, vol. 9, pp. 2579-2605, 2008.
- (2008) Journal of Machine Learning Research , vol.9 , pp. 2579-2605
- Van der Maaten, L.¹ Hinton, G.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.