SCOPUS Information Search Platform

European Signal Processing Conference

2014, Pages 2290-2294

Voice source modelling using deep neural networks for statistical parametric speech synthesis

(7 authors) Raitio, Tuomo (a); Lu, Heng (b); Kane, John (c); Suni, Antti (d); Vainio, Martti (d); King, Simon (b); Alku, Paavo (a)

a AALTO UNIVERSITY (Finland)

b UNIVERSITY OF EDINBURGH (United Kingdom)

c TRINITY COLLEGE DUBLIN (Ireland)

d UNIVERSITY OF HELSINKI (Finland)

Author keywords

Deep neural network; DNN; glottal flow; statistical parametric speech synthesis; voice source modelling

Indexed keywords

DEEP NEURAL NETWORKS; SIGNAL PROCESSING; SPEECH SYNTHESIS; TIME DOMAIN ANALYSIS;

ACOUSTIC FEATURES; GLOTTAL FLOW; MODELLING METHOD; PITCH SYNCHRONOUS; SPEECH DATABASE; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; SUBJECTIVE LISTENING TEST; VOICE SOURCES;

NEURAL NETWORKS;

EID: 84911869827; PISSN: 2219-5491; EISSN: None; Source Type: Conference Proceeding
DOI: None; Document Type: Conference Paper

Times cited: 36

References (27)

1
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

2
- 67651002140
- Statistical parametric speech synthesis
- Heiga Zen, Keiichi Tokuda, and Alan W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

3
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., 1996, pp. 373-376.
- (1996) Proc. IEEE Int. Conf. Acoust. Speech Signal Proc , pp. 373-376
- Hunt, A.¹ Black, A.²

4
- 0016495091
- Linear prediction: A tutorial review
- J. Makhoul, "Linear prediction: A tutorial review," Proceedings of the IEEE, vol. 63, no. 4, pp. 561-580, 1975.
- (1975) Proceedings of the IEEE , vol.63 , Issue.4 , pp. 561-580
- Makhoul, J.¹

5
- 85135145847
- Speaker interpolation in HMM-based speech synthesis system
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Speaker interpolation in HMM-based speech synthesis system," in Proc. Eurospeech, 1997, pp. 2523-2526.
- (1997) Proc. Eurospeech , pp. 2523-2526
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

6
- 33846406459
- Two-band excitation for HMM-based speech synthesis
- S.-J. Kim and M. Hahn, "Two-band excitation for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, 2007.
- (2007) IEICE Trans. Inf. Syst , vol.E90D
- Kim, S.-J.¹ Hahn, M.²

7
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.
- (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

8
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in 2nd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA), 2001.
- (2001) 2nd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA)
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

9
- 78649297510
- An excitation model for HMM-based speech synthesis based on residual modeling
- R. Maia, T. Toda, H. Zen, Y. Nankaku, and K. Tokuda, "An excitation model for HMM-based speech synthesis based on residual modeling," in 6th ISCA Workshop on Speech Synthesis, 2007.
- (2007) 6th ISCA Workshop on Speech Synthesis
- Maia, R.¹ Toda, T.² Zen, H.³ Nankaku, Y.⁴ Tokuda, K.⁵

10
- 84865797109
- Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters
- R. Maia, H. Zen, and M. J. F. Gales, "Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters," in 7th ISCA Speech Synthesis Workshop, 2010, pp. 88-93.
- (2010) 7th ISCA Speech Synthesis Workshop , pp. 88-93
- Maia, R.¹ Zen, H.² Gales, M.J.F.³

11
- 82155160991
- Towards an improved modeling of the glottal source in statistical parametric speech synthesis
- J. Cabral, S. Renals, K. Richmond, and J. Yamagishi, "Towards an improved modeling of the glottal source in statistical parametric speech synthesis," in 6th ISCA Workshop on Speech Synthesis, 2007, pp. 113-118.
- (2007) 6th ISCA Workshop on Speech Synthesis , pp. 113-118
- Cabral, J.¹ Renals, S.² Richmond, K.³ Yamagishi, J.⁴

12
- 84867224654
- Glottal spectral separation for parametric speech synthesis
- J. Cabral, S. Renals, K. Richmond, and J. Yamagishi, "Glottal spectral separation for parametric speech synthesis," in Proc. Interspeech, 2008, pp. 1829-1832.
- (2008) Proc. Interspeech , pp. 1829-1832
- Cabral, J.¹ Renals, S.² Richmond, K.³ Yamagishi, J.⁴

13
- 0015699693
- The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer
- J. Holmes, "The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer," IEEE Trans. Audio and Electroac., vol. 21, no. 3, pp. 298-305, 1973.
- (1973) IEEE Trans. Audio and Electroac , vol.21 , Issue.3 , pp. 298-305
- Holmes, J.¹

14
- 0026387469
- Improving naturalness in text-to-speech synthesis using natural glottal source
- K. Matsui, S. D. Pearson, K. Hata, and T. Kamai, "Improving naturalness in text-to-speech synthesis using natural glottal source," in Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., 1991, vol. 2, pp. 769-772.
- (1991) Proc. IEEE Int. Conf. Acoust. Speech Signal Proc , vol.2 , pp. 769-772
- Matsui, K.¹ Pearson, S.D.² Hata, K.³ Kamai, T.⁴

15
- 77957796737
- Hybrid time- and frequency-domain speech synthesis with extended glottal source generation
- G. Fries, "Hybrid time- and frequency-domain speech synthesis with extended glottal source generation," in Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., 1994, vol. 1, pp. 581-584.
- (1994) Proc. IEEE Int. Conf. Acoust. Speech Signal Proc , vol.1 , pp. 581-584
- Fries, G.¹

16
- 84867209230
- HMM-based Finnish text-to-speech system utilizing glottal inverse filtering
- T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "HMM-based Finnish text-to-speech system utilizing glottal inverse filtering," in Proc. Interspeech, 2008, pp. 1881-1884.
- (2008) Proc. Interspeech , pp. 1881-1884
- Raitio, T.¹ Suni, A.² Pulakka, H.³ Vainio, M.⁴ Alku, P.⁵

17
- 77957744515
- HMM-based speech synthesis utilizing glottal inverse filtering
- T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering," IEEE Trans. Audio Speech Lang. Proc., vol. 19, no. 1, pp. 153-165, 2011.
- (2011) IEEE Trans. Audio Speech Lang. Proc , vol.19 , Issue.1 , pp. 153-165
- Raitio, T.¹ Suni, A.² Yamagishi, J.³ Pulakka, H.⁴ Nurminen, J.⁵ Vainio, M.⁶ Alku, P.⁷

18
- 70450204573
- A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis
- T. Drugman, G. Wilfart, and T. Dutoit, "A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis," in Proc. Interspeech, 2009, pp. 1779-1782.
- (2009) Proc. Interspeech , pp. 1779-1782
- Drugman, T.¹ Wilfart, G.² Dutoit, T.³

19
- 79959855183
- Excitation modeling based on waveform interpolation for HMM-based speech synthesis
- J. Sung, D. Hong, K. Oh, and N. Kim, "Excitation modeling based on waveform interpolation for HMM-based speech synthesis," in Proc. Interspeech, 2010, pp. 813-816.
- (2010) Proc. Interspeech , pp. 813-816
- Sung, J.¹ Hong, D.² Oh, K.³ Kim, N.⁴

20
- 84856248602
- The deterministic plus stochastic model of the residual signal and its applications
- T. Drugman and T. Dutoit, "The deterministic plus stochastic model of the residual signal and its applications," IEEE Trans. Audio Speech Lang. Proc., vol. 20, no. 3, pp. 968-981, 2012.
- (2012) IEEE Trans. Audio Speech Lang. Proc , vol.20 , Issue.3 , pp. 968-981
- Drugman, T.¹ Dutoit, T.²

21
- 84890462419
- Comparing glottal-flow-excited statistical parametric speech synthesis methods
- T. Raitio, A. Suni, M. Vainio, and P. Alku, "Comparing glottal-flow-excited statistical parametric speech synthesis methods," in Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., 2013, pp. 7830-7834.
- (2013) Proc. IEEE Int. Conf. Acoust. Speech Signal Proc , pp. 7830-7834
- Raitio, T.¹ Suni, A.² Vainio, M.³ Alku, P.⁴

22
- 67650793794
- Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis
- T. Drugman, G. Wilfart, A. Moinet, and T. Dutoit, "Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis," in Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., 2009, pp. 3793-3796.
- (2009) Proc. IEEE Int. Conf. Acoust. Speech Signal Proc , pp. 3793-3796
- Drugman, T.¹ Wilfart, G.² Moinet, A.³ Dutoit, T.⁴

23
- 80051650578
- Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis
- T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis," in Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., 2011, pp. 4564-4567.
- (2011) Proc. IEEE Int. Conf. Acoust. Speech Signal Proc , pp. 4564-4567
- Raitio, T.¹ Suni, A.² Pulakka, H.³ Vainio, M.⁴ Alku, P.⁵

24
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition
- G. Hinton, L. Deng, D. Yu, G. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Sig. Proc. Mag., vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Sig. Proc. Mag , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.-R.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

25
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. IEEE Int. Conf. Acoust. Speech Signal Proc., 2013, pp. 7962-7966.
- (2013) Proc. IEEE Int. Conf. Acoust. Speech Signal Proc , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

26
- 0026881384
- Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
- P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Commun., vol. 11, no. 2-3, pp. 109-118, 1992.
- (1992) Speech Commun , vol.11 , Issue.2-3 , pp. 109-118
- Alku, P.¹

27
- 85133720638
- The HMM-based speech synthesis system (HTS) version 2.0
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in 6th ISCA Workshop on Speech Synthesis, 2007, pp. 294-299.
- (2007) 6th ISCA Workshop on Speech Synthesis , pp. 294-299
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.⁶ Tokuda, K.⁷

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.