SCOPUS information search platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

2014, Pages 1969-1973

Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort

Authors (5): Raitio, Tuomo (a); Suni, Antti (b); Juvela, Lauri (a); Vainio, Martti (b); Alku, Paavo (a)

(a) Aalto University (Finland)

(b) University of Helsinki (Finland)

Author keywords

Deep neural network; DNN; Glottal flow; Speech synthesis; Vocal effort; Voice source modelling

Indexed keywords

SPEECH SYNTHESIS; TIME DOMAIN ANALYSIS; DEEP NEURAL NETWORKS; DNN; GLOTTAL FLOW; VOCAL EFFORTS; VOICE SOURCES; SPEECH COMMUNICATION

EID: 84910068090; PISSN: 2308457X; EISSN: 19909772; Source type: Conference Proceeding
DOI: None; Document type: Conference Paper

Times cited: 27

References (40)

1
- 85009139544
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.

2
- 67651002140
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.

3
- 0029765811
- A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), 1996, pp. 373-376.

4
- 0034842740
- M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), 2001, pp. 805-808.

5
- 85135145847
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Speaker interpolation in HMM-based speech synthesis system," in Proc. Eurospeech, 1997, pp. 2523-2526.

6
- 85009257840
- K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Eigenvoices for HMM-based speech synthesis," in Proc. ICSLP, 2002, pp. 1269-1272.

7
- 51449114529
- T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406-1413, 2007.

8
- 67650854725
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech and Lang. Proc., vol. 17, no. 1, pp. 66-83, 2009.

9
- 33846935000
- S.-J. Kim, J.-J. Kim, and M.-S. Hahn, "HMM-based Korean speech synthesis system for hand-held devices," IEEE Trans. Consum. Electron., vol. 52, no. 4, pp. 1384-1390, 2006.

10
- 79959839868
- A. Gutkin, X. Gonzalvo, S. Breuer, and P. Taylor, "Quantized HMMs for low footprint text-to-speech synthesis," in Proc. Interspeech, 2010, pp. 837-840.

11
- 85008006694
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "Robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Audio, Speech and Lang. Proc., vol. 17, no. 6, pp. 1208-1230, 2009.

12
- 0016495091
- J. Makhoul, "Linear prediction: A tutorial review," Proceedings of the IEEE, vol. 63, no. 4, pp. 561-580, 1975.

13
- 85009097254
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Mixed excitation for HMM-based speech synthesis," in Proc. Eurospeech, 2001, pp. 2259-2262.

14
- 33846406459
- S. J. Kim and M. Hahn, "Two-band excitation for HMM-based speech synthesis," IEICE Trans. Inf. & Syst., vol. E90-D, 2007.

15
- 0032673049
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.

16
- 84874199000
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in 2nd International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (MAVEBA), 2001.

17
- 78649297510
- R. Maia, T. Toda, H. Zen, Y. Nankaku, and K. Tokuda, "An excitation model for HMM-based speech synthesis based on residual modeling," in 6th ISCA Speech Synthesis Workshop, 2007.

18
- 84865797109
- R. Maia, H. Zen, and M. J. F. Gales, "Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters," in 7th ISCA Speech Synthesis Workshop, 2010, pp. 88-93.

19
- 82155160991
- J. Cabral, S. Renals, K. Richmond, and J. Yamagishi, "Towards an improved modeling of the glottal source in statistical parametric speech synthesis," in Sixth ISCA Workshop on Speech Synthesis, 2007, pp. 113-118.

20
- 84867224654
- J. Cabral, S. Renals, K. Richmond, and J. Yamagishi, "Glottal spectral separation for parametric speech synthesis," in Proc. Interspeech, 2008, pp. 1829-1832.

21
- 0015699693
- J. Holmes, "The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer," IEEE Trans. Audio and Electroacoustics, vol. 21, no. 3, pp. 298-305, 1973.

22
- 0026387469
- K. Matsui, S. D. Pearson, K. Hata, and T. Kamai, "Improving naturalness in text-to-speech synthesis using natural glottal source," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), vol. 2, 1991, pp. 769-772.

23
- 77957796737
- G. Fries, "Hybrid time- and frequency-domain speech synthesis with extended glottal source generation," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), vol. 1, 1994, pp. 581-584.

24
- 84867209230
- T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "HMM based Finnish text-to-speech system utilizing glottal inverse filtering," in Proc. Interspeech, 2008, pp. 1881-1884.

25
- 77957744515
- T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering," IEEE Trans. Audio, Speech and Lang. Proc., vol. 19, no. 1, pp. 153-165, 2011.

26
- 67650793794
- T. Drugman, G. Wilfart, A. Moinet, and T. Dutoit, "Using a pitch-synchronous residual codebook for hybrid HMM/frame selection speech synthesis," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), 2009, pp. 3793-3796.

27
- 79959855183
- J. Sung, D. Hong, K. Oh, and N. Kim, "Excitation modeling based on waveform interpolation for HMM-based speech synthesis," in Proc. Interspeech, 2010, pp. 813-816.

28
- 84856248602
- T. Drugman and T. Dutoit, "The deterministic plus stochastic model of the residual signal and its applications," IEEE Trans. Audio, Speech and Lang. Proc., vol. 20, no. 3, pp. 968-981, 2012.

29
- 84890462419
- T. Raitio, A. Suni, M. Vainio, and P. Alku, "Comparing glottal flow-excited statistical parametric speech synthesis methods," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), 2013, pp. 7830-7834.

30
- 70450204573
- T. Drugman, G. Wilfart, and T. Dutoit, "A deterministic plus stochastic model of the residual signal for improved parametric speech synthesis," in Proc. Interspeech, 2009, pp. 1779-1782.

31
- 80051650578
- T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), 2011, pp. 4564-4567.

32
- 84890555694
- B. Picart, T. Drugman, and T. Dutoit, "Analysis and HMM-based synthesis of hypo and hyperarticulated speech," Computer Speech & Language, vol. 28, no. 2, pp. 687-707, 2014.

33
- 84890547237
- T. Raitio, A. Suni, M. Vainio, and P. Alku, "Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise," Computer Speech & Language, vol. 28, no. 2, pp. 648-664, 2014.

34
- 84905221871
- T. Raitio, J. Kane, T. Drugman, and C. Gobl, "HMM-based synthesis of creaky voice," in Proc. Interspeech, 2013, pp. 2316-2320.

35
- 84911869827
- T. Raitio, H. Lu, J. Kane, A. Suni, M. Vainio, S. King, and P. Alku, "Voice source modelling using deep neural networks for statistical parametric speech synthesis," in 22nd European Signal Processing Conference (EUSIPCO), 2014, accepted.

36
- 0026881384
- P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Commun., vol. 11, no. 2-3, pp. 109-118, 1992.

37
- 85032751458
- G. Hinton, L. Deng, D. Yu, G. Dahl, A.-R. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition," IEEE Sig. Proc. Mag., vol. 29, no. 6, pp. 82-97, 2012.

38
- 84890490547
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. IEEE Int. Conf. on Acoust. Speech and Signal Proc. (ICASSP), 2013, pp. 7962-7966.

39
- 85133720638
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in Sixth ISCA Workshop on Speech Synthesis, 2007, pp. 294-299.

40
- 43549113834
- I. R. Titze, "Nonlinear source-filter coupling in phonation: Theory," J. Acoust. Soc. Am., vol. 123, no. 5, pp. 2733-2749, 2008.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.