SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 19, Issue 1, 2011, Pages 153-165

HMM-based speech synthesis utilizing glottal inverse filtering

(7) Raitio, Tuomo (a); Suni, Antti (b); Yamagishi, Junichi (c); Pulakka, Hannu (a); Nurminen, Jani (d); Vainio, Martti (b); Alku, Paavo (a)

a AALTO UNIVERSITY (Finland)

b UNIVERSITY OF HELSINKI (Finland)

c UNIVERSITY OF EDINBURGH (United Kingdom)

d MICROSOFT (United States)

Author keywords

Glottal inverse filtering; hidden Markov model (HMM); speech synthesis

Indexed keywords

EXCITATION SIGNALS; GLOTTAL FLOW; GLOTTAL SOURCE; HIDDEN MARKOV MODEL (HMM); HMM-BASED SPEECH SYNTHESIS; INVERSE FILTERING; SOURCE SIGNALS; SPECTRAL FEATURE; SPEECH SYNTHESIZER; SYNTHESIS STAGES; SYNTHETIC SPEECH; TEXT INPUT; VOCAL-TRACTS; VOICE SOURCES;

HIDDEN MARKOV MODELS;

SPEECH SYNTHESIS;

EID: 77957744515 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2010.2045239 Document Type: Article

Times cited: 188

References (85)

1
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Sep
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis", in Proc. Eurospeech, Sep. 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

2
- 85031628788
- An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features
- K. Tokuda, T. Masuko, T. Yamada, T. Kobayashi, and S. Imai, "An algorithm for speech parameter generation from continuous mixture HMMs with dynamic features", in Proc. Eurospeech, 1995, vol. 1, pp. 757-760.
- (1995) Proc. Eurospeech , vol.1 , pp. 757-760
- Tokuda, K.¹ Masuko, T.² Yamada, T.³ Kobayashi, T.⁴ Imai, S.⁵

3
- 84966348891
- An HMM-based speech synthesis system applied to English
- Sep
- K. Tokuda, H. Zen, and A. W. Black, "An HMM-based speech synthesis system applied to English", in Proc. IEEE Workshop Speech Synth., Sep. 2002, pp. 227-230.
- (2002) Proc. IEEE Workshop Speech Synth. , pp. 227-230
- Tokuda, K.¹ Zen, H.² Black, A.W.³

4
- 67651002140
- Statistical parametric speech synthesis
- Nov
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis", Speech Commun., vol. 51, no. 11, pp. 1039-1064, Nov. 2009.
- (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

5
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- Jan
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm", IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

6
- 29144475179
- Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing
- DOI 10.1093/ietisy/e88-d.11.2484
- M. Tachibana, J. Yamagishi, T. Masuko, and T. Kobayashi, "Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing", IEICE Trans. Inf. Syst., vol. E88-D, no. 11, pp. 2484-2491, Nov. 2005.
- (2005) IEICE Transactions on Information and Systems , vol.E88-D , Issue.11 , pp. 2484-2491
- Tachibana, M.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

7
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis", in Proc. ICASSP, 2006, vol. 1, pp. 889-892.
- (2006) Proc. ICASSP , vol.1 , pp. 889-892
- Wu, Y.-J.¹ Wang, R.-H.²

8
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- May
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis", IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

9
- 0003418124
- Hague, The Netherlands: Mouton
- G. Fant, Acoustic Theory of Speech Production. The Hague, The Netherlands: Mouton, 1960.
- (1960) Acoustic Theory of Speech Production
- Fant, G.¹

10
- 0025321354
- Analysis, synthesis, and perception of voice quality variations among female and male talkers
- Feb
- D. H. Klatt and L. C. Klatt, "Analysis, synthesis, and perception of voice quality variations among female and male talkers", J. Acoust. Soc. Amer., vol. 87, no. 2, pp. 820-857, Feb. 1990.
- (1990) J. Acoust. Soc. Amer. , vol.87 , Issue.2 , pp. 820-857
- Klatt, D.H.¹ Klatt, L.C.²

11
- 0025786649
- Vocal quality factors: Analysis, synthesis, and perception
- Nov
- D. G. Childers and C. K. Lee, "Vocal quality factors: Analysis, synthesis, and perception", J. Acoust. Soc. Amer., vol. 90, no. 5, pp. 2394-2410, Nov. 1991.
- (1991) J. Acoust. Soc. Amer. , vol.90 , Issue.5 , pp. 2394-2410
- Childers, D.G.¹ Lee, C.K.²

12
- 0002884330
- The government standard linear predictive coding algorithm: LPC-10
- Apr
- T. E. Tremain, "The government standard linear predictive coding algorithm: LPC-10", Speech Technol., vol. 1, pp. 40-49, Apr. 1982.
- (1982) Speech Technol. , vol.1 , pp. 40-49
- Tremain, T.E.¹

13
- 0003425258
- Englewood Cliffs, NJ: Prentice-Hall
- L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ: Prentice-Hall, 1978.
- (1978) Digital Processing of Speech Signals
- Rabiner, L.R.¹ Schafer, R.W.²

14
- 85009097254
- Mixed excitation for HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Mixed excitation for HMM-based speech synthesis", in Proc. Eurospeech, 2001, pp. 2259-2262.
- (2001) Proc. Eurospeech , pp. 2259-2262
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

15
- 78649297510
- An excitation model for HMM-based speech synthesis based on residual modeling
- Aug
- R. Maia, T. Toda, H. Zen, Y. Nankaku, and K. Tokuda, "An excitation model for HMM-based speech synthesis based on residual modeling", in Proc. 6th ISCA Workshop Speech Synth., Aug. 2007.
- (2007) Proc. 6th ISCA Workshop Speech Synth.
- Maia, R.¹ Toda, T.² Zen, H.³ Nankaku, Y.⁴ Tokuda, K.⁵

16
- 33846406459
- Two-band excitation for HMM-based speech synthesis
- S. J. Kim and M. Hahn, "Two-band excitation for HMM-based speech synthesis", IEICE Trans. Inf. Syst., vol. E90-D, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D
- Kim, S.J.¹ Hahn, M.²

17
- 33947684811
- A four-parameter model of glottal flow
- G. Fant, J. Liljencrants, and Q. Lin, "A four-parameter model of glottal flow", STL-QPSR, vol. 4, pp. 1-13, 1985.
- (1985) STL-QPSR , vol.4 , pp. 1-13
- Fant, G.¹ Liljencrants, J.² Lin, Q.³

18
- 0024911005
- Voice source rules for text-to-speech synthesis
- May
- R. Carlson, G. Fant, C. Gobl, B. Granström, I. Karlsson, and Q. Lin, "Voice source rules for text-to-speech synthesis", in Proc. ICASSP, May 1989, vol. 1, pp. 223-226.
- (1989) Proc. ICASSP , vol.1 , pp. 223-226
- Carlson, R.¹ Fant, G.² Gobl, C.³ Granström, B.⁴ Karlsson, I.⁵ Lin, Q.⁶

19
- 0026372714
- Experiments with voice modelling in speech synthesis
- R. Carlson, B. Granström, and I. Karlsson, "Experiments with voice modelling in speech synthesis", Speech Commun., vol. 10, pp. 481-489, 1991.
- (1991) Speech Commun. , vol.10 , pp. 481-489
- Carlson, R.¹ Granström, B.² Karlsson, I.³

20
- 82155160991
- Towards an improved modeling of the glottal source in statistical parametric speech synthesis
- Aug
- J. Cabral, S. Renals, K. Richmond, and J. Yamagishi, "Towards an improved modeling of the glottal source in statistical parametric speech synthesis", in Proc. 6th ISCA Workshop Speech Synth., Aug. 2007, pp. 113-118.
- (2007) Proc. 6th ISCA Workshop Speech Synth. , pp. 113-118
- Cabral, J.¹ Renals, S.² Richmond, K.³ Yamagishi, J.⁴

21
- 84867224654
- Glottal spectral separation for parametric speech synthesis
- J. Cabral, S. Renals, K. Richmond, and J. Yamagishi, "Glottal spectral separation for parametric speech synthesis", in Proc. Interspeech, 2008, pp. 1829-1832.
- (2008) Proc. Interspeech , pp. 1829-1832
- Cabral, J.¹ Renals, S.² Richmond, K.³ Yamagishi, J.⁴

22
- 0015699693
- The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer
- Jun
- J. Holmes, "The influence of glottal waveform on the naturalness of speech from a parallel formant synthesizer", IEEE Trans. Audio Electroacoust., vol. AU-21, no. 3, pp. 298-305, Jun. 1973.
- (1973) IEEE Trans. Audio Electroacoust. , vol.AU-21 , Issue.3 , pp. 298-305
- Holmes, J.¹

23
- 0026387469
- Improving naturalness in text-to-speech synthesis using natural glottal source
- Apr
- K. Matsui, S. D. Pearson, K. Hata, and T. Kamai, "Improving naturalness in text-to-speech synthesis using natural glottal source", in Proc. ICASSP, Apr. 1991, vol. 2, pp. 769-772.
- (1991) Proc. ICASSP , vol.2 , pp. 769-772
- Matsui, K.¹ Pearson, S.D.² Hata, K.³ Kamai, T.⁴

24
- 0032875050
- A method for generating natural-sounding speech stimuli for cognitive brain research
- P. Alku, H. Tiitinen, and R. Näätänen, "A method for generating natural-sounding speech stimuli for cognitive brain research", Clinical Neurophysiol., vol. 110, pp. 1329-1333, 1999.
- (1999) Clinical Neurophysiol. , vol.110 , pp. 1329-1333
- Alku, P.¹ Tiitinen, H.² Näätänen, R.³

25
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- Apr
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds", Speech Commun., vol. 27, Apr. 1999.
- (1999) Speech Commun. , vol.27
- Kawahara, H.¹ Masuda-Katsuse, I.² de Cheveigné, A.³

26
- 33846405723
- Details of the Nitech HMM-based speech synthesis for Blizzard Challenge 2005
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of the Nitech HMM-based speech synthesis for Blizzard Challenge 2005", IEICE Trans. Inf. Syst., vol. E90-D, pp. 325-333, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

27
- 67650790758
- The Blizzard Challenge 2008
- V. Karaiskos, S. King, R. A. J. Clark, and C. Mayo, "The Blizzard Challenge 2008", in Proc. Blizzard Challenge Workshop, 2008.
- (2008) Proc. Blizzard Challenge Workshop
- Karaiskos, V.¹ King, S.² Clark, R.A.J.³ Mayo, C.⁴

28
- 70450147581
- M. S. thesis, Helsinki Univ. of Technol., Espoo, Finland
- T. Raitio, "Hidden Markov model based Finnish text-to-speech system utilizing glottal inverse filtering", M. S. thesis, Helsinki Univ. of Technol., Espoo, Finland, 2008.
- (2008)
- Raitio, T.¹

29
- 84867209230
- HMM-based Finnish text-to-speech system utilizing glottal inverse filtering
- T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "HMM-based Finnish text-to-speech system utilizing glottal inverse filtering", in Proc. Interspeech, 2008, pp. 1881-1884.
- (2008) Proc. Interspeech , pp. 1881-1884
- Raitio, T.¹ Suni, A.² Pulakka, H.³ Vainio, M.⁴ Alku, P.⁵

30
- 84955013305
- Nature of the vocal cord wave
- Jun
- R. L. Miller, "Nature of the vocal cord wave", J. Acoust. Soc. Amer., vol. 31, no. 6, pp. 667-677, Jun. 1959.
- (1959) J. Acoust. Soc. Amer. , vol.31 , Issue.6 , pp. 667-677
- Miller, R.L.¹

31
- 0003757962
- 2nd ed. New York: Springer-Verlag
- J. L. Flanagan, Speech Analysis, Synthesis and Perception, 2nd ed. New York: Springer-Verlag, 1972.
- (1972) Speech Analysis, Synthesis and Perception
- Flanagan, J.L.¹

32
- 0026881384
- Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
- P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering", Speech Commun., vol. 11, no. 2-3, pp. 109-118, 1992.
- (1992) Speech Commun. , vol.11 , Issue.2-3 , pp. 109-118
- Alku, P.¹

33
- 0018653975
- Least squares glottal inverse filtering from the acoustic speech waveform
- Aug
- D. Wong, J. Markel, and A. Gray, Jr., "Least squares glottal inverse filtering from the acoustic speech waveform", IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 4, pp. 350-355, Aug. 1979.
- (1979) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-27 , Issue.4 , pp. 350-355
- Wong, D.¹ Markel, J.² Gray Jr., A.³

34
- 0016129045
- Determination of the instant of glottal closure from the speech wave
- H. Strube, "Determination of the instant of glottal closure from the speech wave", J. Acoust. Soc. Amer., vol. 56, no. 5, pp. 1625-1629, 1974.
- (1974) J. Acoust. Soc. Amer. , vol.56 , Issue.5 , pp. 1625-1629
- Strube, H.¹

35
- 0032595183
- Modeling of the glottal flow derivative waveform with application to speaker identification
- Sep
- M. Plumpe, T. Quatieri, and D. Reynolds, "Modeling of the glottal flow derivative waveform with application to speaker identification", IEEE Trans. Speech Audio Process., vol. 7, no. 5, pp. 569-585, Sep. 1999.
- (1999) IEEE Trans. Speech Audio Process. , vol.7 , Issue.5 , pp. 569-585
- Plumpe, M.¹ Quatieri, T.² Reynolds, D.³

36
- 0034945901
- SIM-Simultaneous inverse filtering and matching of a glottal flow model for acoustic speech signals
- M. Fröhlich, D. Michaelis, and H. Strube, "SIM-Simultaneous inverse filtering and matching of a glottal flow model for acoustic speech signals", J. Acoust. Soc. Amer., vol. 110, no. 1, pp. 479-488, 2001.
- (2001) J. Acoust. Soc. Amer. , vol.110 , Issue.1 , pp. 479-488
- Fröhlich, M.¹ Michaelis, D.² Strube, H.³

37
- 17444431936
- Estimation of the vocal tract transfer function with application to glottal wave analysis
- O. Akande and P. Murphy, "Estimation of the vocal tract transfer function with application to glottal wave analysis", Speech Commun., vol. 46, pp. 15-36, 2005.
- (2005) Speech Commun. , vol.46 , pp. 15-36
- Akande, O.¹ Murphy, P.²

38
- 33751247257
- Robust glottal source estimation based on joint source-filter model optimization
- Mar
- Q. Fu and P. Murphy, "Robust glottal source estimation based on joint source-filter model optimization", IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 2, pp. 492-501, Mar. 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process. , vol.14 , Issue.2 , pp. 492-501
- Fu, Q.¹ Murphy, P.²

39
- 0028797112
- Modeling the glottal volume-velocity waveform for three voice types
- D. Childers and C. Ahn, "Modeling the glottal volume-velocity waveform for three voice types", J. Acoust. Soc. Amer., vol. 97, no. 1, pp. 505-519, 1995.
- (1995) J. Acoust. Soc. Amer. , vol.97 , Issue.1 , pp. 505-519
- Childers, D.¹ Ahn, C.²

40
- 0001603967
- Two-channel speech analysis
- Aug
- A. Krishnamurthy and D. Childers, "Two-channel speech analysis", IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-34, no. 4, pp. 730-743, Aug. 1986.
- (1986) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-34 , Issue.4 , pp. 730-743
- Krishnamurthy, A.¹ Childers, D.²

41
- 0026881761
- On the relation between voice source parameters and prosodic features in connected speech
- H. Strik and L. Boves, "On the relation between voice source parameters and prosodic features in connected speech", Speech Commun., vol. 11, pp. 167-174, 1992.
- (1992) Speech Commun. , vol.11 , pp. 167-174
- Strik, H.¹ Boves, L.²

42
- 0026106454
- Discrete all-pole modeling
- Feb
- A. El-Jaroudi and J. Makhoul, "Discrete all-pole modeling", IEEE Trans. Signal Process., vol. 39, no. 2, pp. 411-423, Feb. 1991.
- (1991) IEEE Trans. Signal Process. , vol.39 , Issue.2 , pp. 411-423
- El-Jaroudi, A.¹ Makhoul, J.²

43
- 0016495091
- Linear prediction: A tutorial review
- Apr
- J. Makhoul, "Linear prediction: A tutorial review", Proc. IEEE, vol. 63, no. 4, pp. 561-580, Apr. 1975.
- (1975) Proc. IEEE , vol.63 , Issue.4 , pp. 561-580
- Makhoul, J.¹

44
- 0021892196
- Automatic glottal inverse filtering from speech and electroglottographic signals
- Apr
- D. Veeneman and S. BeMent, "Automatic glottal inverse filtering from speech and electroglottographic signals", IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-33, no. 2, pp. 369-377, Apr. 1985.
- (1985) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-33 , Issue.2 , pp. 369-377
- Veeneman, D.¹ BeMent, S.²

45
- 33745473319
- Advanced methods for glottal wave extraction
- M. Faundez-Zanuy, Ed. et al. Berlin/Heidelberg, Germany: Springer
- J. Walker and P. Murphy, "Advanced methods for glottal wave extraction", in Nonlinear Analyses and Algorithms for Speech Processing, M. Faundez-Zanuy, Ed. et al. Berlin/Heidelberg, Germany: Springer, 2005, pp. 139-149.
- (2005) Nonlinear Analyses and Algorithms for Speech Processing , pp. 139-149
- Walker, J.¹ Murphy, P.²

46
- 33750333146
- Performance of glottal inverse filtering as tested by aeroelastic modelling of phonation and FE modelling of vocal tract
- P. Alku, J. Horáček, M. Airas, F. Griffond-Boitier, and A.-M. Laukkanen, "Performance of glottal inverse filtering as tested by aeroelastic modelling of phonation and FE modelling of vocal tract", Acta Acust. United with Acust., vol. 92, pp. 717-724, 2006.
- (2006) Acta Acust. United with Acust. , vol.92 , pp. 717-724
- Alku, P.¹ Horáček, J.² Airas, M.³ Griffond-Boitier, F.⁴ Laukkanen, A.-M.⁵

47
- 32944458861
- Estimation of the voice source from speech pressure signals: Evaluation of an inverse filtering technique using physical modelling of voice production
- P. Alku, B. Story, and M. Airas, "Estimation of the voice source from speech pressure signals: Evaluation of an inverse filtering technique using physical modelling of voice production", Folia Phoniatrica et Logopaedica, vol. 58, no. 2, pp. 102-113, 2006.
- (2006) Folia Phoniatrica et Logopaedica , vol.58 , Issue.2 , pp. 102-113
- Alku, P.¹ Story, B.² Airas, M.³

48
- 0036339929
- Normalized amplitude quotient for parameterization of the glottal flow
- P. Alku, T. Bäckström, and E. Vilkman, "Normalized amplitude quotient for parameterization of the glottal flow", J. Acoust. Soc. Amer., vol. 112, no. 2, pp. 701-710, 2002.
- (2002) J. Acoust. Soc. Amer. , vol.112 , Issue.2 , pp. 701-710
- Alku, P.¹ Bäckström, T.² Vilkman, E.³

49
- 0037380186
- The role of voice quality in communicating emotion, mood and attitude
- C. Gobl and A. Ní Chasaide, "The role of voice quality in communicating emotion, mood and attitude", Speech Commun., vol. 40, no. 1-2, pp. 189-212, 2003.
- (2003) Speech Commun. , vol.40 , Issue.1-2 , pp. 189-212
- Gobl, C.¹ Chasaide, A.N.²

50
- 0001292859
- On the perception of emotions in speech: The role of voice quality
- A.-M. Laukkanen, E. Vilkman, P. Alku, and H. Oksanen, "On the perception of emotions in speech: The role of voice quality", Logopedics Phoniatrics Vocology, vol. 22, no. 4, pp. 157-168, 1997.
- (1997) Logopedics Phoniatrics Vocology , vol.22 , Issue.4 , pp. 157-168
- Laukkanen, A.-M.¹ Vilkman, E.² Alku, P.³ Oksanen, H.⁴

51
- 85133720638
- The HMM-based speech synthesis system (HTS) version 2.0
- Aug
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0", in Proc. 6th ISCA Workshop Speech Synth., Aug. 2007, pp. 294-299.
- (2007) Proc. 6th ISCA Workshop Speech Synth. , pp. 294-299
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.⁶ Tokuda, K.⁷

52
- 0023407575
- Review of text-to-speech conversion for English
- D. Klatt, "Review of text-to-speech conversion for English", J. Acoust. Soc. Amer., vol. 82, no. 3, pp. 737-793, 1987.
- (1987) J. Acoust. Soc. Amer. , vol.82 , Issue.3 , pp. 737-793
- Klatt, D.¹

53
- 0020750866
- On the time domain properties of the two-pole model of the glottal waveform and implications for LPC
- J. Deller, "On the time domain properties of the two-pole model of the glottal waveform and implications for LPC", Speech Commun., vol. 2, no. 1, pp. 57-63, 1983.
- (1983) Speech Commun. , vol.2 , Issue.1 , pp. 57-63
- Deller, J.¹

54
- 0002557614
- Line spectrum pair (LSP) and speech data compression
- Mar
- F. K. Soong and B.-H. Juang, "Line spectrum pair (LSP) and speech data compression", in Proc. ICASSP, Mar. 1984, vol. 9, pp. 37-40.
- (1984) Proc. ICASSP , vol.9 , pp. 37-40
- Soong, F.K.¹ Juang, B.-H.²

55
- 67650800535
- An investigation of spectral parameters for HMM-based speech synthesis
- Japanese, Sep
- M. Marume, H. Zen, Y. Nankaku, K. Tokuda, and T. Kitamura, "An investigation of spectral parameters for HMM-based speech synthesis", in Proc. Autumn Meeting Acoust. Soc. Jpn. (in Japanese), Sep. 2006.
- (2006) Proc. Autumn Meeting Acoust. Soc. Jpn.
- Marume, M.¹ Zen, H.² Nankaku, Y.³ Tokuda, K.⁴ Kitamura, T.⁵

56
- 0002557614
- Line spectrum pair (LSP) and speech data compression
- Mar
- F. Soong and B.-H. Juang, "Line spectrum pair (LSP) and speech data compression", in Proc. ICASSP, Mar. 1984, vol. 9, pp. 37-40.
- (1984) Proc. ICASSP , vol.9 , pp. 37-40
- Soong, F.¹ Juang, B.-H.²

57
- 0002077742
- Quantization of LPC parameters
- W. Kleijn and K. Paliwal, Eds. Amsterdam, The Netherlands: Elsevier
- K. Paliwal and W. Kleijn, "Quantization of LPC parameters", in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Eds. Amsterdam, The Netherlands: Elsevier, 1995, ch. 12.
- (1995) Speech Coding and Synthesis , ch. 12
- Paliwal, K.¹ Kleijn, W.²

58
- 0017367712
- On the use of autocorrelation analysis for pitch detection
- L. Rabiner, "On the use of autocorrelation analysis for pitch detection", IEEE Trans. Acoust., Speech, Signal Process., vol. 25, no. 1, pp. 24-33, 1977.
- (1977) IEEE Trans. Acoust., Speech, Signal Process. , vol.25 , Issue.1 , pp. 24-33
- Rabiner, L.¹

59
- 0032945155
- Perturbation-free measurement of the harmonics-to-noise ratio in voice signals using pitch synchronous harmonic analysis
- P. Murphy, "Perturbation-free measurement of the harmonics-to-noise ratio in voice signals using pitch synchronous harmonic analysis", J. Acoust. Soc. Amer., vol. 105, no. 5, pp. 2866-2881, 1999.
- (1999) J. Acoust. Soc. Amer. , vol.105 , Issue.5 , pp. 2866-2881
- Murphy, P.¹

60
- 0036522887
- Multi-space probability distribution HMM
- Mar
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multi-space probability distribution HMM", IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455-464, Mar. 2002.
- (2002) IEICE Trans. Inf. Syst. , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

61
- 0022685753
- Continuously variable duration hidden Markov models for automatic speech recognition
- S. Levinson, "Continuously variable duration hidden Markov models for automatic speech recognition", Computer Speech Lang., vol. 1, no. 1, pp. 29-45, 1986.
- (1986) Computer Speech and Language , vol.1 , Issue.1 , pp. 29-45
- Levinson, S.E.¹

62
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- May
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system", IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, May 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

63
- 53049107776
- Accent and prominence in Finnish speech synthesis
- G. Kokkinakis, N. Fakotakis, E. Dermatos, and R. Potapova, Eds., Greece, Oct, Univ. of Patras
- M. Vainio, A. Suni, and P. Sirjola, "Accent and prominence in Finnish speech synthesis", in Proc. 10th Int. Conf. Speech Comput. (Specom 2005), G. Kokkinakis, N. Fakotakis, E. Dermatos, and R. Potapova, Eds., Greece, Oct. 2005, pp. 309-312, Univ. of Patras.
- (2005) Proc. 10th Int. Conf. Speech Comput. (Specom 2005) , pp. 309-312
- Vainio, M.¹ Suni, A.² Sirjola, P.³

64
- 53049097235
- Deep syntactic analysis and rule based accentuation in text-to-speech synthesis
- Text, Speech, Dialogue
- A. Suni and M. Vainio, "Deep syntactic analysis and rule based accentuation in text-to-speech synthesis", in Proc. TSD'08: 11th Int. Conf. Text, Speech, Dialogue, 2008, pp. 535-542.
- (2008) Proc. TSD'08: 11th Int. Conf. Text, Speech, Dialogue , pp. 535-542
- Suni, A.¹ Vainio, M.²

65
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- Mar
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition", J. Acoust. Soc. Japan (E), vol. 21, pp. 79-86, Mar. 2000.
- (2000) J. Acoust. Soc. Japan (E) , vol.21 , pp. 79-86
- Shinoda, K.¹ Watanabe, T.²

66
- 31744434085
- Characterizing glottal jet turbulence
- F. Alipour and R. Scherer, "Characterizing glottal jet turbulence", J. Acoust. Soc. Amer., vol. 119, no. 2, pp. 1063-1073, 2006.
- (2006) J. Acoust. Soc. Amer. , vol.119 , Issue.2 , pp. 1063-1073
- Alipour, F.¹ Scherer, R.²

67
- 0004197471
- Berlin, Germany: Springer
- G. Engeln-Müllges and E. Uhlig, Numerical Algorithms With C. Berlin, Germany: Springer, 1996.
- (1996) Numerical Algorithms with C
- Engeln-Müllges, G.¹ Uhlig, E.²

68
- 0037864375
- 3rd ed. 2009
- M. Galassi et al., GNU Scientific Library Reference Manual, 3rd ed. 2009.
- GNU Scientific Library Reference Manual
- Galassi, M.¹

69
- 67650851754
- USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method
- Z.-H. Ling, Y. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method", in Proc. Blizzard Challenge Workshop, 2006.
- (2006) Proc. Blizzard Challenge Workshop
- Ling, Z.-H.¹ Wu, Y.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

70
- 33745215669
- An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005
- 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
- H. Zen and T. Toda, "An overview of Nitech HMM-based speech synthesis system for Blizzard Challenge 2005", in Proc. Interspeech, Sep. 2005, pp. 93-96.
- (2005) 9th European Conference on Speech Communication and Technology , pp. 93-96
- Zen, H.¹ Toda, T.²

71
- 0020596154
- Cepstral analysis synthesis on the mel frequency scale
- Apr. 1983
- S. Imai, "Cepstral analysis synthesis on the mel frequency scale", in Proc. ICASSP, Apr. 1983, vol. 8, pp. 93-96.
- (1983) Proc. ICASSP , vol.8 , pp. 93-96
- Imai, S.¹

72
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- Sep
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT", in Proc. 2nd Int. Workshop Models Anal. Vocal Emissions for Biomed. Applicat. (MAVEBA), Sep. 2001.
- (2001) Proc. 2nd Int. Workshop Models Anal. Vocal Emissions for Biomed. Applicat. (MAVEBA)
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

73
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Sep
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation", in Proc. Interspeech, Sep. 2006, pp. 2266-2269.
- (2006) Proc. Interspeech , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

74
- 11144317887
- Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency
- Dec
- D. Arifianto, T. Tanaka, T. Masuko, and T. Kobayashi, "Robust F0 estimation of speech signal using harmonicity measure based on instantaneous frequency", IEICE Trans. Inf. Syst., vol. E87-D, no. 12, pp. 2812-2820, Dec. 2004.
- (2004) IEICE Trans. Inf. Syst. , vol.E87-D , Issue.12 , pp. 2812-2820
- Arifianto, D.¹ Tanaka, T.² Masuko, T.³ Kobayashi, T.⁴

75
- 84928118106
- Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity
- Sep
- H. Kawahara, H. Katayose, A. de Cheveigné, and R. Patterson, "Fixed point analysis of frequency to instantaneous frequency mapping for accurate estimation of F0 and periodicity", in Proc. Eurospeech, Sep. 1999, pp. 2781-2784.
- (1999) Proc. Eurospeech , pp. 2781-2784
- Kawahara, H.¹ Katayose, H.² de Cheveigné, A.³ Patterson, R.⁴

76
- 0001455934
- A robust algorithm for pitch tracking (RAPT)
- W. Kleijn and K. Paliwal, Eds. Amsterdam, The Netherlands: Elsevier
- D. Talkin, "A robust algorithm for pitch tracking (RAPT)", in Speech Coding and Synthesis, W. Kleijn and K. Paliwal, Eds. Amsterdam, The Netherlands: Elsevier, 1995, pp. 495-518.
- (1995) Speech Coding and Synthesis , pp. 495-518
- Talkin, D.¹

77
- 77957731953
- ESPS Programs Version 5.0 Entropic Research Laboratory Inc., 1993
- ESPS Programs Version 5.0 Entropic Research Laboratory Inc., 1993.

78
- 0025543906
- Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones
- Dec
- E. Moulines and F. Charpentier, "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones", Speech Commun., vol. 9, no. 5-6, pp. 453-467, Dec. 1990.
- (1990) Speech Commun. , vol.9 , Issue.5-6 , pp. 453-467
- Moulines, E.¹ Charpentier, F.²

79
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech", in Proc. ICASSP, 1992, vol. 1, pp. 137-140.
- (1992) Proc. ICASSP , vol.1 , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

80
- 77957743908
- HTS, HMM-Based Speech Synthesis System
- HTS, HMM-Based Speech Synthesis System Apr. 2009 [Online]. Available: http://hts.sp.nitech.ac.jp
- (2009)

81
- 33646681559
- Ph. D. dissertation, Univ. of Helsinki, Espoo, Finland, Dec
- M. Vainio, "Artificial neural network based prosody models for Finnish text-to-speech synthesis", Ph. D. dissertation, Univ. of Helsinki, Espoo, Finland, Dec. 2001.
- (2001) Artificial Neural Network Based Prosody Models for Finnish Text-to-speech Synthesis
- Vainio, M.¹

82
- 0003450846
- Methods for subjective determination of transmission quality
- ITU, Aug
- ITU, "Methods for subjective determination of transmission quality", Int. Telecomm. Union, Rec. ITU-T P.800, Aug. 1996.
- (1996) Int. Telecomm. Union, Rec. ITU-T P.800

83
- 0030166343
- The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences
- C. Benoît, M. Grice, and V. Hazan, "The SUS test: A method for the assessment of text-to-speech synthesis intelligibility using semantically unpredictable sentences", Speech Commun., vol. 18, no. 4, pp. 381-392, 1996.
- (1996) Speech Commun. , vol.18 , Issue.4 , pp. 381-392
- Benoît, C.¹ Grice, M.² Hazan, V.³

84
- 0001884644
- Individual comparisons by ranking methods
- F. Wilcoxon, "Individual comparisons by ranking methods", Biometrics, vol. 1, pp. 80-83, 1945.
- (1945) Biometrics , vol.1 , pp. 80-83
- Wilcoxon, F.¹

85
- 84867197177
- Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge
- Sep
- Z.-H. Ling, K. Richmond, J. Yamagishi, and R.-H. Wang, "Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge", in Proc. Interspeech, Brisbane, Australia, Sep. 2008, pp. 573-576.
- (2008) Proc. Interspeech, Brisbane, Australia , pp. 573-576
- Ling, Z.-H.¹ Richmond, K.² Yamagishi, J.³ Wang, R.-H.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.