SCOPUS Information Retrieval Platform

Computer Speech and Language

Volume 28, Issue 5, 2014, Pages 1209-1232

On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis

(2) Maia, Ranniery a Akamine, Masami a

a TOSHIBA CORPORATION (Japan)

Author keywords

Expressive speech synthesis; Speech parameterization; Speech synthesis; Statistical parametric speech synthesis

Indexed keywords

SPEECH PROCESSING; SPEECH SYNTHESIS;

EMOTION IDENTIFICATIONS; EXCITATION PARAMETERS; EXPRESSIVE SPEECH SYNTHESIS; GAUSSIAN MIXTURE MODEL; PARAMETRIC SYNTHESIS; SOURCE-FILTER MODELS; SPEAKER AND LANGUAGE FACTORIZATIONS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS;

STATISTICS;

EID: 84902548006 PISSN: 08852308 EISSN: 10958363 Source Type: Journal
DOI: 10.1016/j.csl.2013.10.001 Document Type: Article

Times cited : (4)

References (45)

1
- 33644694381
- Emotions in vowel segments of continuous speech: Analysis of the glottal flow using the normalised amplitude quotient
- DOI 10.1159/000091405
- M. Airas, and P. Alku Emotions in vowel segments of continuous speech: analysis of the glottal flow using the normalized amplitude quotient Phonetica 63 1 2006 26 46 (Pubitemid 43333524)
- (2006) Phonetica , vol.63 , Issue.1 , pp. 26-46
- Airas, M.¹ Alku, P.²

2
- 70450163450
- Comparison of multiple voice source parameters in different phonation types
- M. Airas, and P. Alku Comparison of multiple voice source parameters in different phonation types Proc. of Interspeech 2007 1410 1413
- (2007) Proc. of Interspeech , pp. 1410-1413
- Airas, M.¹ Alku, P.²

3
- 0036339929
- Normalized amplitude quotient for parametrization of the glottal flow
- DOI 10.1121/1.1490365
- P. Alku, T. Backstrom, and E. Vilkman Normalized amplitude quotient for parametrization of the glottal flow Journal of the Acoustical Society of America 112 August (2) 2002 701 710 (Pubitemid 34855925)
- (2002) Journal of the Acoustical Society of America , vol.112 , Issue.2 , pp. 701-710
- Alku, P.¹ Backstrom, T.² Vilkman, E.³

4
- 0002450185
- Efficient representation of short-time phase based on group delay
- H. Banno, J. Lu, S. Nakamura, K. Shikano, and H. Kawahara Efficient representation of short-time phase based on group delay Proc. of ICASSP 1998 861 864
- (1998) Proc. of ICASSP , pp. 861-864
- Banno, H.¹ Lu, J.² Nakamura, S.³ Shikano, K.⁴ Kawahara, H.⁵

5
- 33846516584
- Springer
- C.M. Bishop Pattern Recognition and Machine Learning 2006 Springer
- (2006) Pattern Recognition and Machine Learning
- Bishop, C.M.¹

6
- 82155160991
- Towards an improved modeling of the glottal source in statistical parametric speech synthesis
- J. Cabral, S. Renals, K. Richmond, and J. Yamagishi Towards an improved modeling of the glottal source in statistical parametric speech synthesis Proc. of 6th ISCA Speech Synthesis Workshop 2007 113 118
- (2007) Proc. of 6th ISCA Speech Synthesis Workshop , pp. 113-118
- Cabral, J.¹ Renals, S.² Richmond, K.³ Yamagishi, J.⁴

7
- 0003424145
- IEEE Press Classic Reissue
- J.R. Deller Jr., J.H.L. Hansen, and J.G. Proakis Discrete-Time Processing of Speech Signals 2000 IEEE Press Classic Reissue
- (2000) Discrete-Time Processing of Speech Signals
- Deller Jr., J.R.¹ Hansen, J.H.L.² Proakis, J.G.³

8
- 79955528226
- Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation
- T. Drugman, B. Bozkurt, and T. Dutoit Causal-anticausal decomposition of speech using complex cepstrum for glottal source estimation Speech Communication 53 2011 855 866
- (2011) Speech Communication , vol.53 , pp. 855-866
- Drugman, T.¹ Bozkurt, B.² Dutoit, T.³

9
- 70450204573
- A deterministic plus stochastic model of residual signal for improved parametric speech synthesis
- T. Drugman, G. Wilfart, and T. Dutoit A deterministic plus stochastic model of residual signal for improved parametric speech synthesis Proc. of Interspeech 2009 1779 1782
- (2009) Proc. of Interspeech , pp. 1779-1782
- Drugman, T.¹ Wilfart, G.² Dutoit, T.³

10
- 60849097547
- Normalized mutual information feature selection
- February 2
- P.A. Estévez, M. Tesmer, C.A. Perez, and J.M. Zurada Normalized mutual information feature selection IEEE Transactions on Neural Networks 20 February (2) 2009 189 201
- (2009) IEEE Transactions on Neural Networks , vol.20 , pp. 189-201
- Estévez, P.A.¹ Tesmer, M.² Perez, C.A.³ Zurada, J.M.⁴

11
- 33947684811
- A four-parameter model of the glottal flow
- G. Fant, J. Liljencrants, and Q. Lin A four-parameter model of the glottal flow STL-QPSR 26 4 1985 001 013
- (1985) STL-QPSR , vol.26 , Issue.4 , pp. 001-013
- Fant, G.¹ Liljencrants, J.² Lin, Q.³

12
- 0008881815
- An approximation to voice aperiodicity
- March 1
- O. Fujimura An approximation to voice aperiodicity IEEE Transactions on Audio and Electroacoustics 16 March (1) 1968 68 72
- (1968) IEEE Transactions on Audio and Electroacoustics , vol.16 , pp. 68-72
- Fujimura, O.¹

13
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai An adaptive algorithm for mel-cepstral analysis of speech Proc. of ICASSP 1992 137 140
- (1992) Proc. of ICASSP , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

14
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M.J.F. Gales Maximum likelihood linear transformations for HMM-based speech recognition Computer Speech and Language 12 April (2) 1998 75 98 (Pubitemid 128383747)
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

15
- 0034227757
- Cluster adaptive training of hidden Markov models
- July 4
- M.J.F. Gales Cluster adaptive training of hidden Markov models IEEE Transactions on Speech and Audio Processing 8 July (4) 2000 417 428
- (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , pp. 417-428
- Gales, M.J.F.¹

16
- 0035472456
- Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech
- DOI 10.1109/89.952489, PII S1063667601082335
- P.J.B. Jackson, and C.H. Shadle Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech IEEE Transactions on Speech and Audio Processing 9 October (7) 2001 713 726 (Pubitemid 32992835)
- (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.7 , pp. 713-726
- Jackson, P.J.B.¹ Shadle, C.H.²

17
- 0022929809
- The computation of line spectral frequencies using Chebyshev Polynomials
- December 6
- P. Kabal, and R.P. Ramachandran The computation of line spectral frequencies using Chebyshev Polynomials IEEE Transactions on Acoustics, Speech, Signal Processing 34 December (6) 1986 1419 1426
- (1986) IEEE Transactions on Acoustics, Speech, Signal Processing , vol.34 , pp. 1419-1426
- Kabal, P.¹ Ramachandran, R.P.²

18
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT Proc. of MAVEBA 2001 13 18
- (2001) Proc. of MAVEBA , pp. 13-18
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

19
- 0003637864
- Elsevier
- W.B. Kleijn, and K.K. Paliwal Speech Coding and Synthesis 1995 Elsevier
- (1995) Speech Coding and Synthesis
- Kleijn, W.B.¹ Paliwal, K.K.²

20
- 0033703536
- A 16 kb/s wideband CELP-based speech coder using mel-generalized cepstral analysis
- April 4
- K. Koishida, G. Hirabayashi, K. Tokuda, and T. Kobayashi A 16 kb/s wideband CELP-based speech coder using mel-generalized cepstral analysis IEICE Transactions on Information & Systems E83-D April (4) 2000 876 883
- (2000) IEICE Transactions on Information & Systems , vol.E83-D , pp. 876-883
- Koishida, K.¹ Hirabayashi, G.² Tokuda, K.³ Kobayashi, T.⁴

21
- 84878387086
- Analysis on the importance of short-term speech parameterizations for emotional statistical parametric speech synthesis
- R. Maia, and M. Akamine Analysis on the importance of short-term speech parameterizations for emotional statistical parametric speech synthesis Proc. of Interspeech 2012
- (2012) Proc. of Interspeech
- Maia, R.¹ Akamine, M.²

22
- 84876205258
- Complex cepstrum for statistical parametric speech synthesis
- June 5
- R. Maia, M. Akamine, and M. Gales Complex cepstrum for statistical parametric speech synthesis Speech Communication 55 June (5) 2013 606 618
- (2013) Speech Communication , vol.55 , pp. 606-618
- Maia, R.¹ Akamine, M.² Gales, M.³

23
- 84867616957
- Complex cepstrum as phase information for statistical parametric speech synthesis
- R. Maia, M. Akamine, and M.F.J. Gales Complex cepstrum as phase information for statistical parametric speech synthesis Proc. of ICASSP 2012 4581 4584
- (2012) Proc. of ICASSP , pp. 4581-4584
- Maia, R.¹ Akamine, M.² Gales, M.F.J.³

24
- 84906246236
- Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis
- (in press)
- R. Maia, M. Gales, Y. Stylianou, and M. Akamine Minimum mean squared error based warped complex cepstrum analysis for statistical parametric speech synthesis Proc. of Interspeech 2013 (in press)
- (2013) Proc. of Interspeech
- Maia, R.¹ Gales, M.² Stylianou, Y.³ Akamine, M.⁴

25
- 78649297510
- An excitation model for HMM-based speech synthesis based on residual modeling
- R. Maia, T. Toda, H. Zen, Y. Nankaku, and K. Tokuda An excitation model for HMM-based speech synthesis based on residual modeling Proc. of the 6th ISCA Workshop on Speech Synthesis 2007 131 136
- (2007) Proc. of the 6th ISCA Workshop on Speech Synthesis , pp. 131-136
- Maia, R.¹ Toda, T.² Zen, H.³ Nankaku, Y.⁴ Tokuda, K.⁵

26
- 0003874959
- Springer-Verlag New York
- J. Markel, and A. Gray Linear Prediction of Speech 1982 Springer-Verlag New York
- (1982) Linear Prediction of Speech
- Markel, J.¹ Gray, A.²

27
- 0001481529
- Bark and ERB bilinear transforms
- November 6
- J.O. Smith III, and J.S. Abel Bark and ERB bilinear transforms IEEE Transactions on Speech and Audio Processing 7 November (6) 1999 697 708
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , pp. 697-708
- Smith III, J.O.¹ Abel, J.S.²

28
- 0001052406
- Discrete representation of signals
- June 6
- A.V. Oppenheim, and D.H. Johnson Discrete representation of signals Proceedings of IEEE 60 June (6) 1972 681 691
- (1972) Proceedings of IEEE , vol.60 , pp. 681-691
- Oppenheim, A.V.¹ Johnson, D.H.²

29
- 0003513556
- Pearson
- A.V. Oppenheim Discrete-time Signal Processing 2010 Pearson
- (2010) Discrete-time Signal Processing
- Oppenheim, A.V.¹

30
- 0032595183
- Modeling of the glottal flow derivative waveform with application to speaker identification
- September 5
- M.D. Plumpe, T.F. Quatieri, and D.A. Reynolds Modeling of the glottal flow derivative waveform with application to speaker identification IEEE Transactions on Speech and Audio Processing 7 September (5) 1999 569 586
- (1999) IEEE Transactions on Speech and Audio Processing , vol.7 , pp. 569-586
- Plumpe, M.D.¹ Quatieri, T.F.² Reynolds, D.A.³

31
- 67650486451
- Multimodal signals: Cognitive and algorithmic issues
- Springer-Verlag 10.1007/978-3-642-00525-1_23
- A. Přibilová, and J. Přibil Multimodal signals: cognitive and algorithmic issues Ch. Spectrum Modification for Emotional Speech Synthesis 2009 Springer-Verlag 232 241 10.1007/978-3-642-00525-1_23
- (2009) Ch. Spectrum Modification for Emotional Speech Synthesis , pp. 232-241
- Přibilová, A.¹ Přibil, J.²

32
- 0003927842
- Prentice Hall Signal Processing Series
- T.F. Quatieri Speech Signal Processing 2002 Prentice Hall Signal Processing Series
- (2002) Speech Signal Processing
- Quatieri, T.F.¹

33
- 0003425258
- Prentice Hall
- L.R. Rabiner, and R.W. Schafer Digital Processing of Speech Signals 1978 Prentice Hall
- (1978) Digital Processing of Speech Signals
- Rabiner, L.R.¹ Schafer, R.W.²

34
- 77957744515
- HMM-based speech synthesis utilizing glottal inverse filtering
- January 1
- T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku HMM-based speech synthesis utilizing glottal inverse filtering IEEE Transactions on Audio, Speech and Language Processing 19 January (1) 2011 153 165
- (2011) IEEE Transactions on Audio, Speech and Language Processing , vol.19 , pp. 153-165
- Raitio, T.¹ Suni, A.² Yamagishi, J.³ Pulakka, H.⁴ Nurminen, J.⁵ Vainio, M.⁶ Alku, P.⁷

35
- 0029209272
- Robust text-independent speaker identification using Gaussian mixture speaker models
- January 1
- D.A. Reynolds, and R.C. Rose Robust text-independent speaker identification using Gaussian mixture speaker models IEEE Transactions on Speech and Audio Processing 3 January (1) 1995 72 83
- (1995) IEEE Transactions on Speech and Audio Processing , vol.3 , pp. 72-83
- Reynolds, D.A.¹ Rose, R.C.²

36
- 79959855615
- Cluster analysis of differential spectral envelopes on emotional speech
- G. Salvi, F. Tesser, E. Zovato, and P. Cosi Cluster analysis of differential spectral envelopes on emotional speech Proc. of Interspeech 2010 322 325
- (2010) Proc. of Interspeech , pp. 322-325
- Salvi, G.¹ Tesser, F.² Zovato, E.³ Cosi, P.⁴

37
- 84865709194
- Clustering expressive speech styles in audiobooks using glottal source parameters
- E. Székely, J.P. Cabral, P. Cahill, and J. Carson-Berndsen Clustering expressive speech styles in audiobooks using glottal source parameters Proc. of Interspeech 2011 2409 2412
- (2011) Proc. of Interspeech , pp. 2409-2412
- Székely, E.¹ Cabral, J.P.² Cahill, P.³ Carson-Berndsen, J.⁴

38
- 34047263010
- Prosody conversion from neutral to emotional speech
- July 4
- J. Tao, Y. Kang, and A. Li Prosody conversion from neutral to emotional speech IEEE Transactions on Audio, Speech, and Language Processing 14 July (4) 2006 1145 1154
- (2006) IEEE Transactions on Audio, Speech, and Language Processing , vol.14 , pp. 1145-1154
- Tao, J.¹ Kang, Y.² Li, A.³

39
- 85131821539
- Mel-generalized cepstral analysis - A unified approach to speech spectral estimation
- K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai Mel-generalized cepstral analysis - a unified approach to speech spectral estimation Proc. of ICSLP 1994 1043 1046
- (1994) Proc. of ICSLP , pp. 1043-1046
- Tokuda, K.¹ Kobayashi, T.² Masuko, T.³ Imai, S.⁴

40
- 84878422444
- Combining multiple high quality corpora for improving HMM-TTS
- V. Wan, J. Latorre, K. Chin, L. Chen, M. Gales, H. Zen, K. Knill, and M. Akamine Combining multiple high quality corpora for improving HMM-TTS Proc. of Interspeech 2012
- (2012) Proc. of Interspeech
- Wan, V.¹ Latorre, J.² Chin, K.³ Chen, L.⁴ Gales, M.⁵ Zen, H.⁶ Knill, K.⁷ Akamine, M.⁸

41
- 85009097254
- Mixed-excitation for HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura Mixed-excitation for HMM-based speech synthesis Proc. of Eurospeech 2001 2263 2266
- (2001) Proc. of Eurospeech , pp. 2263-2266
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

42
- 79955538498
- Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
- July 6
- K. Yu, H. Zen, F. Mairesse, and S. Young Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis Speech Communication 53 July (6) 2011 914 923
- (2011) Speech Communication , vol.53 , pp. 914-923
- Yu, K.¹ Zen, H.² Mairesse, F.³ Young, S.⁴

43
- 84859765673
- Statistical parametric speech synthesis based on speaker and language factorization
- August 6
- H. Zen, N. Braunschweiler, S. Buchholz, M. Gales, K. Knill, S. Krstulovic, and J. Latorre Statistical parametric speech synthesis based on speaker and language factorization IEEE Transactions on Audio, Speech, and Language Processing 20 August (6) 2012 1713 1724
- (2012) IEEE Transactions on Audio, Speech, and Language Processing , vol.20 , pp. 1713-1724
- Zen, H.¹ Braunschweiler, N.² Buchholz, S.³ Gales, M.⁴ Knill, K.⁵ Krstulovic, S.⁶ Latorre, J.⁷

44
- 33846405723
- Details of the Nitech HMM-based speech synthesis for Blizzard Challenge 2005
- January 1
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda Details of the Nitech HMM-based speech synthesis for Blizzard Challenge 2005 IEICE Transactions on Information and Systems E90-D January (1) 2007 325 333
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

45
- 67651002140
- Statistical parametric speech synthesis
- November 11
- H. Zen, K. Tokuda, and A. Black Statistical parametric speech synthesis Speech Communication 51 November (11) 2009 1039 1064
- (2009) Speech Communication , vol.51 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.