SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 20, Issue 7, 2012, Pages 2134-2148

Vocal tract length normalization for statistical parametric speech synthesis

Authors (3): Saheer, Lakshmi (a, b); Dines, John (a); Garner, Philip N. (a)

(a) Idiap Research Institute (Switzerland)

Author keywords

Expectation maximization optimization; hidden Markov model (HMM) based statistical parametric speech synthesis; speaker adaptation; vocal tract length normalization

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; EFFICIENT IMPLEMENTATION; EXPECTATION MAXIMIZATION; JACOBIANS; RAPID SPEAKER ADAPTATION; SPEAKER ADAPTATION; TRANSFORMATION MATRICES; VOCAL TRACT LENGTH NORMALIZATION; WARPING FACTORS;

HIDDEN MARKOV MODELS; LINEAR TRANSFORMATIONS; MAXIMUM PRINCIPLE; SPEECH RECOGNITION;

SPEECH SYNTHESIS;

EID: 84862291337; Print ISSN: 1558-7916; EISSN: none; Source type: Journal
DOI: 10.1109/TASL.2012.2198058; Document type: Article

Times cited: 12

References (29)

1
- Scopus EID: 34547526960
- A. W. Black, H. Zen, and K. Tokuda, "Statistical parametric speech synthesis," in Proc. ICASSP, 2007, pp. 1229-1232.

2
- Scopus EID: 67650854725
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.

3
- Scopus EID: 0032050110
- M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.

4
- Scopus EID: 0031647824
- PII: S1063667698000960
- L. Lee and R. Rose, "A frequency warping approach to speaker normalization," IEEE Trans. Speech Audio Process., vol. 6, no. 1, pp. 49-60, Jan. 1998.

5
- Scopus EID: 78049381954
- L. Saheer, P. N. Garner, J. Dines, and H. Liang, "VTLN adaptation for statistical speech synthesis," in Proc. ICASSP, Mar. 2010, pp. 4838-4841.

6
- Scopus EID: 84859095070
- L. Saheer, J. Dines, P. N. Garner, and H. Liang, "Implementation of VTLN for statistical speech synthesis," in Proc. 7th ISCA Speech Synth. Workshop, Kyoto, Japan, Sep. 2010, pp. 224-229.

7
- Scopus EID: 84862296819
- L. Saheer, P. N. Garner, and J. Dines, "Study of Jacobian normalization for VTLN," Idiap-RR-25-2010, 2010.

8
- Scopus EID: 33947674606
- M. Pitz, "Investigations on linear transformations for speaker adaptation and normalization," Ph.D. dissertation, RWTH Aachen Univ., Aachen, Germany, 2005.

9
- Scopus EID: 33745201218
- S. Umesh, A. Zolnay, and H. Ney, "Implementing frequency warping and VTLN through linear transformation of conventional MFCC," in Proc. Interspeech, Lisbon, Portugal, 2005, pp. 269-271.

10
- Scopus EID: 47549091998
- S. Panchapagesan and A. Alwan, "Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC," Comput. Speech Lang., vol. 23, no. 1, pp. 42-64, 2009.

11
- Scopus EID: 27644522706
- DOI: 10.1109/TSA.2005.848881
- M. Pitz and H. Ney, "Vocal tract normalization equals linear transformation in cepstral space," IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 930-944, Sep. 2005.

12
- Scopus EID: 84937324786
- J. W. McDonough, "Speaker compensation with all-pass transforms," Ph.D. dissertation, Johns Hopkins Univ., Baltimore, MD, 2000.

13
- Scopus EID: 85135261079
- L. F. Uebel and P. C. Woodland, "An investigation into vocal tract length normalisation," in Proc. Eur. Conf. Speech Commun. Technol., 1999, pp. 2527-2530.

14
- Scopus EID: 84888623995
- D. Sündermann, "Text-independent voice conversion," Ph.D. dissertation, Bundeswehr Univ. Munich, Munich, Germany, 2008.

15
- Scopus EID: 85131821539
- K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis: A unified approach to speech spectral estimation," in Proc. ICSLP, Sep. 1994, vol. 3, pp. 1043-1046.

16
- Scopus EID: 78049377655
- M. Hirohata, T. Masuko, and T. Kobayashi, "A study on average voice model training using vocal tract length normalization," IEICE Tech. Rep., vol. 103, no. 27, pp. 69-74, 2003 (in Japanese).

17
- Scopus EID: 84867216464
- P. T. Akhil, S. P. Rath, S. Umesh, and D. R. Sanand, "A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics," in Proc. Interspeech, Brisbane, Australia, 2008, pp. 1713-1716.

18
- Scopus EID: 0001052406
- A. V. Oppenheim and D. H. Johnson, "Discrete representation of signals," Proc. IEEE, vol. 60, no. 6, pp. 681-691, Jun. 1972.

19
- Scopus EID: 0030149866
- PII: S1063667696040680
- A. Sankar and C.-H. Lee, "A maximum-likelihood approach to stochastic matching for robust speech recognition," IEEE Trans. Speech Audio Process., vol. 4, no. 3, pp. 190-202, May 1996.

20
- Scopus EID: 51449094035
- T. Emori and K. Shinoda, "Rapid vocal tract length normalization using maximum likelihood estimation," in Proc. Eurospeech, 2001, pp. 1649-1652.

21
- Scopus EID: 0004161838
- W. Press, S. Teukolsky, W. Vetterling, and B. Flannery, Numerical Recipes in C. Cambridge, U.K.: Cambridge Univ. Press, 1992.

22
- Scopus EID: 70450169614
- S. P. Rath and S. Umesh, "Acoustic class specific VTLN-warping using regression class trees," in Proc. Interspeech, Brighton, U.K., 2009, pp. 556-559.

23
- Scopus EID: 33947681606
- S. Molau, S. Kanthak, and H. Ney, "Efficient vocal tract normalization in ASR," in Proc. ESSV, Cottbus, Germany, 2000.

24
- Scopus EID: 33846187523
- A. Miguel, E. Lleida, R. L. Buera, and A. Ortega, "Augmented state space acoustic decoding for modeling local variability in speech," in Proc. Interspeech, Lisbon, Portugal, 2005.

25
- Scopus EID: 85008006694
- J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, "Robust speaker-adaptive HMM-based text-to-speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 6, pp. 1208-1230, Aug. 2009.

26
- Scopus EID: 84862273033
- G. Garau, "Speaker normalization for large vocabulary multiparty conversational speech recognition," Ph.D. dissertation, Univ. of Edinburgh, Edinburgh, U.K., 2008.

27
- Scopus EID: 70450202428
- D. R. Sanand, S. P. Rath, and S. Umesh, "A study on the influence of co-variance adaptation on Jacobian compensation in vocal tract length normalization," in Proc. Interspeech, Brighton, U.K., 2009, pp. 584-587.

28
- Scopus EID: 33745221208
- A. Faria and D. Gelbart, "Efficient pitch-based estimation of VTLN warp factors," in Proc. Interspeech (9th Eur. Conf. Speech Commun. Technol.), Sep. 2005, pp. 213-216.

29
- Scopus EID: 79959858171
- J. Yamagishi, O. Watts, S. King, and B. Usabaev, "Roles of the average voice in speaker-adaptive HMM-based speech synthesis," in Proc. Interspeech, Sep. 2010, pp. 418-421.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.