Volume 20, Issue 7, 2012, Pages 2134-2148

Vocal tract length normalization for statistical parametric speech synthesis

Author keywords

Expectation maximization optimization; hidden Markov model (HMM) based statistical parametric speech synthesis; speaker adaptation; vocal tract length normalization

Indexed keywords

AUTOMATIC SPEECH RECOGNITION; EFFICIENT IMPLEMENTATION; EXPECTATION MAXIMIZATION; JACOBIANS; RAPID SPEAKER ADAPTATION; SPEAKER ADAPTATION; TRANSFORMATION MATRICES; VOCAL TRACT LENGTH NORMALIZATION; WARPING FACTORS

EID: 84862291337     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2198058     Document Type: Article
Times cited: 12

References (29)
  • 1
    • 34547526960
    • Statistical parametric speech synthesis
    • A. W. Black, H. Zen, and K. Tokuda, "Statistical parametric speech synthesis," in Proc. ICASSP, 2007, pp. 1229-1232.
    • (2007) Proc. ICASSP, pp. 1229-1232
    • Black, A.W.; Zen, H.; Tokuda, K.
  • 2
    • 67650854725
    • Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
    • J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83
    • Yamagishi, J.; Kobayashi, T.; Nakano, Y.; Ogata, K.; Isogai, J.
  • 3
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
    • (1998) Computer Speech and Language, vol. 12, no. 2, pp. 75-98
    • Gales, M.J.F.
  • 4
    • 0031647824
    • A frequency warping approach to speaker normalization
    • PII S1063667698000960
    • L. Lee and R. Rose, "A frequency warping approach to speaker normalization," IEEE Trans. Speech Audio Process., vol. 6, no. 1, pp. 49-60, Jan. 1998.
    • (1998) IEEE Transactions on Speech and Audio Processing, vol. 6, no. 1, pp. 49-60
    • Lee, L.; Rose, R.
  • 5
    • 78049381954
    • VTLN adaptation for statistical speech synthesis
    • L. Saheer, P. N. Garner, J. Dines, and H. Liang, "VTLN adaptation for statistical speech synthesis," in Proc. ICASSP, Mar. 2010, pp. 4838-4841.
    • (2010) Proc. ICASSP, pp. 4838-4841
    • Saheer, L.; Garner, P.N.; Dines, J.; Liang, H.
  • 9
    • 33745201218
    • Implementing frequency warping and VTLN through linear transformation of conventional MFCC
    • S. Umesh, A. Zolnay, and H. Ney, "Implementing frequency warping and VTLN through linear transformation of conventional MFCC," in Proc. Interspeech, Lisbon, Portugal, 2005, pp. 269-271.
    • (2005) Proc. Interspeech, Lisbon, Portugal, pp. 269-271
    • Umesh, S.; Zolnay, A.; Ney, H.
  • 10
    • 47549091998
    • Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC
    • S. Panchapagesan and A. Alwan, "Frequency warping for VTLN and speaker adaptation by linear transformation of standard MFCC," Comput. Speech Lang., vol. 23, no. 1, pp. 42-64, 2009.
    • (2009) Comput. Speech Lang., vol. 23, no. 1, pp. 42-64
    • Panchapagesan, S.; Alwan, A.
  • 11
    • 27644522706
    • Vocal tract normalization equals linear transformation in cepstral space
    • DOI 10.1109/TSA.2005.848881
    • M. Pitz and H. Ney, "Vocal tract normalization equals linear transformation in cepstral space," IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 930-944, Sep. 2005.
    • (2005) IEEE Transactions on Speech and Audio Processing, vol. 13, no. 5, pp. 930-944
    • Pitz, M.; Ney, H.
  • 14
    • 84888623995
    • Ph.D. dissertation, Bundeswehr Univ. Munich, Munich, Germany
    • D. Sündermann, "Text-independent voice conversion," Ph.D. dissertation, Bundeswehr Univ. Munich, Munich, Germany, 2008.
    • (2008) Text-independent Voice Conversion
    • Sündermann, D.
  • 15
    • 85131821539
    • Mel-generalized cepstral analysis: A unified approach to speech spectral estimation
    • K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis: A unified approach to speech spectral estimation," in Proc. ICSLP, Sep. 1994, vol. 3, pp. 1043-1046.
    • (1994) Proc. ICSLP, vol. 3, pp. 1043-1046
    • Tokuda, K.; Kobayashi, T.; Masuko, T.; Imai, S.
  • 16
    • 78049377655
    • A study on average voice model training using vocal tract length normalization (in Japanese)
    • M. Hirohata, T. Masuko, and T. Kobayashi, "A study on average voice model training using vocal tract length normalization," IEICE Tech. Rep., vol. 103, no. 27, pp. 69-74, 2003 (in Japanese).
    • (2003) IEICE Tech. Rep., vol. 103, no. 27, pp. 69-74
    • Hirohata, M.; Masuko, T.; Kobayashi, T.
  • 17
    • 84867216464
    • A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics
    • P. T. Akhil, S. P. Rath, S. Umesh, and D. R. Sanand, "A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics," in Proc. Interspeech, Brisbane, Australia, 2008, pp. 1713-1716.
    • (2008) Proc. Interspeech, pp. 1713-1716
    • Akhil, P.T.; Rath, S.P.; Umesh, S.; Sanand, D.R.
  • 18
    • 0001052406
    • Discrete representation of signals
    • A. V. Oppenheim and D. H. Johnson, "Discrete representation of signals," Proc. IEEE, vol. 60, no. 6, pp. 681-691, Jun. 1972.
    • (1972) Proc. IEEE, vol. 60, no. 6, pp. 681-691
    • Oppenheim, A.V.; Johnson, D.H.
  • 19
    • 0030149866
    • A maximum-likelihood approach to stochastic matching for robust speech recognition
    • PII S1063667696040680
    • A. Sankar and C.-H. Lee, "A maximum-likelihood approach to stochastic matching for robust speech recognition," IEEE Trans. Speech Audio Process., vol. 4, no. 3, pp. 190-202, May 1996.
    • (1996) IEEE Transactions on Speech and Audio Processing, vol. 4, no. 3, pp. 190-202
    • Sankar, A.; Lee, C.-H.
  • 20
    • 51449094035
    • Rapid vocal tract length normalization using maximum likelihood estimation
    • T. Emori and K. Shinoda, "Rapid vocal tract length normalization using maximum likelihood estimation," in Proc. Eurospeech, 2001, pp. 1649-1652.
    • (2001) Proc. Eurospeech, pp. 1649-1652
    • Emori, T.; Shinoda, K.
  • 22
    • 70450169614
    • Acoustic class specific VTLN-warping using regression class trees
    • S. P. Rath and S. Umesh, "Acoustic class specific VTLN-warping using regression class trees," in Proc. Interspeech, Brighton, U.K., 2009, pp. 556-559.
    • (2009) Proc. Interspeech, pp. 556-559
    • Rath, S.P.; Umesh, S.
  • 23
    • 33947681606
    • Efficient vocal tract normalization in ASR
    • S. Molau, S. Kanthak, and H. Ney, "Efficient vocal tract normalization in ASR," in Proc. ESSV, Cottbus, Germany, 2000.
    • (2000) Proc. ESSV
    • Molau, S.; Kanthak, S.; Ney, H.
  • 27
    • 70450202428
    • A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization
    • D. R. Sanand, S. P. Rath, and S. Umesh, "A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization," in Proc. Interspeech, Brighton, U.K., 2009, pp. 584-587.
    • (2009) Proc. Interspeech, pp. 584-587
    • Sanand, D.R.; Rath, S.P.; Umesh, S.
  • 29
    • 79959858171
    • Roles of the average voice in speaker-adaptive HMM-based speech synthesis
    • J. Yamagishi, O. Watts, S. King, and B. Usabaev, "Roles of the average voice in speaker-adaptive HMM-based speech synthesis," in Proc. Interspeech, Sep. 2010, pp. 418-421.
    • (2010) Proc. Interspeech, pp. 418-421
    • Yamagishi, J.; Watts, O.; King, S.; Usabaev, B.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.