SCOPUS Information Retrieval Platform

Speech Communication

Volume 53, Issue 6, 2011, Pages 914-923

Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis

(4) Yu, Kai (a); Zen, Heiga (b); Mairesse, François (a); Young, Steve (a)

a UNIVERSITY OF CAMBRIDGE (United Kingdom)

b TOSHIBA CORPORATION (Japan)

Author keywords

Context adaptive training; Factorized decision tree; HMM based speech synthesis; State clustering

Indexed keywords

ADAPTIVE TRAINING; COMBINATORIAL EXPLOSION; CONTEXT DEPENDENT; DATA COVERAGE; DATA SPARSITY PROBLEMS; FACTORIZED DECISION TREE; HIGH QUALITY; HMM-BASED SPEECH SYNTHESIS; NATURAL SPEECH; PARAMETER CLUSTERING; STATE CLUSTERING; SYNTHESIZED SPEECH; USE CONTEXT;

DECISION TREES; SPEECH SYNTHESIS;

SPEECH RECOGNITION;

EID: 79955538498 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2011.03.003 Document Type: Article

Times cited : (27)

References (28)

1
- 0030362995
- A compact model for speaker adaptive training
- Anastasakos, T., McDonough, J., Schwartz, R., Makhoul, J., 1996. A compact model for speaker adaptive training. In: Proc. ICSLP, pp. 1137-1140.
- (1996) Proc. ICSLP , pp. 1137-1140
- Anastasakos, T.¹ McDonough, J.² Schwartz, R.³ Makhoul, J.⁴

2
- 0032658258
- Decision tree state tying based on penalized Bayesian information criterion
- Chou, W., Reichl, W., 1999. Decision tree state tying based on penalized Bayesian information criterion. In: Proc. ICASSP, pp. 345-348.
- (1999) Proc. ICASSP , pp. 345-348
- Chou, W.¹ Reichl, W.²

3
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- Fukada, T., Tokuda, K., Kobayashi, T., Imai, S., 1992. An adaptive algorithm for mel-cepstral analysis of speech. In: Proc. ICASSP, pp. 137-140.
- (1992) Proc. ICASSP , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

4
- 0003940203
- The generation and use of regression class trees for MLLR adaptation
- Cambridge University Engineering Department
- Gales, M., 1996. The generation and use of regression class trees for MLLR adaptation. Tech. Rep. CUED/F-INFENG/TR263, Cambridge University Engineering Department.
- (1996) Tech. Rep. CUED/F-INFENG/TR263
- Gales, M.¹

5
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- Gales, M., 1998. Maximum likelihood linear transformations for HMM-based speech recognition. Comput. Speech Lang. 12 (2), 75-98.
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

6
- 0034227757
- Cluster adaptive training of hidden Markov models
- Gales, M., 2000. Cluster adaptive training of hidden Markov models. IEEE Trans. Speech Audio Process. 8 (4), 417-428.
- (2000) IEEE Trans. Speech Audio Process. , vol.8 , Issue.4 , pp. 417-428
- Gales, M.¹

7
- 79959841827
- Canonical state models for automatic speech recognition
- Gales, M., Yu, K., 2010. Canonical state models for automatic speech recognition. In: Proc. Interspeech, pp. 58-61.
- (2010) Proc. Interspeech , pp. 58-61
- Gales, M.¹ Yu, K.²

8
- 0020596154
- Cepstral analysis synthesis on the mel frequency scale
- Imai, S., 1983. Cepstral analysis synthesis on the mel frequency scale. In: Proc. ICASSP, pp. 93-96.
- (1983) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 93-96
- Imai Satoshi¹

9
- 33746384049
- Statistical modelling of speech segment duration by constrained tree regression
- Iwahashi, N., Sagisaka, Y., 2000. Statistical modelling of speech segment duration by constrained tree regression. Trans. IEICE E83-D, 1550-1559.
- (2000) Trans. IEICE , vol.E83-D , pp. 1550-1559
- Iwahashi, N.¹ Sagisaka, Y.²

10
- 0032673049
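- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- Kawahara, H., Masuda-Katsuse, I., Cheveigné, A., 1999. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds. Speech Comm. 27, 187-207.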
- (1999) Speech Comm. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigné, A.³

11
- 33646773080
- CMU ARCTIC databases for speech synthesis
- Carnegie Mellon University
- Kominek, J., Black, A., 2003. CMU ARCTIC databases for speech synthesis. Tech. Rep. CMU-LTI-03-177, Carnegie Mellon University.
- (2003) Tech. Rep. CMU-LTI-03-177
- Kominek, J.¹ Black, A.²

12
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- Leggetter, C., Woodland, P., 1995. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Comput. Speech Lang. 9, 171-185.
- (1995) Comput. Speech Lang. , vol.9 , pp. 171-185
- Leggetter, C.¹ Woodland, P.²

13
- 51449118125
- Acoustic modeling with contextual additive structure for HMM-based speech recognition
- Nankaku, Y., Nakamura, K., Zen, H., Tokuda, K., 2008. Acoustic modeling with contextual additive structure for HMM-based speech recognition. In: Proc. ICASSP, pp. 4469-4472.
- (2008) Proc. ICASSP , pp. 4469-4472
- Nankaku, Y.¹ Nakamura, K.² Zen, H.³ Tokuda, K.⁴

14
- 78049409301
- Subspace Gaussian mixture models for speech recognition
- Povey, D., Burget, L., Agarwal, M., Akyazi, P., Feng, K., Ghoshal, A., Glembek, O., Goel, N., Karafiat, M., Rastrow, A., Rose, R., Schwarz, P., Thomas, S., 2010. Subspace Gaussian mixture models for speech recognition. In: Proc. ICASSP, pp. 4330-4333.
- (2010) Proc. ICASSP , pp. 4330-4333
- Povey, D.¹ Burget, L.² Agarwal, M.³ Akyazi, P.⁴ Feng, K.⁵ Ghoshal, A.⁶ Glembek, O.⁷ Goel, N.⁸ Karafiat, M.⁹ Rastrow, A.¹⁰ Rose, R.¹¹ Schwarz, P.¹² Thomas, S.¹³

15
- 70450153447
- Master Thesis, Nagoya Institute of Technology (in Japanese)
- Saino, K., 2008. A clustering technique for factor analyzed voice models. Master Thesis, Nagoya Institute of Technology (in Japanese).
- (2008) A Clustering Technique for Factor Analyzed Voice Models
- Saino, K.¹

16
- 85135145174
- Acoustic modeling based on the MDL principle for speech recognition
- Shinoda, K., Watanabe, T., 1997. Acoustic modeling based on the MDL principle for speech recognition. In: Proc. EUROSPEECH, pp. 99-102.
- (1997) Proc. EUROSPEECH , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

17
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- Toda, T., Tokuda, K., 2007. A speech parameter generation algorithm considering global variance for HMM-based speech synthesis. IEICE Trans. Inform. Systems E90-D (5), 816-824.
- (2007) IEICE Trans. Inform. Systems , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

18
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T., 2000. Speech parameter generation algorithms for HMM-based speech synthesis. In: Proc. ICASSP, pp. 1315-1318.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

19
- 79952258981
- Tokuda, K., Oura, K., Hashimoto, K., Zen, H., Yamagishi, J., Toda, T., Nose, T., Sako, S., Black, A. The HMM-based speech synthesis system.
- The HMM-based Speech Synthesis System
- Tokuda, K.¹ Oura, K.² Hashimoto, K.³ Zen, H.⁴ Yamagishi, J.⁵ Toda, T.⁶ Nose, T.⁷ Sako, S.⁸ Black, A.⁹

20
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 1999. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis. In: Proc. Eurospeech, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

21
- 0002144369
- Tree-based state tying for high accuracy acoustic modeling
- Young, S., Odell, J., Woodland, P., 1994. Tree-based state tying for high accuracy acoustic modelling. In: ARPA Workshop on Human Language Technology, pp. 307-312.
- (1994) ARPA Workshop on Human Language Technology , pp. 307-312
- Young, S.¹ Odell, J.² Woodland, P.³

22
- 0003571976
- Cambridge University Engineering Department
- Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X.-Y., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., 2009. The HTK Book (for HTK version 3.4). Cambridge University Engineering Department.
- (2009) The HTK Book (For HTK Version 3.4)
- Young, S.¹ Evermann, G.² Gales, M.³ Hain, T.⁴ Kershaw, D.⁵ Liu, X.-Y.⁶ Moore, G.⁷ Odell, J.⁸ Ollason, D.⁹ Povey, D.¹⁰ Valtchev, V.¹¹ Woodland, P.¹²

23
- 67650823157
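- Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis
- Yu, K., Toda, T., Gasic, M., Keizer, S., Mairesse, F., Thomson, B., Young, S., 2009. Probabilistic modelling of F0 in unvoiced regions in HMM based speech synthesis. In: Proc. ICASSP, pp. 3773-3776.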
- (2009) Proc. ICASSP , pp. 3773-3776
- Yu, K.¹ Toda, T.² Gasic, M.³ Keizer, S.⁴ Mairesse, F.⁵ Thomson, B.⁶ Young, S.⁷

24
- 78049376926
- Word-level emphasis modelling in HMM-based speech synthesis
- Yu, K., Mairesse, F., Young, S., 2010. Word-level emphasis modelling in HMM-based speech synthesis. In: Proc. ICASSP, pp. 4238-4241.
- (2010) Proc. ICASSP , pp. 4238-4241
- Yu, K.¹ Mairesse, F.² Young, S.³

25
- 79959813917
- Speaker and language adaptive training for HMM-based polyglot speech synthesis
- Zen, H., 2010. Speaker and language adaptive training for HMM-based polyglot speech synthesis. In: Proc. Interspeech, pp. 410-413.
- (2010) Proc. Interspeech , pp. 410-413
- Zen, H.¹

26
- 70450161503
- Context-dependent additive log F0 model for HMM-based speech synthesis
- Zen, H., Braunschweiler, N., 2009. Context-dependent additive log F0 model for HMM-based speech synthesis. In: Proc. Interspeech, pp. 2091-2094.
- (2009) Proc. Interspeech , pp. 2091-2094
- Zen, H.¹ Braunschweiler, N.²

27
- 33846405723
- Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- DOI 10.1093/ietisy/e90-1.1.325
- Zen, H., Toda, T., Nakamura, M., Tokuda, K., 2007. Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005. IEICE Trans. Inform. Systems E90-D (1), 325-333.
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

28
- 67651002140
- Statistical parametric speech synthesis
- Zen, H., Tokuda, K., Black, A., 2009. Statistical parametric speech synthesis. Speech Comm. 51 (11), 1039-1064.
- (2009) Speech Comm. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.