SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 20, Issue 6, 2012, Pages 1713-1724

Statistical parametric speech synthesis based on speaker and language factorization

Authors (7): Zen, Heiga (a); Braunschweiler, Norbert (b); Buchholz, Sabine (c); Gales, Mark J. F. (b); Knill, Kate (b); Krstulović, Sacha (d); Latorre, Javier (b)

a GOOGLE (United Kingdom)

b TOSHIBA CORPORATION (Japan)

c SynapseWork (United Kingdom)

d NUANCE COMMUNICATIONS (United States)

Author keywords

Hidden Markov models (HMMs); Speaker and language factorization; Statistical parametric speech synthesis

Indexed keywords

HIDDEN MARKOV MODELS (HMMS); IN-BUILDINGS; LANGUAGE FACTORIZATION; MULTIPLE LANGUAGES; RECOGNITION SYSTEMS; SPEAKER CHARACTERISTICS;

DECISION TREES; FACTORIZATION; HIDDEN MARKOV MODELS;

SPEECH SYNTHESIS;

EID: 84859765673 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2187195 Document Type: Article

Times cited : (97)

References (42)

1
- 0003417482
- International phonetic association
- Cambridge Univ. Press
- International Phonetic Association, Handbook of the International Phonetic Association, Cambridge Univ. Press, 1999.
- (1999) Handbook of the International Phonetic Association

2
- 0030362995
- A compact model for speaker adaptive training
- T. Anastasakos, J. McDonough, R. Schwartz, and J. Makhoul, "A compact model for speaker adaptive training," in Proc. ICSLP, 1996, pp. 1137-1140.
- (1996) Proc. ICSLP , pp. 1137-1140
- Anastasakos, T.¹ McDonough, J.² Schwartz, R.³ Makhoul, J.⁴

3
- 84865713971
- Crowdsourcing preference tests, and how to detect cheating
- S. Buchholz and J. Latorre, "Crowdsourcing preference tests, and how to detect cheating," in Proc. Interspeech, 2011, pp. 3053-3056.
- (2011) Proc. Interspeech , pp. 3053-3056
- Buchholz, S.¹ Latorre, J.²

4
- 0032050110
- Maximum likelihood linear transformations for HMM-based speech recognition
- M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998. (Pubitemid 128383747)
- (1998) Computer Speech and Language , vol.12 , Issue.2 , pp. 75-98
- Gales, M.J.F.¹

5
- 0034227757
- Cluster adaptive training of hidden Markov models
- M. Gales, "Cluster adaptive training of hidden Markov models," IEEE Trans. Speech Audio Process., vol. 8, no. 4, pp. 417-428, Jul. 2000.
- (2000) IEEE Trans. Speech Audio Process , vol.8 , Issue.4 , pp. 417-428
- Gales, M.¹

6
- 84962787636
- Acoustic factorisation
- M. Gales, "Acoustic factorisation," in Proc. ASRU, 2001, pp. 77-80.
- (2001) Proc. ASRU , pp. 77-80
- Gales, M.¹

7
- 0034320005
- Rapid speaker adaptation in eigenvoice space
- DOI 10.1109/89.876308
- R. Kuhn, J. Junqua, P. Nguyen, and N. Niedzielski, "Rapid speaker adaptation in eigenvoice space," IEEE Trans. Speech Audio Process., vol. 8, no. 6, pp. 695-707, Nov. 2000. (Pubitemid 32025317)
- (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.6 , pp. 695-707
- Kuhn, R.¹ Junqua, J.-C.² Nguyen, P.³ Niedzielski, N.⁴

8
- 33748468338
- New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer
- DOI 10.1016/j.specom.2006.05.003, PII S0167639306000483
- J. Latorre, K. Iwano, and S. Furui, "New approach to the polyglot speech generation by means of an HMM-based speaker adaptable synthesizer," Speech Commun., vol. 48, no. 10, pp. 1227-1242, 2006. (Pubitemid 44353817)
- (2006) Speech Communication , vol.48 , Issue.10 , pp. 1227-1242
- Latorre, J.¹ Iwano, K.² Furui, S.³

9
- 79959843446
- An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation
- H. Liang and J. Dines, "An analysis of language mismatch in HMM state mapping-based cross-lingual speaker adaptation," in Proc. Interspeech, 2010, pp. 622-625.
- (2010) Proc. Interspeech , pp. 622-625
- Liang, H.¹ Dines, J.²

10
- 51449118125
- Acoustic modeling with contextual additive structure for HMM-based speech recognition
- Y. Nankaku, K. Nakamura, H. Zen, and K. Tokuda, "Acoustic modeling with contextual additive structure for HMM-based speech recognition," in Proc. ICASSP, 2008, pp. 4469-4472.
- (2008) Proc. ICASSP , pp. 4469-4472
- Nankaku, Y.¹ Nakamura, K.² Zen, H.³ Tokuda, K.⁴

11
- 0003805597
- Ph.D. dissertation, Cambridge Univ., Cambridge, U.K
- J. Odell, "The use of context in large vocabulary speech recognition," Ph.D. dissertation, Cambridge Univ., Cambridge, U.K., 1995.
- (1995) The Use of Context in Large Vocabulary Speech Recognition
- Odell, J.¹

12
- 78651062051
- Cross-lingual speaker adaptation for HMM-based speech synthesis considering differences between language-dependent average voices
- X. Peng, K. Oura, Y. Nankaku, and K. Tokuda, "Cross-lingual speaker adaptation for HMM-based speech synthesis considering differences between language-dependent average voices," in Proc. ICSP, 2010, pp. 605-608.
- (2010) Proc. ICSP , pp. 605-608
- Peng, X.¹ Oura, K.² Nankaku, Y.³ Tokuda, K.⁴

13
- 85008020260
- A cross-language state sharing and mapping approach to bilingual (Mandarin-English) TTS
- Y. Qian, H. Liang, and F. Soong, "A cross-language state sharing and mapping approach to bilingual (Mandarin-English) TTS," IEEE Trans. Audio Speech Lang. Process., vol. 17, no. 6, pp. 1231-1239, Aug. 2009.
- (2009) IEEE Trans. Audio Speech Lang. Process , vol.17 , Issue.6 , pp. 1231-1239
- Qian, Y.¹ Liang, H.² Soong, F.³

14
- 70450153447
- Japanese M.S. thesis, Nagoya Inst. of Technol., Nagoya, Japan
- K. Saino, "A clustering technique for factor analyzed voice models," (in Japanese) M.S. thesis, Nagoya Inst. of Technol., Nagoya, Japan, 2008.
- (2008) A Clustering Technique for Factor Analyzed Voice Models
- Saino, K.¹

15
- 1642370513
- Solving unsymmetric sparse systems of linear equations with PARDISO
- O. Schenk and K. Gärtner, "Solving unsymmetric sparse systems of linear equations with PARDISO," J. Future Gen. Comput. Syst., vol. 20, no. 3, pp. 475-487, 2004.
- (2004) J. Future Gen. Comput. Syst , vol.20 , Issue.3 , pp. 475-487
- Schenk, O.¹ Gärtner, K.²

16
- 85009274666
- Globalphone: A multilingual speech and text database developed at Karlsruhe University
- T. Schultz, "Globalphone: A multilingual speech and text database developed at Karlsruhe University," in Proc. ICSLP, 2002, pp. 345-348.
- (2002) Proc. ICSLP , pp. 345-348
- Schultz, T.¹

17
- 84865783757
- Separating speaker and environmental variability using factored transforms
- M. Seltzer and A. Acero, "Separating speaker and environmental variability using factored transforms," in Proc. Interspeech, 2011, pp. 1097-1100.
- (2011) Proc. Interspeech , pp. 1097-1100
- Seltzer, M.¹ Acero, A.²

18
- 85009257840
- Eigenvoices for HMM-based speech synthesis
- K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Eigenvoices for HMM-based speech synthesis," in Proc. ICSLP, 2002, pp. 1269-1272.
- (2002) Proc. ICSLP , pp. 1269-1272
- Shichiri, K.¹ Sawabe, A.² Tokuda, K.³ Masuko, T.⁴ Kobayashi, T.⁵ Kitamura, T.⁶

19
- 85135145174
- Acoustic modeling based on the MDL criterion for speech recognition
- K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," in Proc. Eurospeech, 1997, pp. 99-102.
- (1997) Proc. Eurospeech , pp. 99-102
- Shinoda, K.¹ Watanabe, T.²

20
- 33947650089
- HMM state clustering based on efficient cross-validation
- T. Shinozaki, "HMM state clustering based on efficient cross-validation," in Proc. ICASSP, 2006, pp. 1157-1160.
- (2006) Proc. ICASSP , pp. 1157-1160
- Shinozaki, T.¹

21
- 33646806075
- Adaptation of precision matrix models on large vocabulary continuous speech recognition
- K. Sim and M. Gales, "Adaptation of precision matrix models on large vocabulary continuous speech recognition," in Proc. ICASSP, 2005, pp. 97-100.
- (2005) Proc. ICASSP , pp. 97-100
- Sim, K.¹ Gales, M.²

22
- 0004161686
- New York: Springer
- R. Sproat, Multilingual Text-to-Speech Synthesis: The Bell Labs Approach. New York: Springer, 1998.
- (1998) Multilingual Text-to-Speech Synthesis: The Bell Labs Approach
- Sproat, R.¹

23
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst., Vol. E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

24
- 0036522887
- Multi-space probability distribution HMM
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multi-space probability distribution HMM," IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455-464, 2002. (Pubitemid 35353984)
- (2002) IEICE Transactions on Information and Systems , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

25
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, pp. 1315-1318.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

26
- 84966348891
- An HMM-based speech synthesis system applied to English
- Workshop, CD-ROM Proceeding
- K. Tokuda, H. Zen, and A. Black, "An HMM-based speech synthesis system applied to English," in Proc. IEEE Speech Synth. Workshop, 2002, CD-ROM Proceeding.
- (2002) Proc. IEEE Speech Synth
- Tokuda, K.¹ Zen, H.² Black, A.³

27
- 84856249636
- From multilingual to polyglot speech synthesis
- C. Traber, K. Huber, K. Nedir, B. Pfister, E. Keller, and B. Zellner, "From multilingual to polyglot speech synthesis," in Proc. Eurospeech, 1999, pp. 835-838.
- (1999) Proc. Eurospeech , pp. 835-838
- Traber, C.¹ Huber, K.² Nedir, K.³ Pfister, B.⁴ Keller, E.⁵ Zellner, B.⁶

28
- 80051617808
- Speaker and noise factorisation on AURORA4 task
- Y. Wang and M. Gales, "Speaker and noise factorisation on AURORA4 task," in Proc. ICASSP, 2011, pp. 4584-4587.
- (2011) Proc. ICASSP , pp. 4584-4587
- Wang, Y.¹ Gales, M.²

29
- 84859768642
- The EMIME Bilingual Database, Tech. Rep. EDI-INF-RR-1388
- M. Wester, "The EMIME Bilingual Database," Univ. of Edinburgh, 2010, Tech. Rep. EDI-INF-RR-1388.
- (2010) Univ. of Edinburgh
- Wester, M.¹

30
- 70450192740
- State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis
- Y. Wu, Y. Nankaku, and K. Tokuda, "State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis," in Proc. Interspeech, 2009, pp. 528-531.
- (2009) Proc. Interspeech , pp. 528-531
- Wu, Y.¹ Nankaku, Y.² Tokuda, K.³

31
- 33846463597
- Ph.D. dissertation, Tokyo Inst. of Technol., Yokohama, Japan
- J. Yamagishi, "Average-voice-based speech synthesis," Ph.D. dissertation, Tokyo Inst. of Technol., Yokohama, Japan, 2006.
- (2006) Average-voice-based Speech Synthesis
- Yamagishi, J.¹

32
- 78049403515
- Simple methods for improving speaker similarity of HMM-based speech synthesis
- J. Yamagishi and S. King, "Simple methods for improving speaker similarity of HMM-based speech synthesis," in Proc. ICASSP, 2010, pp. 4610-4613.
- (2010) Proc. ICASSP , pp. 4610-4613
- Yamagishi, J.¹ King, S.²

33
- 4544291748
- Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis
- J. Yamagishi, M. Tachibana, T. Masuko, and T. Kobayashi, "Speaking style adaptation using context clustering decision tree for HMM-based speech synthesis," in Proc. ICASSP, 2004, pp. 5-8.
- (2004) Proc. ICASSP , pp. 5-8
- Yamagishi, J.¹ Tachibana, M.² Masuko, T.³ Kobayashi, T.⁴

34
- 84865777002
- The CSTR/EMIME HTS system for blizzard challenge
- J. Yamagishi and O. Watts, "The CSTR/EMIME HTS system for Blizzard Challenge," in Proc. Blizzard Challenge Workshop, 2010.
- (2010) Proc. Blizzard Challenge Workshop
- Yamagishi, J.¹ Watts, O.²

35
- 67650819492
- The HTS2007' system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge
- J. Yamagishi, H. Zen, Y. Wu, T. Toda, and K. Tokuda, "The HTS2007' system: Yet another evaluation of the speaker-adaptive HMM-based speech synthesis system in the 2008 Blizzard Challenge," in Proc. Blizzard Challenge Workshop, 2008.
- (2008) Proc. Blizzard Challenge Workshop
- Yamagishi, J.¹ Zen, H.² Wu, Y.³ Toda, T.⁴ Tokuda, K.⁵

36
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

37
- 4544253619
- Adaptive training using structured transforms
- K. Yu and M. Gales, "Adaptive training using structured transforms," in Proc. ICASSP, 2004, pp. 317-320.
- (2004) Proc. ICASSP , pp. 317-320
- Yu, K.¹ Gales, M.²

38
- 79955538498
- Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
- K. Yu, H. Zen, F. Mairesse, and S. Young, "Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis," Speech Commun., vol. 53, no. 6, pp. 914-923, 2011.
- (2011) Speech Commun , vol.53 , Issue.6 , pp. 914-923
- Yu, K.¹ Zen, H.² Mairesse, F.³ Young, S.⁴

39
- 79959813917
- Speaker and language adaptive training for HMM-based polyglot speech synthesis
- H. Zen, "Speaker and language adaptive training for HMM-based polyglot speech synthesis," in Proc. Interspeech, 2010, pp. 410-413.
- (2010) Proc. Interspeech , pp. 410-413
- Zen, H.¹

40
- 70450161503
- o model for HMM-based speech synthesis
- o model for HMM-based speech synthesis," in Proc. Interspeech, 2009, pp. 2091-2094.
- (2009) Proc. Interspeech , pp. 2091-2094
- Zen, H.¹ Braunschweiler, N.²

41
- 84921798247
- HMM-based polyglot speech synthesis by speaker and language adaptive training
- H. Zen, N. Braunschweiler, S. Buchholz, K. Knill, S. Krstulović, and J. Latorre, "HMM-based polyglot speech synthesis by speaker and language adaptive training," in Proc. ISCA SSW7, 2010, pp. 186-191.
- (2010) Proc. ISCA SSW7 , pp. 186-191
- Zen, H.¹ Braunschweiler, N.² Buchholz, S.³ Knill, K.⁴ Krstulović, S.⁵ Latorre, J.⁶

42
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.