SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 2, 2013, Pages 280-290

A unified trajectory tiling approach to high quality speech rendering

Authors (3): Qian, Yao (a); Soong, Frank K. (a); Yan, Zhi Jie (a)

(a) Microsoft Research Asia (China)

Author keywords

Cross-lingual; speech synthesis; trajectory tiling; voice transformation

Indexed keywords

CROSS-LINGUAL; HIGH QUALITY; SPEECH DATABASE; SUBJECTIVE EVALUATIONS; SYNTHESIZED SPEECH; UNIFIED ALGORITHM; WAVE FORMS;

SPEECH SYNTHESIS;

TRAJECTORIES;

EID: 84871382567; PISSN: 1558-7916; EISSN: None; Source type: Journal
DOI: 10.1109/TASL.2012.2221460; Document type: Article

Times cited: 41

References (45)

1
- 84871384055
- EMIME. [Online]. Available: http://www.emime.org

2
- 34547612590
- Z.-H. Ling and R.-H. Wang, "HMM-based hierarchical unit selection combining Kullback-Leibler divergence with likelihood criterion," in Proc. ICASSP, 2007, pp. 1245-1248.

3
- 78049399368
- Z.-J. Yan, Y. Qian, and F. K. Soong, "Rich-context unit selection (RUS) approach to high quality TTS," in Proc. ICASSP, 2010, pp. 4798-4801.

4
- 84871381392
- Y. Qian, Z.-J. Yan, Y.-J. Wu, F. K. Soong, G.-L. Zhang, and L.-J. Wang, "An HMM trajectory tiling (HTT) approach to high quality TTS-Microsoft entry to Blizzard Challenge 2010," in Proc. Blizzard Challenge Workshop, 2010.

5
- 85063141494
- T. Hirai and S. Tenpaku, "Using 5ms segments in concatenative speech synthesis," in Proc. 5th Speech Synth. Workshop, 2004.

6
- 80051658497
- T. Hirai, J. Yamagishi, and S. Tenpaku, "Utilization of an HMM-based feature generation module in 5 ms segment concatenative speech synthesis," in Proc. ISCA SSW6, 2007.

7
- 67651002140
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.

8
- 70450208175
- X. Gonzalvo, A. Gutkin, J. C. Socoró, I. Iriondo, and P. Taylor, "Local minimum generation error criterion for hybrid HMM speech synthesis," in Proc. Interspeech, 2009, pp. 416-419.

9
- 70450161678
- Z.-J. Yan, Y. Qian, and F. K. Soong, "Rich context modeling for high quality HMM-based TTS," in Proc. Interspeech, 2009.

10
- 34547526960
- A. W. Black, H. Zen, and K. Tokuda, "Statistical parametric speech synthesis," in Proc. ICASSP, 2007, pp. 1229-1232.

11
- 33846410497
- T. Toda and K. Tokuda, "Speech parameter generation algorithm considering global variance for HMM-based speech synthesis," in Proc. Interspeech, 2005.

12
- 67650854725
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 1, pp. 66-83, Jan. 2009.

13
- 85008020260
- Y. Qian, H. Liang, and F. K. Soong, "A cross-language state sharing and mapping approach to bilingual (Mandarin-English) TTS," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 6, pp. 1231-1239, Aug. 2009.

14
- 70450192740
- Y.-J. Wu, Y. Nankaku, and K. Tokuda, "State mapping based method for cross-lingual speaker adaptation in HMM-based speech synthesis," in Proc. Interspeech, 2009, pp. 528-531.

15
- 84859780529
- K. Oura, J. Yamagishi, M. Wester, S. King, and K. Tokuda, "Analysis of unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using KLD-based transform mapping," Speech Commun., vol. 54, no. 6, pp. 704-714, Jul. 2012.

16
- 70349218937
- Y.-N. Chen, Y. Jiao, Y. Qian, and F. K. Soong, "State mapping for cross-language speaker adaptation in TTS," in Proc. ICASSP, 2009, pp. 4273-4276.

17
- 78049411002
- M. Gibson, T. Hirsimaki, R. Karhila, M. Kurimo, and W. Byrne, "Unsupervised cross-lingual speaker adaptation for HMM-based speech synthesis using two-pass decision tree construction," in Proc. ICASSP, 2010, pp. 4642-4645.

18
- 84865786646
- H. Liang and J. Dines, "Phonological knowledge guided HMM state mapping for cross-lingual speaker adaptation," in Proc. Interspeech, 2011, pp. 1825-1828.

19
- 80051608660
- Y. Qian, J. Xu, and F. K. Soong, "A frame mapping based HMM approach to cross-lingual voice transformation," in Proc. ICASSP, 2011, pp. 5120-5123.

20
- 0001810975
- F. Itakura, "Line spectrum representation of linear predictive coefficients of speech signals," J. Acoust. Soc. Amer., vol. 57, p. S35, 1975.

21
- 0002557614
- F. K. Soong and B. H. Juang, "Line spectrum pair (LSP) and speech data compression," in Proc. ICASSP, 1984, pp. 37-40.

22
- 2942710378
- H. Wakita, "Linear prediction voice synthesizers: Line spectrum pairs (LSP) is the newest of the several techniques," Speech Technol., vol. 1, pp. 17-22, 1981.

23
- 38249015166
- K. K. Paliwal, "On the use of line spectral frequency parameters for speech recognition," Digital Signal Process., vol. 2, pp. 80-87, 1992.

24
- 0032673049
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigne, "Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.

25
- 0001455934
- D. Talkin, "A robust algorithm for pitch tracking (RAPT)," in Speech Coding and Synthesis. Amsterdam, The Netherlands: Elsevier, 1995.

26
- 85009139544
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999.

27
- 33846429403
- Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, 2006, pp. 89-92.

28
- 70450169782
- Y. Qian, F. K. Soong, M.-M. Wang, and Z.-Z. Wu, "A minimum v/u error approach to F0 generation in HMM-based TTS," in Proc. Interspeech, 2009.

29
- 78049381954
- L. Saheer, P. N. Garner, J. Dines, and H. Liang, "VTLN adaptation for statistical speech synthesis," in Proc. ICASSP, 2010, pp. 4838-4841.

30
- 79959837023
- X. Zhuang, Y. Qian, F. K. Soong, Y.-J. Wu, and B. Zhang, "Formant-based frequency warping for improving speaker adaptation in HMM TTS," in Proc. Interspeech, 2010, pp. 817-820.

31
- 0026400231
- R. Laroia, N. Phamdo, and N. Farvardin, "Robust and efficient quantization of speech LSP parameters using structured vector quantizers," in Proc. ICASSP, 1991, pp. 641-644.

32
- 0035478160
- M. S. Lee, H. K. Kim, and H. S. Lee, "A new distortion measure for spectral quantization based on the LSP intermodal interlacing property," Speech Commun., vol. 35, pp. 191-201, 2001.

33
- 77249139677
- Y. Qian, F. K. Soong, Y. N. Chen, and M. Chu, "An HMM-based Mandarin Chinese text-to-speech system," in Proc. ISCSLP, 2006, Springer LNAI vol. 4274, pp. 223-232.

34
- 0003418124
- G. Fant, Acoustic Theory of Speech Production. The Hague, The Netherlands: Mouton, 1960.

35
- 0003425258
- L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ: Prentice-Hall, 1978.

36
- 67650851754
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method," in Proc. Blizzard Challenge Workshop, 2006.

37
- 0034854702
- Y. Stylianou and A. K. Syrdal, "Perceptual and objective detection of discontinuities in concatenative speech synthesis," in Proc. ICASSP, 2001, vol. 2, pp. 837-840.

38
- 84871393401
- A. D. Conkie and S. Isard, "Optimal coupling of diphones," in Proc. Eurospeech, 1995.

39
- 84871373443
- Demos of Synthesized Sentences, HTT-Based TTS. [Online]. Available: http://research.microsoft.com/en-us/projects/htt/default.aspx

40
- 84871365228
- Nancy Voice Provided by Lessac Technologies for the Blizzard Challenge 2011. [Online]. Available: http://www.synsig.org/index.php/Blizzard-Challenge-2011

41
- 0033906251
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Jpn. (E), vol. 21, no. 2, pp. 79-86, 2000.

42
- 84871361828
- [Online]. Available: http://www.synsig.org/index.php/Blizzard-Challenge-2010

43
- 0009589496
- P. Zhan and A. Waibel, Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition. Pittsburgh, PA: Carnegie Mellon Univ., 1997, CMU-CS-97-148.

44
- 84863484159
- P. Liu and F. K. Soong, "Kullback-Leibler divergence between two hidden Markov models," Microsoft Research Asia, Tech. Rep., 2005.

45
- 84871376257
- Cross-Lingual Voice Transformation. [Online]. Available: http://research.microsoft.com/en-us/projects/mixedlangtts/default.aspx

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.