SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 19, Issue 6, 2011, Pages 1702-1710

Improved Prosody Generation by Maximizing Joint Probability of State and Longer Units

Authors (4): Wu, Zhizheng (a); Qian, Yao (b); Soong, Frank K. (b); Gao, Boyang (c)

(a) Nanyang Technological University (Singapore)

(b) Microsoft Research Asia (China)

(c) École Centrale de Lyon (France)

Author keywords

Discrete cosine transforms (DCTs); speech synthesis; statistical distributions

EID: 85008039410   Print ISSN: 1558-7916   Electronic ISSN: 1558-7924   Source type: Journal
DOI: 10.1109/TASL.2010.2097248   Document type: Article

Times cited: 37

References (31)

1
- 85009139544
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis,” in Proc. Eurospeech, 1999, pp. 2347–2350.

2
- 0033708106
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, “Speech parameter generation algorithms for HMM-based speech synthesis,” in Proc. ICASSP, 2000, pp. 1315–1318.

3
- 85135109865
- Y. Sagisaka, N. Kaiki, N. Iwahashi, and K. Mimura, “ATR μ-talk speech synthesis system,” in Proc. ICSLP, 1992, pp. 483–486.

4
- 0029765811
- A. Hunt and A. Black, “Unit selection in a concatenative speech synthesis system using a large speech database,” in Proc. ICASSP, 1996, pp. 373–376.

5
- 33847129573
- J. Yamagishi and T. Kobayashi, “Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training,” IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533–543, 2007.

6
- 51449114529
- T. Nose, J. Yamagishi, and T. Kobayashi, “A style control technique for HMM-based expressive speech synthesis,” IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406–1413, 2007.

7
- 67651002140
- H. Zen, K. Tokuda, and A. W. Black, “Statistical parametric speech synthesis,” Speech Commun., vol. 51, no. 11, pp. 1039–1064, 2009.

8
- 33846410497
- T. Toda and K. Tokuda, “Speech parameter generation algorithm considering global variance for HMM-based speech synthesis,” in Proc. Eurospeech, 2005, pp. 373–376.

9
- 34547497133
- J. Latorre, K. Iwano, and S. Furui, “Combining Gaussian mixture model with global variance term to improve the quality of an HMM-based polyglot speech synthesizer,” in Proc. ICASSP, 2007, pp. 1241–1244.

10
- 33749573927
- H. Zen, K. Tokuda, and T. Kitamura, “Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences,” Comput. Speech Lang., vol. 21, no. 1, pp. 153–173, 2007.

11
- 34547517493
- Y.-J. Wu, R.-H. Wang, and F. K. Soong, “Full HMM Training for Minimizing Generation Error in Synthesis,” in Proc. ICASSP, 2007, pp. 517–520.

12
- 0003684449
- T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York: Springer, 2001.

13
- 85009257781
- X.-J. Sun, “F0 generation for speech synthesis using a multi-tier approach,” in Proc. ICSLP, 2002, pp. 2077–2080.

14
- 33646821329
- S. Sakai, “Additive modeling of English F0 contour for speech synthesis,” in Proc. ICASSP, 2005, pp. 277–280.

15
- 48549095974
- Y.-J. Wu and R.-H. Wang, “HMM-based trainable speech synthesis for Chinese,” J. Chinese Inf. Process., vol. 20, no. 4, pp. 75–81, 2006.

16
- 84867200235
- Y. Qian, H. Liang, and F. K. Soong, “Generating natural F0 trajectory with additive trees,” in Proc. Interspeech, 2008, pp. 2126–2129.

17
- 41049090228
- J. Yamagishi, H. Kawai, and T. Kobayashi, “Phone duration modeling using gradient tree boosting,” Speech Commun., vol. 50, no. 5, pp. 405–415, 2008.

18
- 67650851610
- Y. Qian, Z.-Z. Wu, and F. K. Soong, “Improved prosody generation by maximizing joint likelihood of state and longer units,” in Proc. ICASSP, 2009, pp. 3781–3784.

19
- 84867194192
- J. Latorre and M. Akamine, “Multilevel parametric-base F0 model for speech synthesis,” in Proc. Interspeech, 2008, pp. 2274–2277.

20
- 33846442604
- Y. Ishimatsu, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Investigation of state duration model based on gamma distribution for HMM-based speech synthesis,” IEICE Tech. Rep., vol. 101, no. 352, pp. 57–62, 2001.

21
- 24144455395
- S. J. Park, M. W. Koo, and C. S. Jhon, “Context-dependent phoneme duration modeling with tree-based state tying,” IEICE Trans. Inf. Syst., vol. E88-D, no. 3, pp. 662–666, 2005.

22
- 84867218426
- B.-Y. Gao, Y. Qian, Z.-Z. Wu, and F. K. Soong, “Duration refinement by jointly optimizing state and longer unit likelihood,” in Proc. Interspeech, 2008, pp. 2266–2269.

23
- 51449117929
- J. Teutenberg, C. Watson, and P. Riddle, “Modelling and synthesising F0 contours with the discrete cosine transform,” in Proc. ICASSP, 2008, pp. 3973–3976.

24
- 60849112575
- Z.-Z. Wu, Y. Qian, F. K. Soong, and B. Zhang, “Modeling and generating tone contour with phrase intonation for Mandarin Chinese speech,” in Proc. ISCSLP, 2008, pp. 121–124.

25
- 85093445139
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, “Duration modeling for HMM-based speech synthesis,” in Proc. ICSLP, 1998, pp. 29–32.

26
- 0036522887
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, “Multi-space probability distribution HMM,” IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455–464, 2002.

27
- 0003632935
- G. W. Snedecor, Statistical Methods. Ames: Iowa State Univ. Press, 1989.

28
- 0032673049
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, “Restructuring speech representations using pitch-adaptive time-frequency smoothing and instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds,” Speech Commun., vol. 27, no. 3–4, pp. 187–207, 1999.

29
- 0001455934
- D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis. Amsterdam, The Netherlands: Elsevier, 1995.

30
- 0033906251
- K. Shinoda and T. Watanabe, “MDL-based context-dependent subword modeling for speech recognition,” J. Acoust. Soc. Jpn. (E), vol. 21, no. 2, pp. 79–86, 2000.

31
- 67650851754
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, “USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method,” in Proc. Blizzard Challenge 2006 Workshop, 2006.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.