SCOPUS Information Retrieval Platform

IEEE Journal on Selected Topics in Signal Processing

Volume 8, Issue 2, 2014, Pages 239-250

Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis

(6) Takamichi, Shinnosuke a Toda, Tomoki a Shiga, Yoshinori b Sakti, Sakriani a Neubig, Graham a Nakamura, Satoshi a

a NARA INSTITUTE OF SCIENCE AND TECHNOLOGY (Japan)

b NATIONAL INSTITUTE OF INFORMATION AND COMMUNICATIONS TECHNOLOGY (Japan)

Author keywords

GMM; HMM based speech synthesis; over smoothing; parameter generation; rich context model

Indexed keywords

CONTEXT MODELING; GMM; HMM-BASED SPEECH SYNTHESIS; OVER-SMOOTHING; PARAMETER GENERATION;

ALGORITHMS; COMMUNICATION CHANNELS (INFORMATION THEORY); HIDDEN MARKOV MODELS; PROBABILITY DISTRIBUTIONS; SPEECH SYNTHESIS;

ITERATIVE METHODS;

EID: 84897862522 PISSN: 19324553 EISSN: None Source Type: Journal
DOI: 10.1109/JSTSP.2013.2288599 Document Type: Article

Times cited : (15)

References (28)

1
- 0023756465
- Speech synthesis by rule using an optimal selection of non-uniform synthesis units
- Y. Sagisaka, "Speech synthesis by rule using an optimal selection of non-uniform synthesis units," in Proc. ICASSP, New York, NY, USA, Apr. 1988, pp. 679-682 (Pubitemid 18666106)
- (1988) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , pp. 679-682
- Sagisaka Yoshinori¹

2
- 0027699809
- Speech segment selection for concatenative synthesis based on spectral distortion minimization
- N. Iwahashi, N. Kaiki, and Y. Sagisaka, "Speech segment selection for concatenative synthesis based on spectral distortion minimization," IEICE Trans., Fundamentals, vol. E76-A, no. 11, pp. 1942-1948, 1993
- (1993) IEICE Trans., Fundamentals , vol.76 , Issue.11 , pp. 1942-1948
- Iwahashi, N.¹ Kaiki, N.² Sagisaka, Y.³

3
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- A. J. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, Atlanta, GA, USA, May 1996, pp. 373-376
- (1996) Proc. ICASSP, Atlanta, GA, USA , pp. 373-376
- Hunt, A.J.¹ Black, A.²

4
- 85001632375
- Corpus-based techniques in the AT&T NextGen synthesis system
- A. K. Syrdal, C. W. Wightman, A. Conkie, Y. Stylianou, M. Beutnagel, J. Schroeter, V. Strom, K.-S. Lee, and M. Makashay, "Corpus-based techniques in the AT&T NextGen synthesis system," in Proc. ICSLP, Beijing, China, Oct. 2000, pp. 410-415
- (2000) Proc. ICSLP, Beijing, China , pp. 410-415
- Syrdal, A.K.¹ Wightman, C.W.² Conkie, A.³ Stylianou, Y.⁴ Beutnagel, M.⁵ Schroeter, J.⁶ Strom, V.⁷ Lee, K.-S.⁸ Makashay, M.⁹

5
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009
- (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

6
- 0034230270
- Speaker interpolation for HMM-based speech synthesis system
- T. Yoshimura, T. Masuko, K. Tokuda, T. Kobayashi, and T. Kitamura, "Speaker interpolation for HMM-based speech synthesis system," J. Acoust. Soc. Jpn. (E), vol. 21, no. 4, pp. 199-206, 2000
- (2000) J. Acoust. Soc. Jpn. (E) , vol.21 , Issue.4 , pp. 199-206
- Yoshimura, T.¹ Masuko, T.² Tokuda, K.³ Kobayashi, T.⁴ Kitamura, T.⁵

7
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- DOI 10.1093/ietisy/e90-d.2.533
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training," IEICE Trans., Inf. Syst., vol. E90-D, no. 2, pp. 533-543, 2007 (Pubitemid 46279829)
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

8
- 51449114529
- A style control technique for HMM-based expressive speech synthesis
- T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis," IEICE Trans., Inf. Syst., vol. E90-D, no. 9, pp. 1406-1413, 2007
- (2007) IEICE Trans., Inf. Syst , vol.90 , Issue.9 , pp. 1406-1413
- Nose, T.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

9
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans., vol. E90-D, no. 5, pp. 816-824, 2007
- (2007) IEICE Trans , vol.90 , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

10
- 84878419996
- The Blizzard Challenge 2011
- S. King and V. Karaiskos, "The Blizzard Challenge 2011," in Proc. Blizzard Challenge Workshop, Turin, Italy, Sep. 2011
- (2011) Proc. Blizzard Challenge Workshop
- King, S.¹ Karaiskos, V.²

11
- 67650816595
- The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007
- Z. Ling, L. Qin, H. Lu, Y. Gao, L. Dai, R. Wang, Y. Jiang, Z. Zhao, J. Yang, J. Chen, and G. Hu, "The USTC and iFlytek speech synthesis systems for Blizzard Challenge 2007," in Proc. Blizzard Challenge Workshop, Bonn, Germany, Aug. 2007
- (2007) Proc. Blizzard Challenge Workshop, Bonn, Germany
- Ling, Z.¹ Qin, L.² Lu, H.³ Gao, Y.⁴ Dai, L.⁵ Wang, R.⁶ Jiang, Y.⁷ Zhao, Z.⁸ Yang, J.⁹ Chen, J.¹⁰ Hu, G.¹¹

12
- 70450161678
- Rich context modeling for high quality HMM-based TTS
- Z. Yan, Y. Qian, and F. K. Soong, "Rich context modeling for high quality HMM-based TTS," in Proc. INTERSPEECH, Brighton, U.K., Sep. 2009, pp. 1755-1758
- (2009) Proc. INTERSPEECH, Brighton, U.K , pp. 1755-1758
- Yan, Z.¹ Qian, Y.² Soong, F.K.³

13
- 79959852154
- An HMM trajectory tiling (HTT) approach to high quality TTS
- Y. Qian, Z. Yan, Y. Wu, and F. K. Soong, "An HMM trajectory tiling (HTT) approach to high quality TTS," in Proc. INTERSPEECH, Chiba, Japan, Sep. 2010, pp. 422-425
- (2010) Proc. INTERSPEECH, Chiba, Japan , pp. 422-425
- Qian, Y.¹ Yan, Z.² Wu, Y.³ Soong, F.K.⁴

14
- 4544270859
- Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis
- T. Toda, H. Kawai, and M. Tsuzaki, "Optimizing sub-cost functions for segment selection based on perceptual evaluations in concatenative speech synthesis," in Proc. ICASSP, Montreal, QC, Canada, May 2004, pp. 657-660
- (2004) Proc. ICASSP, Montreal, QC, Canada , pp. 657-660
- Toda, T.¹ Kawai, H.² Tsuzaki, M.³

15
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. EUROSPEECH, Budapest, Hungary, Apr. 1999, pp. 2347-2350
- (1999) Proc. EUROSPEECH, Budapest, Hungary , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

16
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Jpn. (E), vol. 21, no. 2, pp. 79-86, 2000 (Pubitemid 30594111)
- (2000) Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) , vol.21 , Issue.2 , pp. 79-86
- Shinoda Koichi¹ Watanabe Takao²

17
- 0036522887
- Multi-space probability distribution HMM
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multi-space probability distribution HMM," IEICE Trans., Inf. Syst., vol. E85-D, no. 3, pp. 455-464, 2002 (Pubitemid 35353984)
- (2002) IEICE Transactions on Information and Systems , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

18
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, Istanbul, Turkey, Jun. 2000, pp. 1315-1318
- (2000) Proc. ICASSP, Istanbul, Turkey , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

19
- 85009069251
- Decision tree backing-off in HMM-based speech synthesis
- S. Kataoka, N. Mizutani, K. Tokuda, and T. Kitamura, "Decision tree backing-off in HMM-based speech synthesis," in Proc. INTERSPEECH, Jeju, Korea, Oct. 2004, vol. 2, pp. 1205-1208
- (2004) Proc. INTERSPEECH, Jeju, Korea , vol.2 , pp. 1205-1208
- Kataoka, S.¹ Mizutani, N.² Tokuda, K.³ Kitamura, T.⁴

20
- 34547503417
- HMM-based unit selection using frame sized speech segments
- Z. Ling and R. Wang, "HMM-based unit selection using frame sized speech segments," in Proc. INTERSPEECH, Pittsburgh, PA, USA, Sep. 2006
- (2006) Proc. INTERSPEECH, Pittsburgh, PA, USA
- Ling, Z.¹ Wang, R.²

21
- 29144484191
- Concatenative speech synthesis based on the plural unit selection and fusion method
- DOI 10.1093/ietisy/e88-d.11.2565
- T. Mizutani and T. Kagoshima, "Concatenative speech synthesis based on the plural unit selection and fusion method," IEICE Trans. Inf. Syst., vol. E88-D, no. 11, pp. 2565-2572, 2005 (Pubitemid 41816802)
- (2005) IEICE Transactions on Information and Systems , vol.E88-D , Issue.11 , pp. 2565-2572
- Mizutani, T.¹ Kagoshima, T.²

22
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-185, 1995
- (1995) Comput. Speech Lang , vol.9 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

23
- 35549000218
- Cross-validation and aggregated EM training for robust parameter estimation
- DOI 10.1016/j.csl.2007.07.005, PII S0885230807000472
- T. Shinozaki and M. Ostendorf, "Cross-validation and aggregated EM training for robust parameter estimation," Comput. Speech Lang., vol. 22, pp. 185-195, 2008 (Pubitemid 350016715)
- (2008) Computer Speech and Language , vol.22 , Issue.2 , pp. 185-195
- Shinozaki, T.¹ Ostendorf, M.²

24
- 44449177634
- Hidden semi-Markov model based speech synthesis system
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Hidden semi-Markov model based speech synthesis system," IEICE Trans., Inf. Syst., vol. E90-D, no. 5, pp. 825-834, 2007
- (2007) IEICE Trans., Inf. Syst , vol.90 , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

25
- 6644226630
- A large-scale Japanese speech database
- Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, and H. Kuwabara, "A large-scale Japanese speech database," in Proc. ICSLP'90, Kobe, Japan, Nov. 1990, pp. 1089-1092
- (1990) Proc. ICSLP'90, Kobe, Japan , pp. 1089-1092
- Sagisaka, Y.¹ Takeda, K.² Abe, M.³ Katagiri, S.⁴ Umeda, T.⁵ Kuwabara, H.⁶

26
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in Proc. MAVEBA '01, Florence, Italy, Sep. 2001, pp. 1-6
- (2001) Proc. MAVEBA '01, Florence, Italy , pp. 1-6
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

27
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation," in Proc. INTERSPEECH, Pittsburgh, PA, USA, Sep. 2006, pp. 2266-2269
- (2006) Proc. INTERSPEECH, Pittsburgh, PA, USA , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

28
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999
- (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² de Cheveigné, A.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.