SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 1, 2013, Pages 207-219

Articulatory control of HMM-based parametric speech synthesis using feature-space-switched multiple regression

(3) Ling, Zhen Hua a; Richmond, Korin b; Yamagishi, Junichi b

a UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

b UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Articulatory features; Gaussian mixture model; multiple regression hidden Markov model; speech synthesis

Indexed keywords

ACOUSTICS; GAUSSIAN DISTRIBUTION; LINGUISTICS; MATHEMATICAL TRANSFORMATIONS; REGRESSION ANALYSIS; SPEECH SYNTHESIS; TRELLIS CODES;

ARTICULATORY FEATURES; ARTICULATORY SPACE; ELECTROMAGNETIC ARTICULOGRAPHY; EXPLANATORY VARIABLES; GAUSSIAN MIXTURE MODEL; MULTIPLE REGRESSIONS; REGRESSION MATRICES; SPEECH SYNTHESIZER;

HIDDEN MARKOV MODELS;

EID: 84869440340 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2215600 Document Type: Article

Times cited : (52)

References (34)

1
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

2
- 33645758767
- HMM-based approach to multilingual speech synthesis
- S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall
- K. Tokuda, H. Zen, and A. W. Black, "HMM-based approach to multilingual speech synthesis," in Text to Speech Synthesis: New Paradigms and Advances, S. Narayanan and A. Alwan, Eds. Upper Saddle River, NJ: Prentice-Hall, 2004.
- (2004) Text to Speech Synthesis: New Paradigms and Advances
- Tokuda, K.¹ Zen, H.² Black, A.W.³

3
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, vol. 3, pp. 1315-1318.
- (2000) Proc. ICASSP , vol.3 , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

4
- 33846405723
- Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- DOI 10.1093/ietisy/e90-d.1.325
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007. (Pubitemid 46145336)
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

5
- 34547496747
- USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method," in Blizzard Challenge Workshop, 2006.
- (2006) Blizzard Challenge Workshop
- Ling, Z.-H.¹ Wu, Y.-J.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

6
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- DOI 10.1093/ietisy/e90-d.2.533
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training," IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533-543, 2007. (Pubitemid 46279829)
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

7
- 85009257840
- Eigenvoices for HMM-based speech synthesis
- K. Shichiri, A. Sawabe, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Eigenvoices for HMM-based speech synthesis," in Proc. ICSLP, 2002, pp. 1269-1272.
- (2002) Proc. ICSLP , pp. 1269-1272
- Shichiri, K.¹ Sawabe, A.² Tokuda, K.³ Masuko, T.⁴ Kobayashi, T.⁵ Kitamura, T.⁶

8
- 24144497811
- Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis
- J. Yamagishi, K. Onishi, T. Masuko, and T. Kobayashi, "Acoustic modeling of speaking styles and emotional expressions in HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E88-D, no. 3, pp. 503-509, 2005.
- (2005) IEICE Trans. Inf. Syst. , vol.E88-D , Issue.3 , pp. 503-509
- Yamagishi, J.¹ Onishi, K.² Masuko, T.³ Kobayashi, T.⁴

9
- 29144475179
- Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing
- DOI 10.1093/ietisy/e88-d.11.2484
- M. Tachibana, J. Yamagishi, T. Masuko, and T. Kobayashi, "Speech synthesis with various emotional expressions and speaking styles by style interpolation and morphing," IEICE Trans. Inf. Syst., vol. E88-D, no. 11, pp. 2484-2491, 2005. (Pubitemid 41816793)
- (2005) IEICE Transactions on Information and Systems , vol.E88-D , Issue.11 , pp. 2484-2491
- Tachibana, M.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

10
- 51449114529
- A style control technique for HMM-based expressive speech synthesis
- T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406-1413, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.9 , pp. 1406-1413
- Nose, T.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

11
- 84867197177
- Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge
- Z.-H. Ling, K. Richmond, J. Yamagishi, and R.-H. Wang, "Articulatory control of HMM-based parametric speech synthesis driven by phonetic knowledge," in Proc. Interspeech '08, 2008, pp. 573-576.
- (2008) Proc. Interspeech '08 , pp. 573-576
- Ling, Z.-H.¹ Richmond, K.² Yamagishi, J.³ Wang, R.-H.⁴

12
- 68149157315
- Integrating articulatory features into HMM-based parametric speech synthesis
- Aug.
- Z.-H. Ling, K. Richmond, J. Yamagishi, and R.-H. Wang, "Integrating articulatory features into HMM-based parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 6, pp. 1171-1185, Aug. 2009.
- (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.6 , pp. 1171-1185
- Ling, Z.-H.¹ Richmond, K.² Yamagishi, J.³ Wang, R.-H.⁴

13
- 0033693063
- Conversational speech recognition using acoustic and articulatory input
- K. Kirchhoff, G. Fink, and G. Sagerer, "Conversational speech recognition using acoustic and articulatory input," in Proc. ICASSP, 2000, pp. 1435-1438.
- (2000) Proc. ICASSP , pp. 1435-1438
- Kirchhoff, K.¹ Fink, G.² Sagerer, G.³

14
- 84867602871
- Articulatory features for expressive speech synthesis
- A. Black, T. Bunnell, Y. Dou, P. Muthukumar, F. Metze, D. Perry, T. Polzehl, K. Prahallad, S. Steidl, and C. Vaughn, "Articulatory features for expressive speech synthesis," in Proc. ICASSP, 2012, pp. 4005-4008.
- (2012) Proc. ICASSP , pp. 4005-4008
- Black, A.¹ Bunnell, T.² Dou, Y.³ Muthukumar, P.⁴ Metze, F.⁵ Perry, D.⁶ Polzehl, T.⁷ Prahallad, K.⁸ Steidl, S.⁹ Vaughn, C.¹⁰

15
- 0023198186
- Electromagnetic articulography: Use of alternating magnetic fields for tracking movements of multiple points inside and outside the vocal tract
- P. W. Schönle, K. Gräbe, P. Wenig, J. Höhne, J. Schrader, and B. Conrad, "Electromagnetic articulography: Use of alternating magnetic fields for tracking movements of multiple points inside and outside the vocal tract," Brain Lang., vol. 31, pp. 26-35, 1987.
- (1987) Brain Lang. , vol.31 , pp. 26-35
- Schönle, P.W.¹ Gräbe, K.² Wenig, P.³ Höhne, J.⁴ Schrader, J.⁵ Conrad, B.⁶

16
- 0023135474
- Application of MRI to the analysis of speech production
- DOI 10.1016/0730-725X(87)90477-2
- T. Baer, J. C. Gore, S. Boyce, and P. W. Nye, "Application of MRI to the analysis of speech production," Magn. Resonance Imag., vol. 5, pp. 1-7, 1987. (Pubitemid 17059052)
- (1987) Magnetic Resonance Imaging , vol.5 , Issue.1 , pp. 1-7
- Baer, T.¹ Gore, J.C.² Boyce, S.³ Nye, P.W.⁴

17
- 0032293271
- Extraction and tracking of the tongue surface from ultrasound image sequences
- Y. Akgul, C. Kambhamettu, and M. Stone, "Extraction and tracking of the tongue surface from ultrasound image sequences," IEEE Comp. Vis. Pattern Recogn., vol. 124, pp. 298-303, 1998.
- (1998) IEEE Comp. Vis. Pattern Recogn. , vol.124 , pp. 298-303
- Akgul, Y.¹ Kambhamettu, C.² Stone, M.³

18
- 0034855363
- Multiple-regression hidden Markov model
- K. Fujinaga, M. Nakai, H. Shimodaira, and S. Sagayama, "Multiple-regression hidden Markov model," in Proc. ICASSP, 2001, pp. 513-516. (Pubitemid 32839299)
- (2001) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 513-516
- Fujinaga, K.¹ Nakai, M.² Shimodaira, H.³ Sagayama, S.⁴

19
- 70349205575
- Emotional speech recognition based on style estimation and adaptation with multiple-regression HMM
- Y. Ijima, M. Tachibana, T. Nose, and T. Kobayashi, "Emotional speech recognition based on style estimation and adaptation with multiple-regression HMM," in Proc. ICASSP, 2009, pp. 4157-4160.
- (2009) Proc. ICASSP , pp. 4157-4160
- Ijima, Y.¹ Tachibana, M.² Nose, T.³ Kobayashi, T.⁴

20
- 4344601826
- Quantitative evaluation for skill controller based on comparison with human demonstration
- Jul.
- T. Nozaki, T. Suzuki, S. Okuma, K. Itabashi, and F. Fujiwara, "Quantitative evaluation for skill controller based on comparison with human demonstration," IEEE Trans. Control Syst. Technol., vol. 12, no. 4, pp. 609-619, Jul. 2004.
- (2004) IEEE Trans. Control Syst. Technol. , vol.12 , Issue.4 , pp. 609-619
- Nozaki, T.¹ Suzuki, T.² Okuma, S.³ Itabashi, K.⁴ Fujiwara, F.⁵

21
- 33646795077
- A quantitative model for formant dynamics and contextually assimilated reduction in fluent speech
- L. Deng, D. Yu, and A. Acero, "A quantitative model for formant dynamics and contextually assimilated reduction in fluent speech," in Proc. Interspeech, 2004, pp. 719-722.
- (2004) Proc. Interspeech , pp. 719-722
- Deng, L.¹ Yu, D.² Acero, A.³

22
- 79956259003
- Model-based reproduction of articulatory trajectories for consonant-vowel sequences
- Jul.
- P. Birkholz, B. Kroger, and C. Neuschaefer-Rube, "Model-based reproduction of articulatory trajectories for consonant-vowel sequences," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1422-1433, Jul. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1422-1433
- Birkholz, P.¹ Kroger, B.² Neuschaefer-Rube, C.³

23
- 2142659020
- Estimation of articulatory movements from speech acoustics using an HMM-based speech production model
- Mar.
- S. Hiroya and M. Honda, "Estimation of articulatory movements from speech acoustics using an HMM-based speech production model," IEEE Trans. Speech Audio Process., vol. 12, no. 2, pp. 175-185, Mar. 2004.
- (2004) IEEE Trans. Speech Audio Process. , vol.12 , Issue.2 , pp. 175-185
- Hiroya, S.¹ Honda, M.²

24
- 84946757881
- Cross-stream observation dependencies for multi-stream speech recognition
- Q. Cetin and M. Ostendorf, "Cross-stream observation dependencies for multi-stream speech recognition," in Proc. Eurospeech, 2003, pp. 2517-2520.
- (2003) Proc. Eurospeech , pp. 2517-2520
- Cetin, Q.¹ Ostendorf, M.²

25
- 0034227757
- Cluster adaptive training of hidden Markov model
- Jul.
- M. Gales, "Cluster adaptive training of hidden Markov model," IEEE Trans. Audio, Speech, Lang. Process., vol. 8, no. 4, pp. 417-428, Jul. 2000.
- (2000) IEEE Trans. Audio, Speech, Lang. Process. , vol.8 , Issue.4 , pp. 417-428
- Gales, M.¹

26
- 0036522887
- Multi-space probability distribution HMM
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multi-space probability distribution HMM (invited paper)," IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455-464, 2002. (Pubitemid 35353984)
- (2002) IEICE Transactions on Information and Systems , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

27
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Japan (E), vol. 21, no. 2, pp. 79-86, 2000. (Pubitemid 30594111)
- (2000) Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) , vol.21 , Issue.2 , pp. 79-86
- Shinoda Koichi¹ Watanabe Takao²

28
- 78049502526
- The subspace Gaussian mixture model - A structured model for speech recognition
- Apr.
- D. Povey, L. Burget, M. Agarwal, P. Akyazi, F. Kai, A. Ghoshal, O. Glembek, N. Goel, M. Karafiát, A. Rastrow, R. C. Rose, P. Schwarz, and S. Thomas, "The subspace Gaussian mixture model-a structured model for speech recognition," Comput. Speech Lang., vol. 25, no. 2, pp. 404-439, Apr. 2011.
- (2011) Comput. Speech Lang. , vol.25 , Issue.2 , pp. 404-439
- Povey, D.¹ Burget, L.² Agarwal, M.³ Akyazi, P.⁴ Kai, F.⁵ Ghoshal, A.⁶ Glembek, O.⁷ Goel, N.⁸ Karafiát, M.⁹ Rastrow, A.¹⁰ Rose, R.C.¹¹ Schwarz, P.¹² Thomas, S.¹³

29
- 38649140222
- Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model
- DOI 10.1016/j.specom.2007.09.001, PII S0167639307001495
- T. Toda, A. W. Black, and K. Tokuda, "Statistical mapping between articulatory movements and acoustic spectrum using a Gaussian mixture model," Speech Commun., vol. 50, pp. 215-227, 2008. (Pubitemid 351172471)
- (2008) Speech Communication , vol.50 , Issue.3 , pp. 215-227
- Toda, T.¹ Black, A.W.² Tokuda, K.³

30
- 84865778430
- Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus
- K. Richmond, P. Hoole, and S. King, "Announcing the electromagnetic articulography (day 1) subset of the mngu0 articulatory corpus," in Proc. Interspeech, 2011, pp. 1505-1508.
- (2011) Proc. Interspeech , pp. 1505-1508
- Richmond, K.¹ Hoole, P.² King, S.³

31
- 0032673049
- Restructuring speech representations using pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigne, "Restructuring speech representations using pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
- (1999) Speech Commun. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Decheveigne, A.³

32
- 77955426622
- An analysis of HMM-based prediction of articulatory movements
- Z.-H. Ling, K. Richmond, and J. Yamagishi, "An analysis of HMM-based prediction of articulatory movements," Speech Commun., vol. 52, no. 10, pp. 834-846, 2010.
- (2010) Speech Commun. , vol.52 , Issue.10 , pp. 834-846
- Ling, Z.-H.¹ Richmond, K.² Yamagishi, J.³

33
- 84876492203
- Target-filtering model based articulatory movement prediction for articulatory control of HMM-based speech synthesis
- accepted for publication
- M.-Q. Cai, Z.-H. Ling, and L.-R. Dai, "Target-filtering model based articulatory movement prediction for articulatory control of HMM-based speech synthesis," in Proc. 11th Int. Conf. Signal Process., 2012, accepted for publication.
- (2012) Proc. 11th Int. Conf. Signal Process.
- Cai, M.-Q.¹ Ling, Z.-H.² Dai, L.-R.³

34
- 84865795806
- Feature-space transform tying in unified acoustic-articulatory modelling for articulatory control of HMM-based speech synthesis
- Z.-H. Ling, K. Richmond, and J. Yamagishi, "Feature-space transform tying in unified acoustic-articulatory modelling for articulatory control of HMM-based speech synthesis," in Proc. Interspeech, 2011, pp. 117-120.
- (2011) Proc. Interspeech , pp. 117-120
- Ling, Z.-H.¹ Richmond, K.² Yamagishi, J.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.