SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 3, 2013, Pages 587-597

Autoregressive models for statistical parametric speech synthesis

(3) Shannon, Matt a; Zen, Heiga b,c; Byrne, William a

a UNIVERSITY OF CAMBRIDGE (United Kingdom)

b TOSHIBA CORPORATION (Japan)

c GOOGLE (United Kingdom)

Author keywords

Acoustic modeling; autoregressive hidden Markov model; autoregressive processes; hidden Markov models (HMMs); speech; statistical parametric speech synthesis

Indexed keywords

ACOUSTIC MODELING; AUTO REGRESSIVE MODELS; AUTO REGRESSIVE PROCESS; AUTO-REGRESSIVE; EXPECTATION MAXIMIZATION; GENERATION ALGORITHM; HIDDEN MARKOV MODELS (HMMS); HIGH QUALITY; LOW LATENCY; NUMBER OF STATE; OBJECTIVE EVALUATION; SYNTHESIS ALGORITHMS;

ALGORITHMS; HIDDEN MARKOV MODELS; SPEECH; SPEECH SYNTHESIS; TRAJECTORIES;

PARAMETER ESTIMATION;

EID: 84872190545 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2012.2227740 Document Type: Article

Times cited : (61)

References (40)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP '00, 2000, pp. 1315-1318.
- (2000) Proc. ICASSP '00 , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

3
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
- DOI 10.1016/j.csl.2006.01.002, PII S0885230806000052
- H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007. (Pubitemid 44537647)
- (2007) Computer Speech and Language , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

4
- 0023211846
- Explicit time correlation in hidden Markov models for speech recognition
- C. Wellekens, "Explicit time correlation in hidden Markov models for speech recognition," in Proc. ICASSP '87, 1987, vol. 12, pp. 384-386. (Pubitemid 17596360)
- (1987) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , pp. 384-386
- Wellekens, C.J.¹

5
- 0025388113
- Linear predictive HMM for vector-valued observations with applications to speech recognition
- DOI 10.1109/29.103057
- P. Kenny, M. Lennig, and P. Mermelstein, "A linear predictive HMM for vector-valued observations with applications to speech recognition," IEEE Trans. Acoust., Speech, Signal Process., vol. 38, no. 2, pp. 220-225, Feb. 1990. (Pubitemid 20666463)
- (1990) IEEE Transactions on Acoustics, Speech, and Signal Processing , vol.38 , Issue.2 , pp. 220-225
- Kenny Patrick¹ Lennig Matthew² Mermelstein Paul³

6
- 85009267646
- Hidden Markov models using vector linear prediction and discriminative output distributions
- P. C. Woodland, "Hidden Markov models using vector linear prediction and discriminative output distributions," in Proc. ICASSP '92, 1992, pp. 509-512.
- (1992) Proc. ICASSP '92 , pp. 509-512
- Woodland, P.C.¹

7
- 0037841402
- Graphical models and automatic speech recognition
- J. Bilmes, "Graphical models and automatic speech recognition," in Mathematical Foundations of Speech and Language Processing, M. Johnson, S. P. Khudanpur, M. Ostendorf, and R. Rosenfeld, Eds. New York: Springer-Verlag, 2004.
- (2004) Mathematical Foundations of Speech and Language Processing
- Bilmes, J.¹

8
- 85009236696
- Maximum mutual information training of hidden Markov models with vector linear predictors
- K. K. Chin and P. C. Woodland, "Maximum mutual information training of hidden Markov models with vector linear predictors," in Proc. Interspeech '02, 2002, pp. 997-1000.
- (2002) Proc. Interspeech '02 , pp. 997-1000
- Chin, K.K.¹ Woodland, P.C.²

9
- 84985742249
- Linear predictive hidden Markov models and the speech signal
- A. Poritz, "Linear predictive hidden Markov models and the speech signal," in Proc. ICASSP '82, 1982, vol. 7, pp. 1291-1294.
- (1982) Proc. ICASSP '82 , vol.7 , pp. 1291-1294
- Poritz, A.¹

10
- 0022270364
- Mixture autoregressive hidden Markov models for speech signals
- B. H. Juang and L. Rabiner, "Mixture autoregressive hidden Markov models for speech signals," IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 6, pp. 1404-1413, Dec. 1985.
- (1985) IEEE Trans. Acoust., Speech, Signal Process , vol.33 , Issue.6 , pp. 1404-1413
- Juang, B.H.¹ Rabiner, L.²

11
- 70450175584
- Autoregressive HMMs for speech synthesis
- M. Shannon and W. Byrne, "Autoregressive HMMs for speech synthesis," in Proc. Interspeech '09, 2009, pp. 400-403.
- (2009) Proc. Interspeech '09 , pp. 400-403
- Shannon, M.¹ Byrne, W.²

12
- 79959849719
- Autoregressive clustering for HMM speech synthesis
- M. Shannon and W. Byrne, "Autoregressive clustering for HMM speech synthesis," in Proc. Interspeech '10, 2010, pp. 829-832.
- (2010) Proc. Interspeech '10 , pp. 829-832
- Shannon, M.¹ Byrne, W.²

13
- 84865801900
- The effect of using normalized models in statistical speech synthesis
- M. Shannon, H. Zen, and W. Byrne, "The effect of using normalized models in statistical speech synthesis," in Proc. Interspeech '11, 2011, pp. 121-124.
- (2011) Proc. Interspeech '11 , pp. 121-124
- Shannon, M.¹ Zen, H.² Byrne, W.³

14
- 84867625378
- Autoregressive HMM speech synthesis
- C. Quillen, "Autoregressive HMM speech synthesis," in Proc. ICASSP '12, 2012, pp. 4021-4024.
- (2012) Proc. ICASSP '12 , pp. 4021-4024
- Quillen, C.¹

15
- 84872175773
- [Online] accessed 21 March 2012
- EMIME consortium, Tools [Online]. Available: http://www.emime.org/participate/tools, accessed 21 March 2012
- EMIME Consortium, Tools

16
- 84872185805
- HTS working group, HMM-Based Speech Synthesis System (HTS) [Online], accessed 21 March 2012
- HTS working group, HMM-Based Speech Synthesis System (HTS) [Online]. Available: http://hts.sp.nitech.ac.jp/, accessed 21 March 2012

17
- 85093445139
- Duration modeling for HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Duration modeling for HMM-based speech synthesis," in Proc. ICSLP '98, 1998.
- (1998) Proc. ICSLP '98
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

18
- 44449177634
- A hidden semi-Markov model-based speech synthesis system
- H. Zen, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "A hidden semi-Markov model-based speech synthesis system," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

19
- 0037278070
- An efficient forward-backward algorithm for an explicit-duration hidden Markov model
- S. Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Process. Lett., vol. 10, no. 1, pp. 11-14, Jan. 2003.
- (2003) IEEE Signal Process. Lett , vol.10 , Issue.1 , pp. 11-14
- Yu, S.Z.¹ Kobayashi, H.²

20
- 69849091128
- Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm
- H. Zen, "Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm," Nagoya Inst. of Technol., Tech. Rep. TR-SP-0001, 2007.
- (2007) Nagoya Inst. of Technol., Tech. Rep. TR-SP-0001
- Zen, H.¹

21
- 79959847165
- Univ. of Cambridge, Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.629 [Online]
- M. Shannon and W. Byrne, "A formulation of the autoregressive HMM for speech synthesis," Dept. of Eng., Univ. of Cambridge, Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.629, 2009 [Online]. Available: http://mi.eng.cam.ac.uk/sms46/papers/shannon2009fah.pdf
- (2009) A Formulation of the Autoregressive HMM for Speech Synthesis Dept. of Eng.
- Shannon, M.¹ Byrne, W.²

22
- 0002144369
- Tree-based state tying for high accuracy acoustic modelling
- S. Young, J. Odell, and P. Woodland, "Tree-based state tying for high accuracy acoustic modelling," in Proc. ARPA Human Lang. Technol. Workshop, 1994, pp. 307-312.
- (1994) Proc. ARPA Human Lang. Technol. Workshop , pp. 307-312
- Young, S.¹ Odell, J.² Woodland, P.³

23
- 0033906251
- MDL-based context-dependent subword modeling for speech recognition
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition," J. Acoust. Soc. Jpn. (E), vol. 21, no. 2, pp. 79-86, 2000. (Pubitemid 30594111)
- (2000) Journal of the Acoustical Society of Japan (E) (English translation of Nippon Onkyo Gakkaishi) , vol.21 , Issue.2 , pp. 79-86
- Shinoda Koichi¹ Watanabe Takao²

24
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

25
- 0035483059
- Vector quantization of speech spectral parameters using statistics of static and dynamic features
- K. Koishida, K. Tokuda, T. Masuko, and T. Kobayashi, "Vector quantization of speech spectral parameters using statistics of static and dynamic features," IEICE Trans. Inf. Syst., vol. E84-D, no. 10, pp. 1427-1434, 2001. (Pubitemid 33099747)
- (2001) IEICE Transactions on Information and Systems , vol.E84-D , Issue.10 , pp. 1427-1434
- Koishida, K.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴

26
- 84867211725
- Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory," in Proc. Interspeech '08, 2008, pp. 1076-1079.
- (2008) Proc. Interspeech '08 , pp. 1076-1079
- Muramatsu, T.¹ Ohtani, Y.² Toda, T.³ Saruwatari, H.⁴ Shikano, K.⁵

27
- 84867619546
- Improved minimum converted trajectory error training for real-time speech-to-lips conversion
- W. Han, L. Wang, F. Soong, and B. Yuan, "Improved minimum converted trajectory error training for real-time speech-to-lips conversion," in Proc. ICASSP '12, 2012, pp. 4513-4516.
- (2012) Proc. ICASSP '12 , pp. 4513-4516
- Han, W.¹ Wang, L.² Soong, F.³ Yuan, B.⁴

28
- 78049361102
- Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst. (Jpn. Ed.), vol. J87-D-II, no. 8, pp. 1565-1571, 2004.
- (2004) IEICE Trans. Inf. Syst. (Jpn. Ed.) , vol.J87-D-II , Issue.8 , pp. 1565-1571
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

29
- 67650851754
- USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method
- Z.-H. Ling, Y.-J. Wu, Y.-P. Wang, L. Qin, and R.-H. Wang, "USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method," in Proc. Blizzard Challenge Workshop '06, 2006.
- (2006) Proc. Blizzard Challenge Workshop '06
- Ling, Z.-H.¹ Wu, Y.-J.² Wang, Y.-P.³ Qin, L.⁴ Wang, R.-H.⁵

30
- 0003793552
- 1st ed Englewood Cliffs Prentice-Hall
- A. V. Oppenheim and R. W. Schafer, Digital Signal Processing, 1st ed. Englewood Cliffs: Prentice-Hall, 1975, p. 15.
- (1975) Digital Signal Processing , pp. 15
- Oppenheim, A.V.¹ Schafer, R.W.²

31
- 84872191197
- Dept. of Eng., Univ. of Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.677 [Online]
- M. Shannon and W. Byrne, "Viewing the trajectory HMM as a generalized autoregressive HMM," Dept. of Eng., Univ. of Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.677, 2012 [Online]. Available: http://mi.eng.cam.ac.uk/sms46/papers/shannon2012viewing.pdf
- (2012) Viewing the Trajectory HMM As A Generalized Autoregressive HMM
- Shannon, M.¹ Byrne, W.²

32
- 33745216749
- The Blizzard challenge - 2005: Evaluating corpus-based speech synthesis on common datasets
- 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
- A. W. Black and K. Tokuda, "The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets," in Proc. Interspeech '05, 2005, pp. 77-80. (Pubitemid 43908005)
- (2005) 9th European Conference on Speech Communication and Technology , pp. 77-80
- Black, A.W.¹ Tokuda, K.²

33
- 0027247004
- Mel-Cepstral distance measure for objective speech quality assessment
- R. Kubichek, "Mel-cepstral distance measure for objective speech quality assessment," in Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process., 1993, pp. 125-128. (Pubitemid 23713438)
- (1993) IEEE Pac Rim Conf Commun Comput Signal Process , pp. 125-128
- Kubichek Robert, F.¹

34
- 33646773080
- Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-LTI-03-177
- J. Kominek and A. W. Black, "The CMU ARCTIC databases for speech synthesis," Carnegie Mellon Univ., Pittsburgh, PA, Tech. Rep. CMU-LTI-03-177, 2003.
- (2003) The CMU ARCTIC Databases for Speech Synthesis
- Kominek, J.¹ Black, A.W.²

35
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," Proc. ICASSP '92, pp. 137-140, 1992.
- (1992) Proc. ICASSP '92 , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

36
- 33846405723
- Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- DOI 10.1093/ietisy/e90-1.1.325
- H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge '05," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007. (Pubitemid 46145336)
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

37
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
- (1999) Speech Commun , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

38
- 0036522887
- Multi-space probability distribution HMM
- K. Tokuda, T. Masuko, N. Miyazaki, and T. Kobayashi, "Multi-space probability distribution HMM," IEICE Trans. Inf. Syst., vol. E85-D, no. 3, pp. 455-464, 2002. (Pubitemid 35353984)
- (2002) IEICE Transactions on Information and Systems , vol.E85-D , Issue.3 , pp. 455-464
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

39
- 85008023596
- Continuous F0 modeling for HMM-based statistical parametric speech synthesis
- K. Yu, "Continuous F0 modeling for HMM-based statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1071-1079, Jul. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.5 , pp. 1071-1079
- Yu, K.¹

40
- 67650832556
- Statistical analysis of the Blizzard Challenge 2007 listening test results
- R. A. J. Clark, M. Podsiadlo, M. Fraser, C. Mayo, and S. King, "Statistical analysis of the Blizzard Challenge 2007 listening test results," in Proc. Blizzard Challenge Workshop '07, 2007.
- (2007) Proc. Blizzard Challenge Workshop '07
- Clark, R.A.J.¹ Podsiadlo, M.² Fraser, M.³ Mayo, C.⁴ King, S.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.