Volume 21, Issue 3, 2013, Pages 587-597

Autoregressive models for statistical parametric speech synthesis

Author keywords

Acoustic modeling; autoregressive hidden Markov model; autoregressive processes; hidden Markov models (HMMs); speech; statistical parametric speech synthesis

Indexed keywords

ACOUSTIC MODELING; AUTO REGRESSIVE MODELS; AUTO REGRESSIVE PROCESS; AUTO-REGRESSIVE; EXPECTATION MAXIMIZATION; GENERATION ALGORITHM; HIDDEN MARKOV MODELS (HMMS); HIGH QUALITY; LOW LATENCY; NUMBER OF STATE; OBJECTIVE EVALUATION; SYNTHESIS ALGORITHMS;

EID: 84872190545     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2227740     Document Type: Article
Times cited : (60)

References (40)
  • 1
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 3
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
    • DOI 10.1016/j.csl.2006.01.002, PII S0885230806000052
    • H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007. (Pubitemid 44537647)
    • (2007) Computer Speech and Language , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 6
    • 85009267646
    • Hidden Markov models using vector linear prediction and discriminative output distributions
    • P. C. Woodland, "Hidden Markov models using vector linear prediction and discriminative output distributions," in Proc. ICASSP '92, 1992, pp. 509-512.
    • (1992) Proc. ICASSP '92 , pp. 509-512
    • Woodland, P.C.1
  • 7
    • 0037841402
    • Graphical models and automatic speech recognition
    • M. Johnson, S. P. Khudanpur, M. Ostendorf, and R. Rosenfeld, Eds. New York: Springer-Verlag
    • J. Bilmes, "Graphical models and automatic speech recognition," in Mathematical Foundations of Speech and Language Processing, M. Johnson, S. P. Khudanpur, M. Ostendorf, and R. Rosenfeld, Eds. New York: Springer-Verlag, 2004.
    • (2004) Mathematical Foundations of Speech and Language Processing
    • Bilmes, J.1
  • 8
    • 85009236696
    • Maximum mutual information training of hidden Markov models with vector linear predictors
    • K. K. Chin and P. C. Woodland, "Maximum mutual information training of hidden Markov models with vector linear predictors," in Proc. Interspeech '02, 2002, pp. 997-1000.
    • (2002) Proc. Interspeech '02 , pp. 997-1000
    • Chin, K.K.1    Woodland, P.C.2
  • 9
    • 84985742249
    • Linear predictive hidden Markov models and the speech signal
    • A. Poritz, "Linear predictive hidden Markov models and the speech signal," in Proc. ICASSP '82, 1982, vol. 7, pp. 1291-1294.
    • (1982) Proc. ICASSP '82 , vol.7 , pp. 1291-1294
    • Poritz, A.1
  • 10
    • 0022270364
    • Mixture autoregressive hidden Markov models for speech signals
    • B. H. Juang and L. Rabiner, "Mixture autoregressive hidden Markov models for speech signals," IEEE Trans. Acoust., Speech, Signal Process., vol. 33, no. 6, pp. 1404-1413, Dec. 1985.
    • (1985) IEEE Trans. Acoust., Speech, Signal Process , vol.33 , Issue.6 , pp. 1404-1413
    • Juang, B.H.1    Rabiner, L.2
  • 11
    • 70450175584
    • Autoregressive HMMs for speech synthesis
    • M. Shannon and W. Byrne, "Autoregressive HMMs for speech synthesis," in Proc. Interspeech '09, 2009, pp. 400-403.
    • (2009) Proc. Interspeech '09 , pp. 400-403
    • Shannon, M.1    Byrne, W.2
  • 12
    • 79959849719
    • Autoregressive clustering for HMM speech synthesis
    • M. Shannon and W. Byrne, "Autoregressive clustering for HMM speech synthesis," in Proc. Interspeech '10, 2010, pp. 829-832.
    • (2010) Proc. Interspeech '10 , pp. 829-832
    • Shannon, M.1    Byrne, W.2
  • 13
    • 84865801900
    • The effect of using normalized models in statistical speech synthesis
    • M. Shannon, H. Zen, and W. Byrne, "The effect of using normalized models in statistical speech synthesis," in Proc. Interspeech '11, 2011, pp. 121-124.
    • (2011) Proc. Interspeech '11 , pp. 121-124
    • Shannon, M.1    Zen, H.2    Byrne, W.3
  • 14
    • 84867625378
    • Autoregressive HMM speech synthesis
    • C. Quillen, "Autoregressive HMM speech synthesis," in Proc. ICASSP '12, 2012, pp. 4021-4024.
    • (2012) Proc. ICASSP '12 , pp. 4021-4024
    • Quillen, C.1
  • 15
    • 84872175773
    • [Online] accessed 21 March 2012
    • EMIME consortium, Tools [Online]. Available: http://www.emime.org/participate/tools, accessed 21 March 2012
    • EMIME Consortium, Tools
  • 16
    • 84872185805
    • HTS working group, HMM-Based Speech Synthesis System (HTS) [Online]. Accessed 21 March 2012
    • HTS working group, HMM-Based Speech Synthesis System (HTS) [Online]. Available: http://hts.sp.nitech.ac.jp/, accessed 21 March 2012
  • 18
  • 19
    • 0037278070
    • An efficient forward-backward algorithm for an explicit-duration hidden Markov model
    • S. Z. Yu and H. Kobayashi, "An efficient forward-backward algorithm for an explicit-duration hidden Markov model," IEEE Signal Process. Lett., vol. 10, no. 1, pp. 11-14, Jan. 2003.
    • (2003) IEEE Signal Process. Lett , vol.10 , Issue.1 , pp. 11-14
    • Yu, S.Z.1    Kobayashi, H.2
  • 20
    • 69849091128
    • Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm
    • H. Zen, "Implementing an HSMM-based speech synthesis system using an efficient forward-backward algorithm," Nagoya Inst. of Technol., Tech. Rep. TR-SP-0001, 2007.
    • (2007) Nagoya Inst. of Technol., Tech. Rep. TR-SP-0001
    • Zen, H.1
  • 24
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 25
    • 0035483059
    • Vector quantization of speech spectral parameters using statistics of static and dynamic features
    • Autonomous Decentralized Systems and Systems Assurance
    • K. Koishida, K. Tokuda, T. Masuko, and T. Kobayashi, "Vector quantization of speech spectral parameters using statistics of static and dynamic features," IEICE Trans. Inf. Syst., vol. E84-D, no. 10, pp. 1427-1434, 2001. (Pubitemid 33099747)
    • (2001) IEICE Transactions on Information and Systems , vol.E84-D , Issue.10 , pp. 1427-1434
    • Koishida, K.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4
  • 26
    • 84867211725
    • Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory," in Proc. Interspeech '08, 2008, pp. 1076-1079.
    • (2008) Proc. Interspeech '08 , pp. 1076-1079
    • Muramatsu, T.1    Ohtani, Y.2    Toda, T.3    Saruwatari, H.4    Shikano, K.5
  • 27
    • 84867619546
    • Improved minimum converted trajectory error training for real-time speech-to-lips conversion
    • W. Han, L. Wang, F. Soong, and B. Yuan, "Improved minimum converted trajectory error training for real-time speech-to-lips conversion," in Proc. ICASSP '12, 2012, pp. 4513-4516.
    • (2012) Proc. ICASSP '12 , pp. 4513-4516
    • Han, W.1    Wang, L.2    Soong, F.3    Yuan, B.4
  • 28
    • 78049361102
    • Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst. (Jpn. Ed.), vol. J87-D-II, no. 8, pp. 1565-1571, 2004.
    • (2004) IEICE Trans. Inf. Syst. (Jpn. Ed.) , vol.J87-D-II , Issue.8 , pp. 1565-1571
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 31
    • 84872191197
    • Dept. of Eng., Univ. of Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.677 [Online]
    • M. Shannon and W. Byrne, "Viewing the trajectory HMM as a generalized autoregressive HMM," Dept. of Eng., Univ. of Cambridge, U.K., Tech. Rep. CUED/F-INFENG/TR.677, 2012 [Online]. Available: http://mi.eng.cam.ac.uk/sms46/papers/shannon2012viewing.pdf
    • (2012) Viewing the Trajectory HMM As A Generalized Autoregressive HMM
    • Shannon, M.1    Byrne, W.2
  • 32
    • 33745216749
    • The Blizzard challenge - 2005: Evaluating corpus-based speech synthesis on common datasets
    • 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
    • A. W. Black and K. Tokuda, "The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common datasets," in Proc. Interspeech '05, 2005, pp. 77-80. (Pubitemid 43908005)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 77-80
    • Black, A.W.1    Tokuda, K.2
  • 33
    • 0027247004
    • Mel-cepstral distance measure for objective speech quality assessment
    • R. Kubichek, "Mel-cepstral distance measure for objective speech quality assessment," in Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process., 1993, pp. 125-128. (Pubitemid 23713438)
    • (1993) IEEE Pac Rim Conf Commun Comput Signal Process , pp. 125-128
    • Kubichek, R.F.1
  • 35
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," Proc. ICASSP '92, pp. 137-140, 1992.
    • (1992) Proc. ICASSP '92 , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 36
    • 33846405723
    • Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
    • DOI 10.1093/ietisy/e90-1.1.325
    • H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge '05," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007. (Pubitemid 46145336)
    • (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.1 , pp. 325-333
    • Zen, H.1    Toda, T.2    Nakamura, M.3    Tokuda, K.4
  • 37
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, pp. 187-207, 1999.
    • (1999) Speech Commun , vol.27 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 39
    • 85008023596
    • Continuous F0 modeling for HMM-based statistical parametric speech synthesis
    • K. Yu, "Continuous F0 modeling for HMM-based statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1071-1079, Jul. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.5 , pp. 1071-1079
    • Yu, K.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.