Volume 8, Issue 2, 2014, Pages 173-183

Statistical parametric speech synthesis based on Gaussian process regression

Author keywords

Gaussian process regression; nonparametric Bayesian model; partially independent conditional (PIC) approximation; sparse Gaussian processes; statistical speech synthesis

Indexed keywords

ARTICULATORY INFORMATIONS; GAUSSIAN PROCESS REGRESSION; LINGUISTIC INFORMATION; MINIMUM GENERATION ERRORS; NON-PARAMETRIC BAYESIAN MODELING; PARTIALLY INDEPENDENT CONDITIONAL (PIC) APPROXIMATION; SPARSE GAUSSIAN PROCESS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS;

EID: 84897902941     PISSN: 19324553     EISSN: None     Source Type: Journal    
DOI: 10.1109/JSTSP.2013.2283461     Document Type: Article
Times cited : (37)

References (36)
  • 1
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. EUROSPEECH, 1999, pp. 2347-2350
    • (1999) Proc. EUROSPEECH , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 2
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009
    • (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.3
  • 3
    • 0028996993
    • Speech parameter generation from HMM using dynamic features
    • K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features," in Proc. ICASSP '95, 1995, pp. 660-663
    • (1995) Proc. ICASSP , vol.95 , pp. 660-663
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 4
    • 0003805597
    • The use of context in large vocabulary speech recognition
    • Cambridge, U.K
    • J. J. Odell, "The use of context in large vocabulary speech recognition," Ph.D. dissertation, Univ. of Cambridge, Cambridge, U.K., 1995
    • (1995) Ph.D. Dissertation Univ. of Cambridge
    • Odell, J.J.1
  • 5
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Y. J. Wu and R. H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, 2006, vol. 1, pp. 889-892
    • (2006) Proc. ICASSP , vol.1 , pp. 889-892
    • Wu, Y.J.1    Wang, R.H.2
  • 6
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
    • DOI 10.1016/j.csl.2006.01.002, PII S0885230806000052
    • H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007 (Pubitemid 44537647)
    • (2007) Computer Speech and Language , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 7
    • 70450175584
    • Autoregressive HMMs for speech synthesis
    • M. Shannon and W. Byrne, "Autoregressive HMMs for speech synthesis," in Proc. Interspeech, 2009, vol. 2009, pp. 400-403
    • (2009) Proc. Interspeech , vol.2009 , pp. 400-403
    • Shannon, M.1    Byrne, W.2
  • 8
    • 34547514452
    • A novel HMM-based TTS system using both continuous HMMs and discrete HMMs
    • J. Yu, M. Zhang, J. Tao, and X. Wang, "A novel HMM-based TTS system using both continuous HMMs and discrete HMMs," in Proc. ICASSP, 2007, pp. 709-712
    • (2007) Proc. ICASSP , pp. 709-712
    • Yu, J.1    Zhang, M.2    Tao, J.3    Wang, X.4
  • 9
    • 70450161678
    • Rich context modeling for high quality HMM-based TTS
    • Z.-J. Yan, Y. Qian, and F. K. Soong, "Rich context modeling for high quality HMM-based TTS," in Proc. INTERSPEECH, 2009, pp. 1755-1758
    • (2009) Proc. INTERSPEECH , pp. 1755-1758
    • Yan, Z.-J.1    Qian, Y.2    Soong, F.K.3
  • 10
    • 56349089638
    • Gaussian process regression for voice activity detection and speech enhancement
    • S. Park and S. Choi, "Gaussian process regression for voice activity detection and speech enhancement," in Proc. IJCNN, 2008, pp. 2879-2882
    • (2008) Proc. IJCNN , pp. 2879-2882
    • Park, S.1    Choi, S.2
  • 11
    • 84865737668
    • Gaussian process experts for voice conversion
    • N. C. V. Pilkington, H. Zen, and M. J. F. Gales, "Gaussian process experts for voice conversion," in Proc. INTERSPEECH, 2011, pp. 2761-2764
    • (2011) Proc. INTERSPEECH , pp. 2761-2764
    • Pilkington, N.C.V.1    Zen, H.2    Gales, M.J.F.3
  • 13
    • 84867596846
    • Gaussian process dynamical models for nonparametric speech representation and synthesis
    • G. Henter, M. Frean, and W. Kleijn, "Gaussian process dynamical models for nonparametric speech representation and synthesis," in Proc. ICASSP, 2012, pp. 4505-4508
    • (2012) Proc. ICASSP , pp. 4505-4508
    • Henter, G.1    Frean, M.2    Kleijn, W.3
  • 16
    • 78649536452
    • A frame-based context-dependent acoustic modeling for speech recognition (Japanese)
    • R. Terashima, H. Zen, Y. Nankaku, and K. Tokuda, "A frame-based context-dependent acoustic modeling for speech recognition (Japanese)," IEEJ Trans. Electron., Inf., Syst., vol. 130, no. 10, pp. 1856-1864, 2010
    • (2010) IEEJ Trans. Electron., Inf., Syst , vol.130 , Issue.10 , pp. 1856-1864
    • Terashima, R.1    Zen, H.2    Nankaku, Y.3    Tokuda, K.4
  • 17
    • 84867599581
    • An F0 modeling technique based on prosodic events for spontaneous speech synthesis
    • T. Koriyama, T. Nose, and T. Kobayashi, "An F0 modeling technique based on prosodic events for spontaneous speech synthesis," in Proc. ICASSP, 2012, pp. 4589-4593
    • (2012) Proc. ICASSP , pp. 4589-4593
    • Koriyama, T.1    Nose, T.2    Kobayashi, T.3
  • 18
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966
    • (2013) Proc. ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 19
    • 2642588255
    • Orthogonalized distinctive phonetic feature extraction for noise-robust automatic speech recognition
    • T. Fukuda and T. Nitta, "Orthogonalized distinctive phonetic feature extraction for noise-robust automatic speech recognition," IEICE Trans. Inf. Syst., vol. 87, no. 5, pp. 1110-1118, 2004
    • (2004) IEICE Trans. Inf. Syst , vol.87 , Issue.5 , pp. 1110-1118
    • Fukuda, T.1    Nitta, T.2
  • 20
    • 0025475528
    • ATR Japanese speech database as a tool of speech recognition and synthesis
    • A. Kurematsu, K. Takeda, Y. Sagisaka, S. Katagiri, H. Kuwabara, and K. Shikano, "ATR Japanese speech database as a tool of speech recognition and synthesis," Speech Commun., vol. 9, no. 4, pp. 357-363, Aug. 1990
    • (1990) Speech Commun , vol.9 , Issue.4 , pp. 357-363
    • Kurematsu, A.1    Takeda, K.2    Sagisaka, Y.3    Katagiri, S.4    Kuwabara, H.5    Shikano, K.6
  • 21
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999
    • (1999) Speech Commun , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 24
    • 0027247004
    • Mel-Cepstral distance measure for objective speech quality assessment
    • R. Kubichek, "Mel-cepstral distance measure for objective speech quality assessment," in Proc. IEEE Pacific Rim Conf. Commun., Comput., Signal Process., 1993, vol. 1, pp. 125-128 (Pubitemid 23713438)
    • (1993) IEEE Pac Rim Conf Commun Comput Signal Process , pp. 125-128
    • Kubichek, R.F.1
  • 26
    • 79960113740
    • Local and global sparse Gaussian process approximations
    • E. Snelson and Z. Ghahramani, "Local and global sparse Gaussian process approximations," in Proc. AISTATS, 2007
    • (2007) Proc. AISTATS
    • Snelson, E.1    Ghahramani, Z.2
  • 27
    • 29144453489
    • A unifying view of sparse approximate Gaussian process regression
    • J. Quiñonero-Candela and C. E. Rasmussen, "A unifying view of sparse approximate Gaussian process regression," J. Mach. Learn. Res., vol. 6, pp. 1939-1959, 2005 (Pubitemid 41798128)
    • (2005) Journal of Machine Learning Research , vol.6 , pp. 1939-1959
    • Quiñonero-Candela, J.1    Rasmussen, C.E.2
  • 28
    • 84864038646
    • Sparse Gaussian processes using pseudo-inputs
    • MIT Press
    • E. Snelson and Z. Ghahramani, "Sparse Gaussian processes using pseudo-inputs," in Proc. NIPS 18, MIT Press, 2006, pp. 1257-1264
    • (2006) Proc. NIPS , vol.18
    • Snelson, E.1    Ghahramani, Z.2
  • 31
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007
    • (2007) IEICE Trans. Inf. Syst. , vol.90 , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 34
    • 84865801900
    • The effect of using normalized models in statistical speech synthesis
    • M. Shannon, H. Zen, and W. Byrne, "The effect of using normalized models in statistical speech synthesis," in Proc. Interspeech, 2011, pp. 121-124
    • (2011) Proc. Interspeech 2011 , pp. 121-124
    • Shannon, M.1    Zen, H.2    Byrne, W.3
  • 35
    • 84862860004
    • Nonstationary dependent Gaussian processes for data fusion in large-scale terrain modeling
    • S. Vasudevan, F. Ramos, E. Nettleton, and H. Durrant-Whyte, "Nonstationary dependent Gaussian processes for data fusion in large-scale terrain modeling," in Proc. ICRA, 2011, pp. 1875-1882
    • (2011) Proc. ICRA , pp. 1875-1882
    • Vasudevan, S.1    Ramos, F.2    Nettleton, E.3    Durrant-Whyte, H.4
  • 36
    • 84898943255
    • Warped Gaussian processes
    • MIT Press
    • E. Snelson, C. E. Rasmussen, and Z. Ghahramani, "Warped Gaussian processes," in Proc. NIPS 16, MIT Press, 2004, pp. 337-344.
    • (2004) Proc. NIPS , vol.16 , pp. 337-344
    • Snelson, E.1    Rasmussen, C.E.2    Ghahramani, Z.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.