Volume , Issue , 2013, Pages 7962-7966

Statistical parametric speech synthesis using deep neural networks

Author keywords

Deep neural network; Hidden Markov model; Statistical parametric speech synthesis

Indexed keywords

CONTEXT DEPENDENCY; CONTEXT DEPENDENT; CONVENTIONAL APPROACH; DEEP NEURAL NETWORKS; HIDDEN MARKOV MODELS (HMMS); HMM-BASED SYSTEMS; PROBABILITY DENSITIES; STATISTICAL PARAMETRIC SPEECH SYNTHESIS;

EID: 84890490547     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2013.6639215     Document Type: Conference Paper
Times cited : (838)

References (40)
  • 1
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 2
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, 1996, pp. 373-376.
    • (1996) Proc. ICASSP , pp. 373-376
    • Hunt, A.1    Black, A.2
  • 3
    • 0034842740
    • Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
    • M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR," in Proc. ICASSP, 2001, pp. 805-808.
    • (2001) Proc. ICASSP , pp. 805-808
    • Tamura, M.1    Masuko, T.2    Tokuda, K.3    Kobayashi, T.4
  • 6
    • 51449114529
    • A style control technique for HMM-based expressive speech synthesis
    • T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406-1413, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.9 , pp. 1406-1413
    • Nose, T.1    Yamagishi, J.2    Masuko, T.3    Kobayashi, T.4
  • 8
    • 33846935000
    • HMM-based Korean speech synthesis system for hand-held devices
    • S.-J. Kim, J.-J. Kim, and M.-S. Hahn, "HMM-based Korean speech synthesis system for hand-held devices," IEEE Trans. Consum. Electron., vol. 52, no. 4, pp. 1384-1390, 2006.
    • (2006) IEEE Trans. Consum. Electron. , vol.52 , Issue.4 , pp. 1384-1390
    • Kim, S.-J.1    Kim, J.-J.2    Hahn, M.-S.3
  • 9
    • 79959839868
    • Quantized HMMs for low footprint text-to-speech synthesis
    • A. Gutkin, X. Gonzalvo, S. Breuer, and P. Taylor, "Quantized HMMs for low footprint text-to-speech synthesis," in Proc. Interspeech, 2010, pp. 837-840.
    • (2010) Proc. Interspeech , pp. 837-840
    • Gutkin, A.1    Gonzalvo, X.2    Breuer, S.3    Taylor, P.4
  • 11
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.3
  • 14
    • 85135145174
    • Acoustic modeling based on the MDL criterion for speech recognition
    • K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," in Proc. Eurospeech, 1997, pp. 99-102.
    • (1997) Proc. Eurospeech , pp. 99-102
    • Shinoda, K.1    Watanabe, T.2
  • 15
    • 0032658258
    • Decision tree state tying based on penalized Bayesian information criterion
    • W. Chou and W. Reichl, "Decision tree state tying based on penalized Bayesian information criterion," in Proc. ICASSP, 1999, vol. 1, pp. 345-348.
    • (1999) Proc. ICASSP , vol.1 , pp. 345-348
    • Chou, W.1    Reichl, W.2
  • 16
    • 33947650089
    • HMM state clustering based on efficient cross-validation
    • T. Shinozaki, "HMM state clustering based on efficient cross-validation," in Proc. ICASSP, 2006, pp. 1157-1160.
    • (2006) Proc. ICASSP , pp. 1157-1160
    • Shinozaki, T.1
  • 17
    • 80051615235
    • Decision tree-based context clustering based on cross validation and hierarchical priors
    • H. Zen and M.J.F. Gales, "Decision tree-based context clustering based on cross validation and hierarchical priors," in Proc. ICASSP, 2011, pp. 4560-4563.
    • (2011) Proc. ICASSP , pp. 4560-4563
    • Zen, H.1    Gales, M.J.F.2
  • 18
    • 34249043508
    • Anytime learning of decision trees
    • S. Esmeir and S. Markovitch, "Anytime learning of decision trees," J. Mach. Learn. Res., vol. 8, pp. 891-933, 2007.
    • (2007) J. Mach. Learn. Res. , vol.8 , pp. 891-933
    • Esmeir, S.1    Markovitch, S.2
  • 19
    • 79955538498
    • Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis
    • K. Yu, H. Zen, F. Mairesse, and S. Young, "Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis," Speech Commun., vol. 53, no. 6, pp. 914-923, 2011.
    • (2011) Speech Commun. , vol.53 , Issue.6 , pp. 914-923
    • Yu, K.1    Zen, H.2    Mairesse, F.3    Young, S.4
  • 25
    • 84867200235
    • Generating natural F0 trajectory with additive trees
    • Y. Qian, H. Liang, and F. Soong, "Generating natural F0 trajectory with additive trees," in Proc. Interspeech, 2008, pp. 2126-2129.
    • (2008) Proc. Interspeech , pp. 2126-2129
    • Qian, Y.1    Liang, H.2    Soong, F.3
  • 26
    • 51449118125
    • Acoustic modeling with contextual additive structure for HMM-based speech recognition
    • Y. Nankaku, K. Nakamura, H. Zen, and K. Tokuda, "Acoustic modeling with contextual additive structure for HMM-based speech recognition," in Proc. ICASSP, 2008, pp. 4469-4472.
    • (2008) Proc. ICASSP , pp. 4469-4472
    • Nankaku, Y.1    Nakamura, K.2    Zen, H.3    Tokuda, K.4
  • 28
  • 29
    • 78049376926
    • Word-level emphasis modelling in HMM-based speech synthesis
    • K. Yu, F. Mairesse, and S. Young, "Word-level emphasis modelling in HMM-based speech synthesis," in Proc. ICASSP, 2010, pp. 4238-4241.
    • (2010) Proc. ICASSP , pp. 4238-4241
    • Yu, K.1    Mairesse, F.2    Young, S.3
  • 30
    • 85032782045
    • Deep learning and its applications to signal and information processing
    • D. Yu and L. Deng, "Deep learning and its applications to signal and information processing," IEEE Signal Process. Magazine, vol. 28, no. 1, pp. 145-154, 2011.
    • (2011) IEEE Signal Process. Magazine , vol.28 , Issue.1 , pp. 145-154
    • Yu, D.1    Deng, L.2
  • 31
    • 0022667694
    • Speaker independent isolated word recognition using dynamic features of speech spectrum
    • S. Furui, "Speaker independent isolated word recognition using dynamic features of speech spectrum," IEEE Trans. Acoust. Speech Signal Process., vol. 34, pp. 52-59, 1986.
    • (1986) IEEE Trans. Acoust. Speech Signal Process. , vol.34 , pp. 52-59
    • Furui, S.1
  • 32
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, pp. 1315-1318.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 33
    • 33846405723
    • Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
    • H. Zen, T. Toda, M. Nakamura, and T. Tokuda, "Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
    • Zen, H.1    Toda, T.2    Nakamura, M.3    Tokuda, T.4
  • 34
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP, 1992, pp. 137-140.
    • (1992) Proc. ICASSP , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 35
  • 37
    • 85008023596
    • Continuous F0 modelling for HMM based statistical parametric speech synthesis
    • K. Yu and S. Young, "Continuous F0 modelling for HMM based statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 19, no. 5, pp. 1071-1079, 2011.
    • (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.5 , pp. 1071-1079
    • Yu, K.1    Young, S.2
  • 38
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 39
    • 78049361102
    • Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst., vol. J87-D-II, no. 8, pp. 1563-1571, 2004.
    • (2004) IEICE Trans. Inf. Syst. , vol.J87-D-II , Issue.8 , pp. 1563-1571
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 40
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 8, pp. 2222-2235, 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.2    Tokuda, K.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.