2014, Pages 3844-3848

Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis

Author keywords

deep neural networks; hidden Markov models; mixture density networks; statistical parametric speech synthesis

Indexed keywords

HIDDEN MARKOV MODELS; MIXTURES; PROBABILITY DENSITY FUNCTION; SIGNAL PROCESSING

EID: 84905262874     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2014.6854321     Document Type: Conference Paper
Times cited: 215

References (35)
  • 1
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.3
  • 2
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 3
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, 1996
    • (1996) Proc. ICASSP
    • Hunt, A.1    Black, A.2
  • 4
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features
    • H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007.
    • (2007) Comput. Speech Lang. , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 5
    • 84872190545
    • Autoregressive models for statistical parametric speech synthesis
    • M. Shannon, H. Zen, and W. Byrne, "Autoregressive models for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 3, pp. 587-597, 2013.
    • (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , Issue.3 , pp. 587-597
    • Shannon, M.1    Zen, H.2    Byrne, W.3
  • 6
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, 2006, pp. 89-92.
    • (2006) Proc. ICASSP , pp. 89-92
    • Wu, Y.-J.1    Wang, R.-H.2
  • 7
    • 85008039410
    • Improved prosody generation by maximizing joint probability of state and longer units
    • Y. Qian, Z.-Z. Wu, B.-Y. Gao, and F. Soong, "Improved prosody generation by maximizing joint probability of state and longer units," IEEE Trans. Audio Speech Lang. Process., vol. 19, no. 6, pp. 1702-1710, 2011.
    • (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.6 , pp. 1702-1710
    • Qian, Y.1    Wu, Z.-Z.2    Gao, B.-Y.3    Soong, F.4
  • 8
  • 10
    • 84890527090
    • Multi-distribution deep belief network for speech synthesis
    • S.-Y. Kang, X.-J. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in Proc. ICASSP, 2013, pp. 8012-8016.
    • (2013) Proc. ICASSP , pp. 8012-8016
    • Kang, S.-Y.1    Qian, X.-J.2    Meng, H.3
  • 11
    • 84901237776
    • Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
    • Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 21, no. 10, pp. 2129-2139, 2013.
    • (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , Issue.10 , pp. 2129-2139
    • Ling, Z.-H.1    Deng, L.2    Yu, D.3
  • 13
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966.
    • (2013) Proc. ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 14
    • 84890522099
    • F0 contour prediction with a deep belief network-Gaussian process hybrid model
    • R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "F0 contour prediction with a deep belief network-Gaussian process hybrid model," in Proc. ICASSP, 2013, pp. 6885-6889.
    • (2013) Proc. ICASSP , pp. 6885-6889
    • Fernandez, R.1    Rendel, A.2    Ramabhadran, B.3    Hoory, R.4
  • 15
    • 84929157442
    • Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
    • H. Lu, S. King, and O. Watts, "Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis," in Proc. ISCA SSW8, 2013, pp. 281-285.
    • (2013) Proc. ISCA SSW8 , pp. 281-285
    • Lu, H.1    King, S.2    Watts, O.3
  • 17
    • 84994214710
    • Deep learning in speech synthesis
    • H. Zen, "Deep learning in speech synthesis," keynote speech given at ISCA SSW8, 2013, http://research.google.com/pubs/archive/41539.pdf.
    • (2013) Keynote Speech Given at ISCA SSW8
    • Zen, H.1
  • 18
    • 0004113976
    • Mixture density networks
    • C. Bishop, "Mixture density networks," Tech. Rep. NCRG/94/004, Neural Computing Research Group, Aston University, 1994.
    • (1994) Tech. Rep. NCRG/94/004, Neural Computing Research Group, Aston University
    • Bishop, C.1
  • 20
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, pp. 1315-1318.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 21
    • 70450172128
    • Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems
    • K. Oura, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, "Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems," in Proc. Interspeech, 2009, pp. 1759-1762.
    • (2009) Proc. Interspeech , pp. 1759-1762
    • Oura, K.1    Zen, H.2    Nankaku, Y.3    Lee, A.4    Tokuda, K.5
  • 22
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 23
    • 38549178971
    • Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion
    • Springer
    • K. Richmond, "Trajectory mixture density networks with multiple mixtures for acoustic-articulatory inversion," in Advances in Nonlinear Speech Processing, pp. 263-272. Springer, 2007.
    • (2007) Advances in Nonlinear Speech Processing , pp. 263-272
    • Richmond, K.1
  • 25
    • 33846405723
    • Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
    • H. Zen, T. Toda, M. Nakamura, and K. Tokuda, "Details of the Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005," IEICE Trans. Inf. Syst., vol. E90-D, no. 1, pp. 325-333, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
    • Zen, H.1    Toda, T.2    Nakamura, M.3    Tokuda, K.4
  • 26
    • 85016140477
    • An adaptive algorithm for mel-cepstral analysis of speech
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech," in Proc. ICASSP, 1992, pp. 137-140.
    • (1992) Proc. ICASSP , pp. 137-140
    • Fukada, T.1    Tokuda, K.2    Kobayashi, T.3    Imai, S.4
  • 27
  • 29
    • 85135145174
    • Acoustic modeling based on the MDL criterion for speech recognition
    • K. Shinoda and T. Watanabe, "Acoustic modeling based on the MDL criterion for speech recognition," in Proc. Eurospeech, 1997, pp. 99-102.
    • (1997) Proc. Eurospeech , pp. 99-102
    • Shinoda, K.1    Watanabe, T.2
  • 30
    • 85008023596
    • Continuous F0 modelling for HMM based statistical parametric speech synthesis
    • K. Yu and S. Young, "Continuous F0 modelling for HMM based statistical parametric speech synthesis," IEEE Trans. Audio Speech Lang. Process., vol. 19, no. 5, pp. 1071-1079, 2011.
    • (2011) IEEE Trans. Audio Speech Lang. Process. , vol.19 , Issue.5 , pp. 1071-1079
    • Yu, K.1    Young, S.2
  • 31
    • 84887388950
    • An empirical study of learning rates in deep neural networks for speech recognition
    • A. Senior, G. Heigold, M. Ranzato, and K. Yang, "An empirical study of learning rates in deep neural networks for speech recognition," in Proc. ICASSP, 2013, pp. 6724-6728.
    • (2013) Proc. ICASSP , pp. 6724-6728
    • Senior, A.1    Heigold, G.2    Ranzato, M.3    Yang, K.4
  • 32
    • 80052250414
    • Adaptive subgradient methods for online learning and stochastic optimization
    • J. Duchi, E. Hazan, and Y. Singer, "Adaptive subgradient methods for online learning and stochastic optimization," The Journal of Machine Learning Research, pp. 2121-2159, 2011.
    • (2011) The Journal of Machine Learning Research , pp. 2121-2159
    • Duchi, J.1    Hazan, E.2    Singer, Y.3
  • 34
    • 78049361102
    • Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst., vol. J87-D-II, no. 8, pp. 1563-1571, 2004.
    • (2004) IEICE Trans. Inf. Syst. , vol.J87-D-II , Issue.8 , pp. 1563-1571
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 35
    • 57749193836
    • Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio Speech Lang. Process., vol. 15, no. 8, pp. 2222-2235, 2007.
    • (2007) IEEE Trans. Audio Speech Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.2    Tokuda, K.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.