



Volume 2015-August, 2015, Pages 4215-4219

Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis

Author keywords

adaptive cepstral analysis; neural network; Statistical parametric speech synthesis

Indexed keywords

DECISION TREES; EXTRACTION; FEATURE EXTRACTION; NEURAL NETWORKS; SPEECH COMMUNICATION; SPEECH SYNTHESIS; TREES (MATHEMATICS);

EID: 84946077883     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2015.7178765     Document Type: Conference Paper
Times cited : (40)

References (23)
  • 1
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009
    • (2009) Speech Commun , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.3
  • 2
    • 0003323711
    • Unbiased estimation of log spectrum
    • S. Imai and C. Furuichi, "Unbiased estimation of log spectrum," in Proc. EURASIP, 1988, pp. 203-206
    • (1988) Proc. EURASIP , pp. 203-206
    • Imai, S.1    Furuichi, C.2
  • 3
    • 0001810975
    • Line spectrum representation of linear predictor coefficients of speech signals
    • F. Itakura, "Line spectrum representation of linear predictor coefficients of speech signals," The Journal of the Acoust. Society of America, vol. 57, no. S1, pp. S35-S35, 1975
    • (1975) The Journal of the Acoust. Society of America , vol.57 , Issue.S1 , pp. S35-S35
    • Itakura, F.1
  • 4
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in Proc. MAVEBA, 2001, pp. 13-15
    • (2001) Proc. MAVEBA , pp. 13-15
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 5
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 6
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966
    • (2013) Proc. ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 7
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Y.-J. Wu and R.-H. Wang, "Minimum generation error training for HMM-based speech synthesis," in Proc. ICASSP, 2006, pp. 89-92
    • (2006) Proc. ICASSP , pp. 89-92
    • Wu, Y.1    Wang, R.-H.2
  • 8
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features
    • H. Zen, K. Tokuda, and T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic features," Comput. Speech Lang., vol. 21, no. 1, pp. 153-173, 2007
    • (2007) Comput. Speech Lang , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 9
    • 0009553788
    • A statistical method for estimation of speech spectral density and formant frequencies
    • F. Itakura and S. Saito, "A statistical method for estimation of speech spectral density and formant frequencies," IEICE Trans. Fundamentals (Japanese Edition), vol. J53-A, no. 1, pp. 35-42, 1970
    • (1970) IEICE Trans. Fundamentals (Japanese Edition) , vol.J53-A , Issue.1 , pp. 35-42
    • Itakura, F.1    Saito, S.2
  • 11
    • 84867214032
    • Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis
    • Y.-J. Wu and K. Tokuda, "Minimum generation error training with direct log spectral distortion on LSPs for HMM-based speech synthesis," in Proc. Interspeech, 2008, pp. 577-580
    • (2008) Proc. Interspeech , pp. 577-580
    • Wu, Y.1    Tokuda, K.2
  • 12
    • 51449096059
    • Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM
    • T. Toda and K. Tokuda, "Statistical approach to vocal tract transfer function estimation based on factor analyzed trajectory HMM," in Proc. ICASSP, 2008, pp. 3925-3928
    • (2008) Proc. ICASSP , pp. 3925-3928
    • Toda, T.1    Tokuda, K.2
  • 13
    • 84865797109
    • Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters
    • R. Maia, H. Zen, and M. Gales, "Statistical parametric speech synthesis with joint estimation of acoustic and excitation model parameters," in Proc. ISCA SSW7, 2010, pp. 88-93
    • (2010) Proc. ISCA SSW7 , pp. 88-93
    • Maia, R.1    Zen, H.2    Gales, M.3
  • 14
    • 84901760651
    • Integration of spectral feature extraction and modeling for HMM-based speech synthesis
    • K. Nakamura, K. Hashimoto, Y. Nankaku, and K. Tokuda, "Integration of spectral feature extraction and modeling for HMM-based speech synthesis," IEICE Trans. Inf. Syst., vol. 97, no. 6, pp. 1438-1448, 2014
    • (2014) IEICE Trans. Inf. Syst , vol.97 , Issue.6 , pp. 1438-1448
    • Nakamura, K.1    Hashimoto, K.2    Nankaku, Y.3    Tokuda, K.4
  • 16
    • 84994214710
    • Deep learning in speech synthesis
    • H. Zen, "Deep learning in speech synthesis," in Keynote speech given at ISCA SSW8, 2013, http://research.google.com/pubs/archive/41539.pdf
    • (2013) Keynote Speech Given at ISCA SSW8
    • Zen, H.1
  • 19
    • 84946045510
    • Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
    • (accepted)
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, 2015 (accepted)
    • (2015) Proc. ICASSP
    • Zen, H.1    Sak, H.2
  • 20
    • 78049361102
    • Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporation of mixed excitation model and postfilter into HMM-based text-to-speech synthesis," IEICE Trans. Inf. Syst., vol. J87-D-II, no. 8, pp. 1563-1571, 2004
    • (2004) IEICE Trans. Inf. Syst , vol.J87-D-II , Issue.8 , pp. 1563-1571
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 22
    • 0001609567
    • An efficient gradient-based algorithm for on-line training of recurrent network trajectories
    • R. Williams and J. Peng, "An efficient gradient-based algorithm for on-line training of recurrent network trajectories," Neural Comput., vol. 2, no. 4, pp. 490-501, 1990
    • (1990) Neural Comput , vol.2 , Issue.4 , pp. 490-501
    • Williams, R.1    Peng, J.2
  • 23
    • 84910046405
    • Long short-term memory recurrent neural network architectures for large scale acoustic modeling
    • H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Proc. Interspeech, 2014
    • (2014) Proc. Interspeech
    • Sak, H.1    Senior, A.2    Beaufays, F.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.