Volume 2017-August, 2017, Pages 1368-1372

Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system

Author keywords

Excitation modeling; Statistical parametric speech synthesis

Indexed keywords

DEEP NEURAL NETWORKS; ERRORS; INVERSE PROBLEMS; SPEECH; SPEECH SYNTHESIS

EID: 85039167642     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: 10.21437/Interspeech.2017-848     Document Type: Conference Paper
Times cited: 8

References (33)
  • 1
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 3
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • May
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. of ICASSP, May 2013, pp. 7962-7966.
    • (2013) Proc. of ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 4
    • 85032750981
    • Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
    • May
    • Z.-H. Ling, S.-Y. Kang, H. Zen, A. Senior, M. Schuster, X.-J. Qian, H. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," Signal Processing Magazine, IEEE, vol. 32, no. 3, pp. 35-52, May 2015.
    • (2015) Signal Processing Magazine, IEEE , vol.32 , Issue.3 , pp. 35-52
    • Ling, Z.-H.1    Kang, S.-Y.2    Zen, H.3    Senior, A.4    Schuster, M.5    Qian, X.-J.6    Meng, H.7    Deng, L.8
  • 5
    • 84910047819
    • TTS synthesis with bidirectional LSTM based recurrent neural networks
    • Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014, pp. 1964-1968.
    • (2014) Interspeech , pp. 1964-1968
    • Fan, Y.1    Qian, Y.2    Xie, F.-L.3    Soong, F.K.4
  • 6
    • 84946045510
    • Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
    • IEEE
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4470-4474.
    • (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 4470-4474
    • Zen, H.1    Sak, H.2
  • 7
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA, 2001.
    • (2001) MAVEBA
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 8
    • 85023745327
    • An autoregressive recurrent mixture density network for parametric speech synthesis
    • IEEE
    • X. Wang, S. Takaki, and J. Yamagishi, "An autoregressive recurrent mixture density network for parametric speech synthesis," in Proc. of ICASSP. IEEE, 2017, pp. 4895-4899.
    • (2017) Proc. of ICASSP , pp. 4895-4899
    • Wang, X.1    Takaki, S.2    Yamagishi, J.3
  • 9
    • 85023752230
    • Generative adversarial network-based postfilter for statistical parametric speech synthesis
    • March
    • T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis," in Proc. of ICASSP, March 2017, pp. 4910-4914.
    • (2017) Proc. of ICASSP , pp. 4910-4914
    • Kaneko, T.1    Kameoka, H.2    Hojo, N.3    Ijima, Y.4    Hiramatsu, K.5    Kashino, K.6
  • 10
    • 84946077883
    • Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis
    • April
    • K. Tokuda and H. Zen, "Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis," in Proc. of ICASSP, April 2015, pp. 4215-4219.
    • (2015) Proc. of ICASSP , pp. 4215-4219
    • Tokuda, K.1    Zen, H.2
  • 11
    • 84973307947
    • Directly modeling voiced and unvoiced components in speech waveforms by neural networks
    • March
    • K. Tokuda and H. Zen, "Directly modeling voiced and unvoiced components in speech waveforms by neural networks," in Proc. of ICASSP, March 2016, pp. 5640-5644.
    • (2016) Proc. of ICASSP , pp. 5640-5644
  • 12
    • 84973309345
    • A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis
    • March
    • S. Takaki and J. Yamagishi, "A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis," in Proc. of ICASSP, March 2016, pp. 5535-5539.
    • (2016) Proc. of ICASSP , pp. 5535-5539
    • Takaki, S.1    Yamagishi, J.2
  • 13
    • 85039167413
    • Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis
    • submitted
    • S. Takaki, H. Kameoka, and J. Yamagishi, "Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis," in Interspeech (submitted), 2017.
    • (2017) Interspeech
    • Takaki, S.1    Kameoka, H.2    Yamagishi, J.3
  • 18
    • 84973293681
    • High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
    • Mar
    • L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network," in Proc. of ICASSP, Mar. 2016, pp. 5120-5124.
    • (2016) Proc. of ICASSP , pp. 5120-5124
    • Juvela, L.1    Bollepalli, B.2    Airaksinen, M.3    Alku, P.4
  • 20
    • 84994228803
    • Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks
    • L. Juvela, X. Wang, S. Takaki, M. Airaksinen, J. Yamagishi, and P. Alku, "Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks," in Proc. of Interspeech, Sep. 2016, pp. 2283-2287.
    • (2016) Proc. of Interspeech , pp. 2283-2287
    • Juvela, L.1    Wang, X.2    Takaki, S.3    Airaksinen, M.4    Yamagishi, J.5    Alku, P.6
  • 22
    • 70450180978
    • Robust LTS rules with the Combilex speech technology lexicon
    • September
    • K. Richmond, R. A. Clark, and S. Fitt, "Robust LTS rules with the Combilex speech technology lexicon," in Proc. of Interspeech, Brighton, September 2009, pp. 1295-1298.
    • (2009) Proc. of Interspeech, Brighton , pp. 1295-1298
    • Richmond, K.1    Clark, R.A.2    Fitt, S.3
  • 25
    • 0001455934
    • A robust algorithm for pitch tracking (RAPT)
    • D. Talkin, "A robust algorithm for pitch tracking (RAPT)," in Speech Coding and Synthesis, 1995, pp. 495-518.
    • (1995) Speech Coding and Synthesis , pp. 495-518
    • Talkin, D.1
  • 28
    • 84865718521
    • Comparison of formant enhancement methods for HMM-based speech synthesis
    • T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "Comparison of formant enhancement methods for HMM-based speech synthesis," in SSW, 2010, pp. 334-339.
    • (2010) SSW , pp. 334-339
    • Raitio, T.1    Suni, A.2    Pulakka, H.3    Vainio, M.4    Alku, P.5
  • 29
    • 0029254163
    • Non-parametric techniques for pitch-scale and time-scale modification of speech
    • E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech," Speech Communication, vol. 16, no. 2, pp. 175-205, 1995.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 175-205
    • Moulines, E.1    Laroche, J.2
  • 30
    • 85039167524
    • ETSI TS 126 090: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions
    • "ETSI TS 126 090: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions," European Telecommunications Standards Institute, Technical specification, 2016.
    • (2016) European Telecommunications Standards Institute, Technical Specification
  • 31
    • 84930639546
    • Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit
    • F. Weninger, "Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit," Journal of Machine Learning Research, vol. 16, pp. 547-551, 2015.
    • (2015) Journal of Machine Learning Research , vol.16 , pp. 547-551
    • Weninger, F.1
  • 32
    • 85039147615
    • CrowdFlower Inc. (2017) Crowd-sourcing platform. [Online]. Available: https://www.crowdflower.com/
    • (2017) Crowd-sourcing Platform


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.