



Volume 2017-August, 2017, Pages 3394-3398

Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis

Author keywords

DNN; GAN; Glottal source modelling; TTS

Indexed keywords

ADDITIVE NOISE; DEEP NEURAL NETWORKS; SPEECH; SPEECH SYNTHESIS; STOCHASTIC MODELS; STOCHASTIC SYSTEMS; VOCODERS

EID: 85039171110     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: 10.21437/Interspeech.2017-1288     Document Type: Conference Paper
Times cited : (47)

References (38)
  • 2
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 3
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • May
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. of ICASSP, May 2013, pp. 7962-7966.
    • (2013) Proc. of ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 4
    • 84910047819
    • TTS synthesis with bidirectional LSTM based recurrent neural networks
    • Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014, pp. 1964-1968.
    • (2014) Interspeech , pp. 1964-1968
    • Fan, Y.1    Qian, Y.2    Xie, F.-L.3    Soong, F.K.4
  • 5
    • 84946045510
    • Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. of ICASSP. IEEE, 2015, pp. 4470-4474.
    • (2015) Proc. of ICASSP. IEEE , pp. 4470-4474
    • Zen, H.1    Sak, H.2
  • 6
    • 84978086501
    • Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training
    • Z. Wu and S. King, "Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 7, pp. 1255-1265, 2016.
    • (2016) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.24 , Issue.7 , pp. 1255-1265
    • Wu, Z.1    King, S.2
  • 7
    • 85023745327
    • An autoregressive recurrent mixture density network for parametric speech synthesis
    • X. Wang, S. Takaki, and J. Yamagishi, "An autoregressive recurrent mixture density network for parametric speech synthesis," in Proc. of ICASSP. IEEE, 2017, pp. 4895-4899.
    • (2017) Proc. of ICASSP. IEEE , pp. 4895-4899
    • Wang, X.1    Takaki, S.2    Yamagishi, J.3
  • 8
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. De Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999.
    • (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigne, A.3
  • 9
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA, 2001.
    • (2001) MAVEBA
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 12
    • 0035127703
    • Applying the harmonic plus noise model in concatenative speech synthesis
    • Y. Stylianou, "Applying the harmonic plus noise model in concatenative speech synthesis," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 1, pp. 21-29, 2001.
    • (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.1 , pp. 21-29
    • Stylianou, Y.1
  • 13
    • 84897865577
    • Harmonics plus noise model based vocoder for statistical parametric speech synthesis
    • April
    • D. Erro, I. Sainz, E. Navas, and I. Hernaez, "Harmonics plus noise model based vocoder for statistical parametric speech synthesis," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 184-194, April 2014.
    • (2014) IEEE Journal of Selected Topics in Signal Processing , vol.8 , Issue.2 , pp. 184-194
    • Erro, D.1    Sainz, I.2    Navas, E.3    Hernaez, I.4
  • 14
    • 0026881384
    • Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
    • Eurospeech '91
    • P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Communication, vol. 11, no. 2-3, pp. 109-118, 1992, Eurospeech '91.
    • (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 109-118
    • Alku, P.1
  • 15
    • 84973293681
    • High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
    • Mar
    • L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network," in Proc. of ICASSP, Mar. 2016, pp. 5120-5124.
    • (2016) Proc. of ICASSP , pp. 5120-5124
    • Juvela, L.1    Bollepalli, B.2    Airaksinen, M.3    Alku, P.4
  • 16
    • 85017575778
    • Glottal vocoding with frequency-warped time-weighted linear prediction
    • April
    • M. Airaksinen, B. Bollepalli, J. Pohjalainen, and P. Alku, "Glottal vocoding with frequency-warped time-weighted linear prediction," IEEE Signal Processing Letters, vol. 24, no. 4, pp. 446-450, April 2017.
    • (2017) IEEE Signal Processing Letters , vol.24 , Issue.4 , pp. 446-450
    • Airaksinen, M.1    Bollepalli, B.2    Pohjalainen, J.3    Alku, P.4
  • 17
    • 84910068090
    • Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
    • September
    • T. Raitio, A. Suni, L. Juvela, M. Vainio, and P. Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort," in Proc. of Interspeech, Singapore, September 2014, pp. 1969-1973.
    • (2014) Proc. of Interspeech, Singapore , pp. 1969-1973
    • Raitio, T.1    Suni, A.2    Juvela, L.3    Vainio, M.4    Alku, P.5
  • 22
    • 84902548165
    • Glottal source processing: From analysis to applications
    • T. Drugman, P. Alku, A. Alwan, and B. Yegnanarayana, "Glottal source processing: From analysis to applications," Computer Speech & Language, vol. 28, no. 5, pp. 1117-1138, 2014.
    • (2014) Computer Speech & Language , vol.28 , Issue.5 , pp. 1117-1138
    • Drugman, T.1    Alku, P.2    Alwan, A.3    Yegnanarayana, B.4
  • 23
    • 84856294347
    • Glottal inverse filtering analysis of human voice production: A review of estimation and parameterization methods of the glottal excitation and their applications (invited article)
    • P. Alku, "Glottal inverse filtering analysis of human voice production: A review of estimation and parameterization methods of the glottal excitation and their applications (invited article)," Sadhana-Academy Proceedings in Engineering Sciences, vol. 36, no. 5, pp. 623-650, 2011.
    • (2011) Sadhana-Academy Proceedings in Engineering Sciences , vol.36 , Issue.5 , pp. 623-650
    • Alku, P.1
  • 24
    • 85023752230
    • Generative adversarial network-based postfilter for statistical parametric speech synthesis
    • March
    • T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis," in Proc. of ICASSP, March 2017, pp. 4910-4914.
    • (2017) Proc. of ICASSP , pp. 4910-4914
    • Kaneko, T.1    Kameoka, H.2    Hojo, N.3    Ijima, Y.4    Hiramatsu, K.5    Kashino, K.6
  • 25
    • 85023772724
    • Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis
    • New Orleans, USA
    • Y. Saito, S. Takamichi, and H. Saruwatari, "Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis," in ICASSP, New Orleans, USA, 2017, pp. 4900-4904.
    • (2017) ICASSP , pp. 4900-4904
    • Saito, Y.1    Takamichi, S.2    Saruwatari, H.3
  • 31
    • 70450180978
    • Robust LTS rules with the Combilex speech technology lexicon
    • September
    • K. Richmond, R. A. Clark, and S. Fitt, "Robust LTS rules with the Combilex speech technology lexicon," in Proc. of Interspeech, Brighton, September 2009, pp. 1295-1298.
    • (2009) Proc. of Interspeech, Brighton , pp. 1295-1298
    • Richmond, K.1    Clark, R.A.2    Fitt, S.3
  • 33
    • 85039167642
    • Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system
    • L. Juvela, B. Bollepalli, J. Yamagishi, and P. Alku, "Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system," submitted to Interspeech, 2017.
    • (2017) Submitted to Interspeech
    • Juvela, L.1    Bollepalli, B.2    Yamagishi, J.3    Alku, P.4
  • 35
    • 0029254163
    • Non-parametric techniques for pitch-scale and time-scale modification of speech
    • E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech," Speech Communication, vol. 16, no. 2, pp. 175-205, 1995.
    • (1995) Speech Communication , vol.16 , Issue.2 , pp. 175-205
    • Moulines, E.1    Laroche, J.2
  • 36
    • 0003450846
    • ITU-T Recommendation P.800: Methods for subjective determination of transmission quality
    • ITU-T Recommendation P.800, "Methods for subjective determination of transmission quality," International Telecommunication Union, 1996.
    • (1996) International Telecommunication Union
    • ITU-T1
  • 37
    • 85039147615
    • CrowdFlower Inc
    • CrowdFlower Inc. (2017) Crowd-sourcing platform. [Online]. Available: https://www.crowdflower.com/
    • (2017) Crowd-sourcing Platform


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.