



Volume 26, Issue 1, 2018, Pages 84-96

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

Author keywords

deep neural networks; generative adversarial networks; over-smoothing; statistical parametric speech synthesis; text-to-speech synthesis; voice conversion

Indexed keywords

DEEP NEURAL NETWORKS; SPEECH PROCESSING; SPEECH SYNTHESIS;

EID: 85031781820     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2017.2761547     Document Type: Article
Times cited : (214)

References (53)
  • 1
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, A. Black, "Statistical parametric speech synthesis, " Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.3
  • 2
    • 0023756465
    • Speech synthesis by rule using an optimal selection of non-uniform synthesis units
    • New York, NY, USA Apr.
    • Y. Sagisaka, "Speech synthesis by rule using an optimal selection of non-uniform synthesis units, " in Proc. Int. Conf. Acoust., Speech, Signal Process., New York, NY, USA, Apr. 1988, pp. 679-682.
    • (1988) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 679-682
    • Sagisaka, Y.1
  • 3
    • 0032026483
    • Continuous probabilistic transform for voice conversion
    • Mar.
    • Y. Stylianou, O. Cappé, E. Moulines, "Continuous probabilistic transform for voice conversion, " IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
    • (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
    • Stylianou, Y.1    Cappé, O.2    Moulines, E.3
  • 4
    • 85032750981
    • Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
    • May
    • Z.-H. Ling, et al., "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends, " IEEE Signal Process. Mag., vol. 32, no. 3, pp. 35-52, May 2015.
    • (2015) IEEE Signal Process. Mag. , vol.32 , Issue.3 , pp. 35-52
    • Ling, Z.-H.1
  • 5
    • 84876687945
    • Speech synthesis based on hidden Markov models
    • Apr.
    • K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura, "Speech synthesis based on hidden Markov models, " Proc. IEEE, vol. 101, no. 5, pp. 1234-1252, Apr. 2013.
    • (2013) Proc. IEEE , vol.101 , Issue.5 , pp. 1234-1252
    • Tokuda, K.1    Nankaku, Y.2    Toda, T.3    Zen, H.4    Yamagishi, J.5    Oura, K.6
  • 6
    • 57749193836
    • Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
    • Nov.
    • T. Toda, A. W. Black, K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory, " IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
    • Toda, T.1    Black, A.W.2    Tokuda, K.3
  • 7
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Toulouse, France May
    • Y. J. Wu and R. H. Wang, "Minimum generation error training for HMM-based speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Toulouse, France, May 2006, pp. 89-92.
    • (2006) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 89-92
    • Wu, Y.J.1    Wang, R.H.2
  • 8
    • 84978086501
    • Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training
    • Jul.
    • Z. Wu and S. King, "Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training, " IEEE Trans. Audio, Speech, Lang. Process., vol. 24, no. 7, pp. 1255-1265, Jul. 2016.
    • (2016) IEEE Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.7 , pp. 1255-1265
    • Wu, Z.1    King, S.2
  • 9
    • 84994361374
    • The voice conversion challenge 2016
    • San Francisco, CA, USA, Sep.
    • T. Toda, et al., "The voice conversion challenge 2016, " in Proc. INTERSPEECH, San Francisco, CA, USA, Sep. 2016, pp. 1632-1636.
    • (2016) Proc. INTERSPEECH , pp. 1632-1636
    • Toda, T.1
  • 10
    • 84994234512
    • Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis
    • San Francisco, CA, USA, Sep.
    • Y. Ijima, T. Asami, H. Mizuno, "Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis, " in Proc. INTERSPEECH, San Francisco, CA, USA, Sep. 2016, pp. 337-341.
    • (2016) Proc. INTERSPEECH , pp. 337-341
    • Ijima, Y.1    Asami, T.2    Mizuno, H.3
  • 11
    • 84878387899
    • Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP
    • Portland, OR, USA, Sep.
    • Y. Ohtani, M. Tamura, M. Morita, T. Kagoshima, M. Akamine, "Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP, " in Proc. INTERSPEECH, Portland, OR, USA, Sep. 2012, pp. 1155-1158.
    • (2012) Proc. INTERSPEECH , pp. 1155-1158
    • Ohtani, Y.1    Tamura, M.2    Morita, M.3    Kagoshima, T.4    Akamine, M.5
  • 13
    • 84946033919
    • Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion
    • Brisbane, QLD, Australia, Apr.
    • S. Takamichi, T. Toda, A. W. Black, S. Nakamura, "Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Brisbane, QLD, Australia, Apr. 2015, pp. 4859-4863.
    • (2015) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 4859-4863
    • Takamichi, S.1    Toda, T.2    Black, A.W.3    Nakamura, S.4
  • 14
    • 84973375140
    • Trajectory training considering global variance for speech synthesis based on neural networks
    • Shanghai, China, Mar.
    • K. Hashimoto, K. Oura, Y. Nankaku, K. Tokuda, "Trajectory training considering global variance for speech synthesis based on neural networks, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Shanghai, China, Mar. 2016, pp. 5600-5604.
    • (2016) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 5600-5604
    • Hashimoto, K.1    Oura, K.2    Nankaku, Y.3    Tokuda, K.4
  • 15
    • 84910088495
    • Analysis of spectral enhancement using global variance in HMM-based speech synthesis
    • Max Atria, Singapore, Sep.
    • T. Nose and A. Ito, "Analysis of spectral enhancement using global variance in HMM-based speech synthesis, " in Proc. INTERSPEECH, Max Atria, Singapore, Sep. 2014, pp. 2917-2921.
    • (2014) Proc. INTERSPEECH , pp. 2917-2921
    • Nose, T.1    Ito, A.2
  • 16
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • Vancouver, BC, Canada, May
    • H. Zen, A. Senior, M. Schuster, "Statistical parametric speech synthesis using deep neural networks, " in Proc. Int. Conf. Acoust., Speech, Signal Process, Vancouver, BC, Canada, May 2013, pp. 7962-7966.
    • (2013) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 18
    • 84962901047
    • Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, human performance
    • Apr.
    • Z. Wu, et al., "Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, human performance, " IEEE Trans. Audio, Speech, Lang. Process., vol. 24, no. 4, pp. 768-783, Apr. 2016.
    • (2016) IEEE Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.4 , pp. 768-783
    • Wu, Z.1
  • 19
    • 84959178048
    • Robust deep feature for spoofing detection the SJTU system for ASVspoof 2015 Challenge
    • Dresden, Germany, Sep.
    • N. Chen, Y. Qian, H. Dinkel, B. Chen, K. Yu, "Robust deep feature for spoofing detection the SJTU system for ASVspoof 2015 Challenge, " in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2097-2101.
    • (2015) Proc. INTERSPEECH , pp. 2097-2101
    • Chen, N.1    Qian, Y.2    Dinkel, H.3    Chen, B.4    Yu, K.5
  • 20
    • 85008023596
    • Continuous F0 modeling for HMM based statistical parametric speech synthesis
    • Jul.
    • K. Yu and S. Young, "Continuous F0 modeling for HMM based statistical parametric speech synthesis, " IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1071-1079, Jul. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1071-1079
    • Yu, K.1    Young, S.2
  • 21
    • 85064210846
    • Durational variability in speech and the rhythm class hypothesis
    • Berlin, Germany: Mouton de Gruyter
    • E. Grabe and E. L. Low, "Durational variability in speech and the rhythm class hypothesis, " in Papers in Laboratory Phonology 7. Berlin, Germany: Mouton de Gruyter, 2002, pp. 515-546.
    • (2002) Papers in Laboratory Phonology , vol.7 , pp. 515-546
    • Grabe, E.1    Low, E.L.2
  • 22
    • 85018914753
    • f-GAN: Training generative neural samplers using variational divergence minimization
    • Dec.
    • S. Nowozin, B. Cseke, R. Tomioka, "f-GAN: Training generative neural samplers using variational divergence minimization, " in Proc. Int. Conf. Neural Inf. Process. Syst., Dec. 2016, pp. 271-279.
    • (2016) Proc. Int. Conf. Neural Inf. Process. Syst. , pp. 271-279
    • Nowozin, S.1    Cseke, B.2    Tomioka, R.3
  • 26
    • 33847655586
    • A generalized divergence measure for nonnegative matrix factorization
    • Mar.
    • R. Kompass, "A generalized divergence measure for nonnegative matrix factorization, " Neural Comput., vol. 19, no. 3, pp. 780-791, Mar. 2007.
    • (2007) Neural Comput. , vol.19 , Issue.3 , pp. 780-791
    • Kompass, R.1
  • 27
    • 33748099812
    • Information theory and statistics: A tutorial
    • I. Csiszár and P. C. Shields, "Information theory and statistics: A tutorial, " Found. Trends Commun. Inf. Theory, vol. 1, no. 4, pp. 417-518, 2004.
    • (2004) Found. Trends Commun. Inf. Theory , vol.1 , Issue.4 , pp. 417-518
    • Csiszár, I.1    Shields, P.C.2
  • 29
    • 84959090360
    • Multi-task learning deep neural networks for speech feature denoising
    • Dresden, Germany, Sep.
    • B. Huang, D. Ke, H. Zheng, B. Xu, Y. Xu, K. Su, "Multi-task learning deep neural networks for speech feature denoising, " in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2464-2468.
    • (2015) Proc. INTERSPEECH , pp. 2464-2468
    • Huang, B.1    Ke, D.2    Zheng, H.3    Xu, B.4    Xu, Y.5    Su, K.6
  • 31
    • 84946045510
    • Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
    • Brisbane, QLD, Australia, Apr.
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Brisbane, QLD, Australia, Apr. 2015, pp. 4470-4474.
    • (2015) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 4470-4474
    • Zen, H.1    Sak, H.2
  • 32
    • 33746600649
    • Reducing the dimensionality of data with neural networks
    • G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks, " Science, vol. 313, no. 5786, pp. 504-507, 2006.
    • (2006) Science , vol.313 , Issue.5786 , pp. 504-507
    • Hinton, G.E.1    Salakhutdinov, R.R.2
  • 36
    • 83755163018
    • Detecting novel associations in large data sets
    • D. N. Reshef, et al., "Detecting novel associations in large data sets, " Science, vol. 334, no. 6062, pp. 1518-1524, 2011.
    • (2011) Science , vol.334 , Issue.6062 , pp. 1518-1524
    • Reshef, D.N.1
  • 38
    • 84901764355
    • A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation
    • Jun.
    • K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura, "A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation, " IEICE Trans. Inf. Syst., vol. E97-D, no. 6, pp. 1429-1437, Jun. 2014.
    • (2014) IEICE Trans. Inf. Syst. , vol.E97-D , Issue.6 , pp. 1429-1437
    • Tanaka, K.1    Toda, T.2    Neubig, G.3    Sakti, S.4    Nakamura, S.5
  • 39
    • 84910030421
    • Statistical parametric speech synthesis using weighted multi-distribution deep belief network
    • Max Atria, Singapore, Sep.
    • S. Kang and H. Meng, "Statistical parametric speech synthesis using weighted multi-distribution deep belief network, " in Proc. INTERSPEECH, Max Atria, Singapore, Sep. 2014, pp. 1959-1963.
    • (2014) Proc. INTERSPEECH , pp. 1959-1963
    • Kang, S.1    Meng, H.2
  • 40
    • 85040306596
    • StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks
    • H. Zhang, et al., "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks, " IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 5907-5915.
    • (2017) IEEE Int. Conf. Comput. Vision (ICCV) , pp. 5907-5915
    • Zhang, H.1
  • 41
    • 84950159800
    • Modeling f0 trajectories in hierarchically structured deep neural networks
    • X. Yin, et al., "Modeling f0 trajectories in hierarchically structured deep neural networks, " Speech Commun., vol. 76, pp. 82-92, 2016.
    • (2016) Speech Commun. , vol.76 , pp. 82-92
    • Yin, X.1
  • 43
    • 84994314564
    • Fast, compact, high quality LSTM-RNN based statistical parametric speech synthesizer for mobile devices
    • San Francisco, CA, USA, Sep.
    • H. Zen, Y. Agiomyrgiannakis, N. Egberts, F. Henderson, P. Szczepaniak, "Fast, compact, high quality LSTM-RNN based statistical parametric speech synthesizer for mobile devices, " in Proc. INTERSPEECH, San Francisco, CA, USA, Sep. 2016, pp. 2273-2277.
    • (2016) Proc. INTERSPEECH , pp. 2273-2277
    • Zen, H.1    Agiomyrgiannakis, Y.2    Egberts, N.3    Henderson, F.4    Szczepaniak, P.5
  • 44
    • 84973307947
    • Directly modeling voiced and unvoiced components in speech waveforms by neural networks
    • Shanghai, China, Mar.
    • K. Tokuda and H. Zen, "Directly modeling voiced and unvoiced components in speech waveforms by neural networks, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Shanghai, China, Mar. 2016, pp. 5640-5644.
    • (2016) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 5640-5644
    • Tokuda, K.1    Zen, H.2
  • 47
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • Firenze, Italy Sep.
    • H. Kawahara, J. Estill, O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, " in Proc. Int. Workshop Models Anal. Vocal Emissions Biomed. Appl., Firenze, Italy, Sep. 2001, pp. 1-6.
    • (2001) Proc. Int. Workshop Models Anal. Vocal Emissions Biomed. Appl. , pp. 1-6
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 48
    • 44949143155
    • Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
    • Pittsburgh, PA, USA Sep.
    • Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation, " in Proc. INTERSPEECH, Pittsburgh, PA, USA, Sep. 2006, pp. 2266-2269.
    • (2006) Proc. INTERSPEECH , pp. 2266-2269
    • Ohtani, Y.1    Toda, T.2    Saruwatari, H.3    Shikano, K.4
  • 49
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • Apr.
    • H. Kawahara, I. Masuda-Katsuse, A. D. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, " Speech Commun., vol. 27, no. 3/4, pp. 187-207, Apr. 1999.
    • (1999) Speech Commun. , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    Cheveigne, A.D.3
  • 51
    • 84878390910
    • Implementation of computationally efficient real-time voice conversion
    • Portland, OR, USA, Sep.
    • T. Toda, T. Muramatsu, H. Banno, "Implementation of computationally efficient real-time voice conversion, " in Proc. INTERSPEECH, Portland, OR, USA, Sep. 2012, pp. 94-97.
    • (2012) Proc. INTERSPEECH , pp. 94-97
    • Toda, T.1    Muramatsu, T.2    Banno, H.3
  • 52
    • 84959126237
    • A comparison of features for synthetic speech detection
    • Dresden, Germany, Sep.
    • M. Sahidullah, T. Kinnunen, C. Hanilçi, "A comparison of features for synthetic speech detection, " in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2087-2091.
    • (2015) Proc. INTERSPEECH , pp. 2087-2091
    • Sahidullah, M.1    Kinnunen, T.2    Hanilçi, C.3
  • 53
    • 85039171110
    • Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis
    • Stockholm, Sweden, Aug.
    • B. Bollepalli, L. Juvela, P. Alku, "Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis, " in Proc. INTERSPEECH, Stockholm, Sweden, Aug. 2017, pp. 3394-3398.
    • (2017) Proc. INTERSPEECH , pp. 3394-3398
    • Bollepalli, B.1    Juvela, L.2    Alku, P.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.