2017, Pages 4910-4914

Generative adversarial network-based postfilter for statistical parametric speech synthesis

Author keywords

deep neural network; generative adversarial network; postfilter; Statistical parametric speech synthesis


EID: 85023752230     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2017.7953090     Document Type: Conference Paper
Times cited: 156

References (30)
  • 1
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 2
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966.
    • (2013) Proc. ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 3
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • A. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. ICASSP, 1996, pp. 373-376.
    • (1996) Proc. ICASSP , pp. 373-376
    • Hunt, A.1    Black, A.2
  • 4
    • 0034842740
    • Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR
    • M. Tamura, T. Masuko, K. Tokuda, and T. Kobayashi, "Adaptation of pitch and spectrum for HMM-based speech synthesis using MLLR," in Proc. ICASSP, 2001, pp. 805-808.
    • (2001) Proc. ICASSP , pp. 805-808
    • Tamura, M.1    Masuko, T.2    Tokuda, K.3    Kobayashi, T.4
  • 5
    • 51449114529
    • A style control technique for HMM-based expressive speech synthesis
    • T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis," IEICE Trans. Inf. Syst., Vol. E90-D, no. 2, pp. 533-543, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
    • Nose, T.1    Yamagishi, J.2    Masuko, T.3    Kobayashi, T.4
  • 7
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis," Speech Commun., Vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.3
  • 9
    • 27144515530
    • Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis," Syst. Comput. Jpn., Vol. 36, no. 12, pp. 43-50, 2005.
    • (2005) Syst. Comput. Jpn. , vol.36 , Issue.12 , pp. 43-50
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 10
    • 67650851754
    • USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method
    • Z. Ling, Y. Wu, Y. Wang, L. Qin, and R. Wang, "USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method," in Proc. Blizzard Challenge Workshop, 2006.
    • (2006) Proc. Blizzard Challenge Workshop
    • Ling, Z.1    Wu, Y.2    Wang, Y.3    Qin, L.4    Wang, R.5
  • 11
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inf. Syst., Vol. E90-D, no. 5, pp. 816-824, 2007.
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 12
    • 84878384520
    • Ways to implement global variance in statistical speech synthesis
    • H. Silén, E. Helander, J. Nurminen, and M. Gabbouj, "Ways to implement global variance in statistical speech synthesis," in Proc. Interspeech, 2012, pp. 1436-1439.
    • (2012) Proc. Interspeech , pp. 1436-1439
    • Silén, H.1    Helander, E.2    Nurminen, J.3    Gabbouj, M.4
  • 13
    • 84905234422
    • A postfilter to modify the modulation spectrum in HMM-based speech synthesis
    • S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, "A postfilter to modify the modulation spectrum in HMM-based speech synthesis," in Proc. ICASSP, 2014, pp. 290-294.
    • (2014) Proc. ICASSP , pp. 290-294
    • Takamichi, S.1    Toda, T.2    Neubig, G.3    Sakti, S.4    Nakamura, S.5
  • 16
    • 84946074523
    • The effect of neural networks in statistical parametric speech synthesis
    • K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "The effect of neural networks in statistical parametric speech synthesis," in Proc. ICASSP, 2015, pp. 4455-4459.
    • (2015) Proc. ICASSP , pp. 4455-4459
    • Hashimoto, K.1    Oura, K.2    Nankaku, Y.3    Tokuda, K.4
  • 18
    • 84965143571
    • Deep generative image models using a Laplacian pyramid of adversarial networks
    • E. Denton, S. Chintala, A. Szlam, and R. Fergus, "Deep generative image models using a Laplacian pyramid of adversarial networks," in Proc. NIPS, 2015, pp. 1486-1494.
    • (2015) Proc. NIPS , pp. 1486-1494
    • Denton, E.1    Chintala, S.2    Szlam, A.3    Fergus, R.4
  • 19
    • 85083950271
    • Unsupervised representation learning with deep convolutional generative adversarial networks
    • A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks," in Proc. ICLR, 2016.
    • (2016) Proc. ICLR
    • Radford, A.1    Metz, L.2    Chintala, S.3
  • 20
    • 84999041243
    • Autoencoding beyond pixels using a learned similarity metric
    • A. B. L. Larsen, S. K. Sønderby, and O. Winther, "Autoencoding beyond pixels using a learned similarity metric," in Proc. ICML, 2016.
    • (2016) Proc. ICML
    • Larsen, A.B.L.1    Sønderby, S.K.2    Winther, O.3
  • 22
    • 84986274465
    • Deep residual learning for image recognition
    • K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. CVPR, 2016.
    • (2016) Proc. CVPR
    • He, K.1    Zhang, X.2    Ren, S.3    Sun, J.4
  • 23
    • 84945230598
    • Fully convolutional networks for semantic segmentation
    • J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," in Proc. CVPR, 2015, pp. 3431-3440.
    • (2015) Proc. CVPR , pp. 3431-3440
    • Long, J.1    Shelhamer, E.2    Darrell, T.3
  • 24
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., Vol. 27, no. 3, pp. 187-207, 1999.
    • (1999) Speech Commun. , vol.27 , Issue.3 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 25
    • 77956509090
    • Rectified linear units improve restricted Boltzmann machines
    • V. Nair and G. E. Hinton, "Rectified linear units improve restricted Boltzmann machines," in Proc. ICML, 2010, pp. 807-814.
    • (2010) Proc. ICML , pp. 807-814
    • Nair, V.1    Hinton, G.E.2
  • 26
    • 84893676344
    • Rectifier nonlinearities improve neural network acoustic models
    • A. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," in Proc. ICML, 2013.
    • (2013) Proc. ICML
    • Maas, A.1    Hannun, A.Y.2    Ng, A.Y.3
  • 28
    • 84969991188
    • Adam: A method for stochastic optimization
    • D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," in Proc. ICLR, 2015.
    • (2015) Proc. ICLR
    • Kingma, D.P.1    Ba, J.2
  • 29
    • 85083950260
    • A note on the evaluation of generative models
    • L. Theis, A. van den Oord, and M. Bethge, "A note on the evaluation of generative models," in Proc. ICLR, 2016.
    • (2016) Proc. ICLR
    • Theis, L.1    Van den Oord, A.2    Bethge, M.3
  • 30
    • 84910047819
    • TTS synthesis with bidirectional LSTM based recurrent neural networks
    • Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Proc. Interspeech, 2014, pp. 1964-1968.
    • (2014) Proc. Interspeech , pp. 1964-1968
    • Fan, Y.1    Qian, Y.2    Xie, F.-L.3    Soong, F.K.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.