Volume 23, Issue 11, 2015, Pages 2003-2014

A deep generative architecture for postfiltering in statistical parametric speech synthesis

Author keywords

Deep generative architecture; Hidden Markov model (HMM); Modulation spectrum; Postfilter; Segmental quality; Speech synthesis

Indexed keywords

ASSOCIATIVE PROCESSING; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MEMORY ARCHITECTURE; MODULATION; NETWORK ARCHITECTURE; PLASMA DIAGNOSTICS; SPEECH SYNTHESIS; STRAIN MEASUREMENT; SYNTHESIS (CHEMICAL)

EID: 84942607168     PISSN: 23299290     EISSN: None     Source Type: Journal
DOI: 10.1109/TASLP.2015.2461448     Document Type: Article
Times cited: 44

References (43)
  • 2
    • EID: 67651002140
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis", Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
  • 3
    • EID: 33947682675
    • A. Black and K. Tokuda, "The Blizzard Challenge 2005: Evaluating corpus-based speech synthesis on common databases", in Proc. Interspeech, 2005.
  • 4
    • EID: 85032750981
    • Z.-H. Ling, S.-Y. Kang, H. Zen, A. Senior, M. Schuster, X.-J. Qian, H. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends", IEEE Signal Process. Mag., vol. 32, no. 3, pp. 35-52, May 2015.
  • 5
    • EID: 84890490547
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks", in Proc. ICASSP, 2013, pp. 7962-7966.
  • 6
    • EID: 84901237776
    • Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis", IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
  • 7
    • EID: 84890527090
    • S. Kang, X. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis", in Proc. ICASSP, Apr. 2013, pp. 8012-8016.
  • 8
    • EID: 84890522099
    • R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "F0 contour prediction with a deep belief network-Gaussian process hybrid model", in Proc. ICASSP, Apr. 2013, pp. 6885-6889.
  • 9
    • EID: 78049412607
    • S. Vishnubhotla, R. Fernandez, and B. Ramabhadran, "An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech", in Proc. ICASSP, 2010, pp. 4614-4617.
  • 10
    • EID: 84910068142
    • R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks", in Proc. Interspeech, 2014, pp. 2268-2272.
  • 11
    • EID: 84910047819
    • Y. Fan, Y. Qian, F. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks", in Proc. Interspeech, 2014, pp. 1964-1968.
  • 12
    • EID: 84946045510
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis", in Proc. ICASSP, Apr. 2015, pp. 4470-4474.
  • 13
    • EID: 67650851754
    • Z. Ling, Y. Wu, Y. Wang, L. Qin, and R. Wang, "USTC system for Blizzard Challenge 2006 an improved HMM-based speech synthesis method", in Proc. Blizzard Challenge Workshop, Pittsburgh, PA, USA, Sep. 2006.
  • 14
    • EID: 27144515530
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis", Syst. Comput. Jpn., vol. 36, no. 12, pp. 43-50, 2005.
  • 15
    • EID: 38549096029
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis", IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
  • 16
    • EID: 84905234422
    • S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, "A postfilter to modify the modulation spectrum in HMM-based speech synthesis", in Proc. ICASSP, May 2014, pp. 290-294.
  • 17
    • EID: 57749193836
    • T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory", IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
  • 18
    • EID: 0031623661
    • A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis", in Proc. ICASSP, 1998, pp. 285-288.
  • 19
    • EID: 0028996842
    • K. Koishida, K. Tokuda, T. Kobayashi, and S. Imai, "CELP coding based on mel-cepstral analysis", in Proc. ICASSP, 1995, vol. 1, pp. 33-36.
  • 20
    • EID: 79959847301
    • Z.-H. Ling, Y. Hu, and L.-R. Dai, "Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis", in Proc. Interspeech, 2010, pp. 825-828.
  • 21
    • EID: 84876687945
    • K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models", Proc. IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
  • 22
    • EID: 85016140477
    • T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech", in Proc. ICASSP, San Francisco, CA, USA, Mar. 1992, vol. 1, pp. 137-140.
  • 24
    • EID: 84910104946
    • L.-H. Chen, Z.-H. Ling, and L.-R. Dai, "Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes", in Proc. Interspeech, 2014, pp. 2313-2317.
  • 25
    • EID: 84921735339
    • L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training", IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1859-1872, Dec. 2014.
  • 26
    • EID: 0000329993
    • P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory", in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, 1986, vol. 1, ch. 6, pp. 194-281.
  • 27
    • EID: 0023861743
    • B. Kosko, "Bidirectional associative memories", IEEE Trans. Systems, Man, Cybern., vol. 18, no. 1, pp. 49-60, 1988.
  • 28
    • EID: 33746600649
    • G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks", Science, vol. 313, no. 5786, pp. 504-507, 2006.
  • 29
    • EID: 0013344078
    • G. Hinton, "Training products of experts by minimizing contrastive divergence", Neural Comput., vol. 14, no. 8, pp. 1771-1800, 2002.
  • 30
    • EID: 84872506495
    • G. E. Hinton, "A practical guide to training restricted Boltzmann machines", in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer, 2012, pp. 599-619.
  • 31
    • EID: 78651276374
    • R. Salakhutdinov, "Learning deep generative models", Ph.D. dissertation, Univ. of Toronto, Toronto, ON, Canada, 2009.
  • 32
    • EID: 0020118274
    • J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities", in Proc. Nat. Acad. Sci., 1982, vol. 79, no. 8, pp. 2554-2558.
  • 33
    • EID: 84905223323
    • L.-J. Liu, L.-H. Chen, Z.-H. Ling, and L.-R. Dai, "Using bidirectional associative memories for joint spectral envelope modeling in voice conversion", in Proc. ICASSP, May 2014, pp. 7884-7888.
  • 35
    • EID: 69349090197
    • Y. Bengio, "Learning deep architectures for AI", Foundations Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.
  • 37
    • EID: 0033708106
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis", in Proc. ICASSP, 2000, vol. 3, pp. 1315-1318.
  • 38
    • EID: 0032673049
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds", Speech Commun., vol. 27, pp. 187-207, 1999.
  • 39
    • EID: 85131821539
    • K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis - A unified approach to speech spectral estimation", in Proc. ICSLP, Yokohama, Japan, Sep. 1994, vol. 3, pp. 1043-1046.
  • 40
    • EID: 13344250603
    • "Method for the subjective assessment of intermediate quality level of coding systems", ITU Rec. ITU-R BS.1534-1, Int. Telecomm. Union Radiocommunication Assembly, Geneva, Switzerland, Mar. 2003.
  • 41
    • EID: 0014568991
    • IEEE, "IEEE recommended practice for speech quality measurement", IEEE Trans. Audio Electroacoust., vol. AE-17, no. 3, pp. 225-246, Sep. 1969.
  • 43
    • EID: 84896734479
    • J. Andén and S. Mallat, "Deep scattering spectrum", IEEE Trans. Signal Process., vol. 62, no. 16, pp. 4114-4128, Aug. 2014.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.