SCOPUS Information Search Platform

IEEE/ACM Transactions on Audio Speech and Language Processing

Volume 23, Issue 11, 2015, Pages 2003-2014

A deep generative architecture for postfiltering in statistical parametric speech synthesis

(5) Chen, Ling Hui a,b Raitio, Tuomo c Valentini Botinhao, Cassia d Ling, Zhen Hua a Yamagishi, Junichi d

a UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

b IFLYTEK RESEARCH (China)

c AALTO UNIVERSITY (Finland)

d UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Deep generative architecture; Hidden Markov model (HMM); Modulation spectrum; Postfilter; Segmental quality; Speech synthesis

Indexed keywords

ASSOCIATIVE PROCESSING; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MEMORY ARCHITECTURE; MODULATION; NETWORK ARCHITECTURE; PLASMA DIAGNOSTICS; SPEECH SYNTHESIS; STRAIN MEASUREMENT; SYNTHESIS (CHEMICAL);

BI-DIRECTIONAL ASSOCIATIVE MEMORY; CONDITIONAL PROBABILITIES; HIDDEN MARKOV MODEL (HMM)-BASED STATISTICAL PARAMETRIC SPEECH SYNTHESIS; MODULATION SPECTRUM; POSTFILTERS; RESTRICTED BOLTZMANN MACHINE; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; SUBJECTIVE EVALUATIONS;

SPEECH;

EID: 84942607168 PISSN: 23299290 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2015.2461448 Document Type: Article

Times cited : (44)

References (43)

1
- 84910100893
- DNN-based stochastic postfilter for HMM-based speech synthesis
- L.-H. Chen, T. Raitio, C. Valentini-Botinhao, J. Yamagishi, and Z.-H. Ling, "DNN-based stochastic postfilter for HMM-based speech synthesis", in Proc. Interspeech, 2014, pp. 1954-1958.
- (2014) Proc. Interspeech , pp. 1954-1958
- Chen, L.-H.¹ Raitio, T.² Valentini-Botinhao, C.³ Yamagishi, J.⁴ Ling, Z.-H.⁵

2
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis", Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

3
- 33947682675
- The blizzard challenge 2005: Evaluating corpus-based speech synthesis on common databases
- A. Black and K. Tokuda, "The blizzard challenge 2005: Evaluating corpus-based speech synthesis on common databases", in Proc. Interspeech, 2005.
- (2005) Proc. Interspeech
- Black, A.¹ Tokuda, K.²

4
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- May
- Z.-H. Ling, S.-Y. Kang, H. Zen, A. Senior, M. Schuster, X.-J. Qian, H. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends", IEEE Signal Process. Mag., vol. 32, no. 3, pp. 35-52, May 2015.
- (2015) IEEE Signal Process. Mag. , vol.32 , Issue.3 , pp. 35-52
- Ling, Z.-H.¹ Kang, S.-Y.² Zen, H.³ Senior, A.⁴ Schuster, M.⁵ Qian, X.-J.⁶ Meng, H.⁷ Deng, L.⁸

5
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks", in Proc. ICASSP, 2013, pp. 7962-7966.
- (2013) Proc. ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

6
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Oct.
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis", IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
- (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

7
- 84890527090
- Multi-distribution deep belief network for speech synthesis
- Apr.
- S. Kang, X. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis", in Proc. ICASSP, Apr. 2013, pp. 8012-8016.
- (2013) Proc. ICASSP , pp. 8012-8016
- Kang, S.¹ Qian, X.² Meng, H.³

8
- 84890522099
- F0 contour prediction with a deep belief network-Gaussian process hybrid model
- Apr.
- R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "F0 contour prediction with a deep belief network-Gaussian process hybrid model", in Proc. ICASSP, Apr. 2013, pp. 6885-6889.
- (2013) Proc. ICASSP , pp. 6885-6889
- Fernandez, R.¹ Rendel, A.² Ramabhadran, B.³ Hoory, R.⁴

9
- 78049412607
- An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech
- R. Vishnubhotla, S. Fernandez, and B. Ramabhadran, "An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech", in Proc. ICASSP, 2010, pp. 4614-4617.
- (2010) Proc. ICASSP , pp. 4614-4617
- Vishnubhotla, R.¹ Fernandez, S.² Ramabhadran, B.³

10
- 84910068142
- Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks
- R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks", in Proc. Interspeech, 2014, pp. 2268-2272.
- (2014) Proc. Interspeech , pp. 2268-2272
- Fernandez, R.¹ Rendel, A.² Ramabhadran, B.³ Hoory, R.⁴

11
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Y. Fan, Y. Qian, F. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks", in Proc. Interspeech, 2014, pp. 1964-1968.
- (2014) Proc. Interspeech , pp. 1964-1968
- Fan, Y.¹ Qian, Y.² Xie, F.³ Soong, F.K.⁴

12
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- Apr.
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis", in Proc. ICASSP, Apr. 2015, pp. 4470-4474.
- (2015) Proc. ICASSP , pp. 4470-4474
- Zen, H.¹ Sak, H.²

13
- 67650851754
- USTC system for blizzard challenge 2006: An improved HMM-based speech synthesis method
- Pittsburgh, PA, USA, Sep.
- Z. Ling, Y. Wu, Y. Wang, L. Qin, and R. Wang, "USTC system for blizzard challenge 2006: An improved HMM-based speech synthesis method", in Proc. Blizzard Challenge Workshop, Pittsburgh, PA, USA, Sep. 2006.
- (2006) Proc. Blizzard Challenge Workshop
- Ling, Z.¹ Wu, Y.² Wang, Y.³ Qin, L.⁴ Wang, R.⁵

14
- 27144515530
- Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Incorporating a mixed excitation model and postfilter into HMM-based text-to-speech synthesis", Syst. Comput. Jpn., vol. 36, no. 12, pp. 43-50, 2005.
- (2005) Syst. Comput. Jpn. , vol.36 , Issue.12 , pp. 43-50
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

15
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis", IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

16
- 84905234422
- A postfilter to modify the modulation spectrum in HMM-based speech synthesis
- May
- S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, "A postfilter to modify the modulation spectrum in HMM-based speech synthesis", in Proc. ICASSP, May 2014, pp. 290-294.
- (2014) Proc. ICASSP , pp. 290-294
- Takamichi, S.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

17
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- Nov.
- T. Toda, A. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory", IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.² Tokuda, K.³

18
- 0031623661
- Spectral voice conversion for text-to-speech synthesis
- A. Kain and M. Macon, "Spectral voice conversion for text-to-speech synthesis", in Proc. ICASSP, 1998, pp. 285-288.
- (1998) Proc. ICASSP , pp. 285-288
- Kain, A.¹ Macon, M.²

19
- 0028996842
- CELP coding based on mel-cepstral analysis
- K. Koishida, K. Tokuda, T. Kobayashi, and S. Imai, "CELP coding based on mel-cepstral analysis", in Proc. ICASSP, 1995, vol. 1, pp. 33-36.
- (1995) Proc. ICASSP , vol.1 , pp. 33-36
- Koishida, K.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

20
- 79959847301
- Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis
- Z.-H. Ling, Y. Hu, and L.-R. Dai, "Global variance modeling on the log power spectrum of LSPs for HMM-based speech synthesis", in Proc. Interspeech, 2010, pp. 825-828.
- (2010) Proc. Interspeech , pp. 825-828
- Ling, Z.-H.¹ Hu, Y.² Dai, L.-R.³

21
- 84876687945
- Speech synthesis based on hidden Markov models
- May
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models", Proc. IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
- (2013) Proc. IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

22
- 85016140477
- An adaptive algorithm for mel-cepstral analysis of speech
- San Francisco, CA, USA, Mar.
- T. Fukada, K. Tokuda, T. Kobayashi, and S. Imai, "An adaptive algorithm for mel-cepstral analysis of speech", in Proc. ICASSP, San Francisco, CA, USA, Mar. 1992, vol. 1, pp. 137-140.
- (1992) Proc. ICASSP , vol.1 , pp. 137-140
- Fukada, T.¹ Tokuda, K.² Kobayashi, T.³ Imai, S.⁴

23
- 77953707533
- Spectral mapping using artificial neural networks for voice conversion
- Jul
- S. Desai, A. Black, B. Yegnanarayana, and K. Prahallad, "Spectral mapping using artificial neural networks for voice conversion", IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 5, pp. 954-964, Jul. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.5 , pp. 954-964
- Desai, S.¹ Black, A.² Yegnanarayana, B.³ Prahallad, K.⁴

24
- 84910104946
- Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes
- L.-H. Chen, Z.-H. Ling, and L.-R. Dai, "Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes", in Proc. Interspeech, 2014, pp. 2313-2317.
- (2014) Proc. Interspeech , pp. 2313-2317
- Chen, L.-H.¹ Ling, Z.-H.² Dai, L.-R.³

25
- 84921735339
- Voice conversion using deep neural networks with layer-wise generative training
- Dec.
- L.-H. Chen, Z.-H. Ling, L.-J. Liu, and L.-R. Dai, "Voice conversion using deep neural networks with layer-wise generative training", IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1859-1872, Dec. 2014.
- (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.22 , Issue.12 , pp. 1859-1872
- Chen, L.-H.¹ Ling, Z.-H.² Liu, L.-J.³ Dai, L.-R.⁴

26
- 0000329993
- Information processing in dynamical systems: Foundations of harmony theory
- D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, ch. 6
- P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory", in Parallel distributed processing: Explorations in the microstructure of cognition, D. E. Rumelhart and J. L. McClelland, Eds. Cambridge, MA, USA: MIT Press, 1986, vol. 1, ch. 6, pp. 194-281.
- (1986) Parallel Distributed Processing: Explorations in the Microstructure of Cognition , vol.1 , pp. 194-281
- Smolensky, P.¹

27
- 0023861743
- Bidirectional associative memories
- B. Kosko, "Bidirectional associative memories", IEEE Trans. Systems, Man, Cybern., vol. 18, no. 1, pp. 49-60, 1988.
- (1988) IEEE Trans. Systems, Man, Cybern. , vol.18 , Issue.1 , pp. 49-60
- Kosko, B.¹

28
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks", Science, vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

29
- 0013344078
- Training products of experts by minimizing contrastive divergence
- G. Hinton, "Training products of experts by minimizing contrastive divergence", Neural Comput., vol. 12, no. 14, pp. 1711-1800, 2002.
- (2002) Neural Comput. , vol.12 , Issue.14 , pp. 1711-1800
- Hinton, G.¹

30
- 84872506495
- A practical guide to training restricted Boltzmann machines
- NY, USA: Springer
- G. E. Hinton, "A practical guide to training restricted Boltzmann machines", in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer, 2012, pp. 599-619.
- (2012) Neural Networks: Tricks of the Trade. New York , pp. 599-619
- Hinton, G.E.¹

31
- 78651276374
- Learning deep generative models
- Ph. D. dissertation, Univ. of Toronto, Toronto, ON, Canada
- R. Salakhutdinov, "Learning deep generative models", Ph. D. dissertation, Univ. of Toronto, Toronto, ON, Canada, 2009.
- (2009) Learning Deep Generative Models
- Salakhutdinov, R.¹

32
- 0020118274
- Neural networks and physical systems with emergent collective computational abilities
- J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities", in Proc. Nat. Acad. Sci., 1982, vol. 79, no. 8, pp. 2554-2558.
- (1982) Proc. Nat. Acad. Sci. , vol.79 , Issue.8 , pp. 2554-2558
- Hopfield, J.J.¹

33
- 84905223323
- Using bidirectional associative memories for joint spectral envelope modeling in voice conversion
- May
- L.-J. Liu, L.-H. Chen, Z.-H. Ling, and L.-R. Dai, "Using bidirectional associative memories for joint spectral envelope modeling in voice conversion", in Proc. ICASSP, May 2014, pp. 7884-7888.
- (2014) Proc. ICASSP , pp. 7884-7888
- Liu, L.-J.¹ Chen, L.-H.² Ling, Z.-H.³ Dai, L.-R.⁴

34
- 79959842828
- Binary coding of speech spectrograms using a deep auto-encoder
- Sep.
- L. Deng, M. Seltzer, D. Yu, A. Acero, A. Rahman Mohamed, and G. Hinton, "Binary coding of speech spectrograms using a deep auto-encoder", in Proc. Interspeech, Sep. 2010.
- (2010) Proc. Interspeech
- Deng, L.¹ Seltzer, M.² Yu, D.³ Acero, A.⁴ Rahman Mohamed, A.⁵ Hinton, G.⁶

35
- 69349090197
- Learning deep architectures for AI
- Y. Bengio, "Learning deep architectures for AI", Foundations Trends Mach. Learn., vol. 2, no. 1, pp. 1-127, 2009.
- (2009) Foundations Trends Mach. Learn. , vol.2 , Issue.1 , pp. 1-127
- Bengio, Y.¹

36
- 0242564704
- Products of experts
- G. Hinton, "Products of experts", in Proc. 9th Int. Conf. Artif. Neural Netw., 1999, vol. 1, pp. 825-828.
- (1999) Proc. 9th Int. Conf. Artif. Neural Netw. , vol.1 , pp. 825-828
- Hinton, G.¹

37
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis", in Proc. ICASSP, 2000, vol. 3, pp. 1315-1318.
- (2000) Proc. ICASSP , vol.3 , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

38
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds", Speech Commun., vol. 27, pp. 187-207, 1999.
- (1999) Speech Commun. , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigné, A.³

39
- 85131821539
- Mel-generalized cepstral analysis-A unified approach to speech spectral estimation
- Yokohama, Japan, Sep.
- K. Tokuda, T. Kobayashi, T. Masuko, and S. Imai, "Mel-generalized cepstral analysis-A unified approach to speech spectral estimation", in Proc. ICSLP, Yokohama, Japan, Sep. 1994, vol. 3, pp. 1043-1046.
- (1994) Proc. ICSLP , vol.3 , pp. 1043-1046
- Tokuda, K.¹ Kobayashi, T.² Masuko, T.³ Imai, S.⁴

40
- 13344250603
- Method for the subjective assessment of intermediate quality level of coding systems
- Geneva, Switzerland, Mar.
- "Method for the subjective assessment of intermediate quality level of coding systems", ITU Rec. ITU-R BS.1534-1, Int. Telecomm. Union Radiocommunication Assembly. Geneva, Switzerland, Mar. 2003.
- (2003) ITU Rec. ITU-R BS.1534-1, Int. Telecomm. Union Radiocommunication Assembly

41
- 0014568991
- IEEE recommended practice for speech quality measurement
- Sep.
- IEEE, "IEEE recommended practice for speech quality measurement", IEEE Trans. Audio Electroacoust., vol. AE-17, no. 3, pp. 225-246, Sep. 1969.
- (1969) IEEE Trans. Audio Electroacoust. , vol.AE-17 , Issue.3 , pp. 225-246
- IEEE¹

42
- 85039149607
- The USTC system for blizzard challenge 2014
- Singapore, Sep.
- L.-H. Chen, Z.-H. Ling, Y.-Q. Zu, R.-Q. Yan, Y. Jiang, X.-J. Xia, and Y. Wang, "The USTC system for blizzard challenge 2014", in Proc. Blizzard Challenge Workshop, Singapore, Sep. 2014.
- (2014) Proc. Blizzard Challenge Workshop
- Chen, L.-H.¹ Ling, Z.-H.² Zu, Y.-Q.³ Yan, R.-Q.⁴ Jiang, Y.⁵ Xia, X.-J.⁶ Wang, Y.⁷

43
- 84896734479
- Deep scattering spectrum
- Aug.
- J. Anden and S. Mallat, "Deep scattering spectrum", IEEE Trans. Signal Process., vol. 62, no. 16, pp. 4114-4128, Aug. 2014.
- (2014) IEEE Trans. Signal Process. , vol.62 , Issue.16 , pp. 4114-4128
- Anden, J.¹ Mallat, S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.