SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2016-May, 2016, Pages 5535-5539

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis

Authors (2): Takaki, Shinji (a); Yamagishi, Junichi (a, b)

a National Institute of Informatics (Japan)

b University of Edinburgh (United Kingdom)

Author keywords

Deep auto encoder; Spectral envelope; Statistical parametric speech synthesis; Vocoder

EID: 84973309345; PISSN: 1520-6149; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2016.7472736; Document Type: Conference Paper

Times cited: 37

References (29)

1
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," Proceedings of ICASSP, pp. 7962-7966, 2013
- (2013) Proceedings of ICASSP, pp. 7962-7966
- Zen, H.; Senior, A.; Schuster, M.

2
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, pp. 2129-2139, 2013
- (2013) IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, pp. 2129-2139
- Ling, Z.-H.; Deng, L.; Yu, D.

3
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Y. Fan, Y. Qian, F. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," Proceedings of Interspeech, pp. 1964-1968, 2014
- (2014) Proceedings of Interspeech, pp. 1964-1968
- Fan, Y.; Qian, Y.; Xie, F.; Soong, F.K.

4
- 84910068142
- Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks
- R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks," Proceedings of Interspeech, pp. 2268-2272, 2014
- (2014) Proceedings of Interspeech, pp. 2268-2272
- Fernandez, R.; Rendel, A.; Ramabhadran, B.; Hoory, R.

5
- 84973331276
- A function-wise pre-training technique for constructing a deep neural network based spectral model in statistical parametric speech synthesis
- S. Takaki, W. Zhenzhou, and J. Yamagishi, "A function-wise pre-training technique for constructing a deep neural network based spectral model in statistical parametric speech synthesis," Machine Learning in Spoken Language Processing (MLSLP), 2015
- (2015) Machine Learning in Spoken Language Processing (MLSLP)
- Takaki, S.; Zhenzhou, W.; Yamagishi, J.

6
- 84910100893
- DNN-based stochastic postfilter for HMM-based speech synthesis
- L.-H. Chen, T. Raitio, C. Valentini-Botinhao, J. Yamagishi, and Z.-H. Ling, "DNN-based stochastic postfilter for HMM-based speech synthesis," Proceedings of Interspeech, pp. 1954-1958, 2014
- (2014) Proceedings of Interspeech, pp. 1954-1958
- Chen, L.-H.; Raitio, T.; Valentini-Botinhao, C.; Yamagishi, J.; Ling, Z.-H.

7
- 84959098005
- Multiple feed-forward deep neural networks for statistical parametric speech synthesis
- S. Takaki, S.-J. Kim, J. Yamagishi, and J.-J. Kim, "Multiple feed-forward deep neural networks for statistical parametric speech synthesis," Proceedings of Interspeech, pp. 2242-2246, 2015
- (2015) Proceedings of Interspeech, pp. 2242-2246
- Takaki, S.; Kim, S.-J.; Yamagishi, J.; Kim, J.-J.

8
- 84910065702
- Acoustic modeling with deep neural networks using raw time signal for LVCSR
- Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," Proceedings of Interspeech, pp. 890-894, 2014
- (2014) Proceedings of Interspeech, pp. 890-894
- Tüske, Z.; Golik, P.; Schlüter, R.; Ney, H.

9
- 84973386429
- Convolutional neural networks-based continuous speech recognition using raw speech signal
- D. Palaz, M. Magimai-Doss, and R. Collobert, "Convolutional neural networks-based continuous speech recognition using raw speech signal"
- Journal
- Palaz, D.; Magimai-Doss, M.; Collobert, R.

10
- 84959168440
- Learning the speech front-end with raw waveform CLDNNs
- T. N. Sainath, R. J. Weiss, A. Senior, K. W. Wilson, and O. Vinyals, "Learning the speech front-end with raw waveform CLDNNs," Proceedings of Interspeech, pp. 1-5, 2015
- (2015) Proceedings of Interspeech, pp. 1-5
- Sainath, T.N.; Weiss, R.J.; Senior, A.; Wilson, K.W.; Vinyals, O.

11
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, pp. 187-207, 1999
- (1999) Speech Communication, vol. 27, pp. 187-207
- Kawahara, H.; Masuda-Katsuse, I.; de Cheveigné, A.

12
- 85010837662
- An attempt to develop a singing synthesizer by collaborative creation
- M. Morise, "An attempt to develop a singing synthesizer by collaborative creation," The Stockholm Music Acoustics Conference 2013 (SMAC2013), pp. 289-292, 2015
- (2015) The Stockholm Music Acoustics Conference 2013 (SMAC2013), pp. 289-292
- Morise, M.

13
- 84908519225
- CheapTrick, a spectral envelope estimator for high-quality speech synthesis
- M. Morise, "CheapTrick, a spectral envelope estimator for high-quality speech synthesis," Speech Communication, vol. 67, pp. 1-7, 2015
- (2015) Speech Communication, vol. 67, pp. 1-7
- Morise, M.

14
- 84937621994
- Error evaluation of an F0-adaptive spectral envelope estimator in robustness against the additive noise and F0 error
- M. Morise, "Error evaluation of an F0-adaptive spectral envelope estimator in robustness against the additive noise and F0 error," IEICE Transactions on Information and Systems, vol. E98-D, no. 7, pp. 1405-1408, 2015
- (2015) IEICE Transactions on Information and Systems, vol. E98-D, Issue 7, pp. 1405-1408
- Morise, M.

15
- 84867593213
- Autoencoder bottleneck features using deep belief networks
- T. N. Sainath, B. Kingsbury, and B. Ramabhadran, "Autoencoder bottleneck features using deep belief networks," Proceedings of ICASSP, pp. 4153-4156, 2012
- (2012) Proceedings of ICASSP, pp. 4153-4156
- Sainath, T.N.; Kingsbury, B.; Ramabhadran, B.

16
- 84890482429
- Extracting deep bottleneck features using stacked auto-encoders
- J. Gehring, Y. Miao, F. Metze, and A. Waibel, "Extracting deep bottleneck features using stacked auto-encoders," Proceedings of ICASSP, pp. 3377-3381, 2013
- (2013) Proceedings of ICASSP, pp. 3377-3381
- Gehring, J.; Miao, Y.; Metze, F.; Waibel, A.

17
- 84878409063
- Recurrent neural networks for noise reduction in robust ASR
- A. L. Maas, Q. V. Le, T. M. O'Neil, O. Vinyals, P. Nguyen, and A. Y. Ng, "Recurrent neural networks for noise reduction in robust ASR," Proceedings of Interspeech, pp. 22-25, 2012
- (2012) Proceedings of Interspeech, pp. 22-25
- Maas, A.L.; Le, Q.V.; O'Neil, T.M.; Vinyals, O.; Nguyen, P.; Ng, A.Y.

18
- 84906237188
- Reverberant speech recognition based on denoising autoencoder
- T. Ishii, H. Komiyama, T. Shinozaki, Y. Horiuchi, and S. Kuroiwa, "Reverberant speech recognition based on denoising autoencoder," Proceedings of Interspeech, pp. 3512-3516, 2013
- (2013) Proceedings of Interspeech, pp. 3512-3516
- Ishii, T.; Komiyama, H.; Shinozaki, T.; Horiuchi, Y.; Kuroiwa, S.

19
- 84905259759
- Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition
- X. Feng, Y. Zhang, and J. Glass, "Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition," Proceedings of ICASSP, pp. 1778-1782, 2014
- (2014) Proceedings of ICASSP, pp. 1778-1782
- Feng, X.; Zhang, Y.; Glass, J.

20
- 79959842828
- Binary coding of speech spectrograms using a deep auto-encoder
- L. Deng, M. Seltzer, D. Yu, A. Acero, A. Mohamed, and G. Hinton, "Binary coding of speech spectrograms using a deep auto-encoder," Proceedings of Interspeech, pp. 1692-1695, 2010
- (2010) Proceedings of Interspeech, pp. 1692-1695
- Deng, L.; Seltzer, M.; Yu, D.; Acero, A.; Mohamed, A.; Hinton, G.

21
- 84906262433
- Speech enhancement based on deep denoising autoencoder
- X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," Proceedings of Interspeech, pp. 436-440, 2013
- (2013) Proceedings of Interspeech, pp. 436-440
- Lu, X.; Tsao, Y.; Matsuda, S.; Hori, C.

22
- 78049412607
- An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech
- S. Vishnubhotla, R. Fernandez, and B. Ramabhadran, "An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech," Proceedings of ICASSP, pp. 4614-4617, 2010
- (2010) Proceedings of ICASSP, pp. 4614-4617
- Vishnubhotla, S.; Fernandez, R.; Ramabhadran, B.

23
- 84910068090
- Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
- T. Raitio, A. Suni, L. Juvela, M. Vainio, and P. Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort," Proceedings of Interspeech, pp. 1969-1973, 2014
- (2014) Proceedings of Interspeech, pp. 1969-1973
- Raitio, T.; Suni, A.; Juvela, L.; Vainio, M.; Alku, P.

24
- 84973366354
- A deep learning approach to data-driven parameterizations for statistical parametric speech synthesis
- abs/1409.8558
- P. K. Muthukumar and A. Black, "A deep learning approach to data-driven parameterizations for statistical parametric speech synthesis," CoRR, vol. abs/1409.8558, 2014
- (2014) CoRR
- Muthukumar, P.K.; Black, A.

25
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006
- (2006) Science, vol. 313, Issue 5786, pp. 504-507
- Hinton, G.E.; Salakhutdinov, R.

26
- 84878419996
- S. King and V. Karaiskos, "The Blizzard Challenge 2011," 2011
- (2011) The Blizzard Challenge 2011
- King, S.; Karaiskos, V.

27
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, pp. 1039-1064, 2009
- (2009) Speech Communication, vol. 51, pp. 1039-1064
- Zen, H.; Tokuda, K.; Black, A.W.

28
- 33745200051
- Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "Speech parameter generation algorithm considering global variance for HMM-based speech synthesis," Proceedings of Interspeech 2005, pp. 2801-2804, 2005
- (2005) Proceedings of Interspeech 2005, pp. 2801-2804
- Toda, T.; Tokuda, K.

29
- 79959836077
- On generating Combilex pronunciations via morphological analysis
- K. Richmond, R. Clark, and S. Fitt, "On generating Combilex pronunciations via morphological analysis," Proceedings of Interspeech, pp. 1974-1977, 2010
- (2010) Proceedings of Interspeech, pp. 1974-1977
- Richmond, K.; Clark, R.; Fitt, S.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.