



Volume 2016-May, 2016, Pages 5535-5539

A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis

Author keywords

Deep auto encoder; Spectral envelope; Statistical parametric speech synthesis; Vocoder

EID: 84973309345     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2016.7472736     Document Type: Conference Paper
Times cited: 37

References (29)
  • 1
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," Proceedings of ICASSP, pp. 7962-7966, 2013
    • (2013) Proceedings of ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 2
    • 84901237776
    • Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
    • Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, pp. 2129-2139, 2013
    • (2013) Audio, Speech, and Language Processing, IEEE Transactions on , vol.21 , pp. 2129-2139
    • Ling, Z.-H.1    Deng, L.2    Yu, D.3
  • 3
    • 84910047819
    • TTS synthesis with bidirectional LSTM based recurrent neural networks
    • Y. Fan, Y. Qian, F. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," Proceedings of Interspeech, pp. 1964-1968, 2014
    • (2014) Proceedings of Interspeech , pp. 1964-1968
    • Fan, Y.1    Qian, Y.2    Xie, F.3    Soong, F.K.4
  • 4
    • 84910068142
    • Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks
    • R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks," Proceedings of Interspeech, pp. 2268-2272, 2014
    • (2014) Proceedings of Interspeech , pp. 2268-2272
    • Fernandez, R.1    Rendel, A.2    Ramabhadran, B.3    Hoory, R.4
  • 5
    • 84973331276
    • A function-wise pre-training technique for constructing a deep neural network based spectral model in statistical parametric speech synthesis
    • S. Takaki, W. Zhenzhou, and J. Yamagishi, "A function-wise pre-training technique for constructing a deep neural network based spectral model in statistical parametric speech synthesis," Machine Learning in Spoken Language Processing (MLSLP), 2015
    • (2015) Machine Learning in Spoken Language Processing (MLSLP)
    • Takaki, S.1    Zhenzhou, W.2    Yamagishi, J.3
  • 7
    • 84959098005
    • Multiple feed-forward deep neural networks for statistical parametric speech synthesis
    • S. Takaki, S.-J. Kim, J. Yamagishi, and J.-J. Kim, "Multiple feed-forward deep neural networks for statistical parametric speech synthesis," Proceedings of Interspeech, pp. 2242-2246, 2015
    • (2015) Proceedings of Interspeech , pp. 2242-2246
    • Takaki, S.1    Kim, S.-J.2    Yamagishi, J.3    Kim, J.-J.4
  • 8
    • 84910065702
    • Acoustic modeling with deep neural networks using raw time signal for LVCSR
    • Z. Tüske, P. Golik, R. Schlüter, and H. Ney, "Acoustic modeling with deep neural networks using raw time signal for LVCSR," Proceedings of Interspeech, pp. 890-894, 2014
    • (2014) Proceedings of Interspeech , pp. 890-894
    • Tüske, Z.1    Golik, P.2    Schlüter, R.3    Ney, H.4
  • 9
    • 84973386429
    • Convolutional neural networks-based continuous speech recognition using raw speech signal
    • D. Palaz, M. Magimai-Doss, and R. Collobert, "Convolutional neural networks-based continuous speech recognition using raw speech signal"
    • Palaz, D.1    Magimai-Doss, M.2    Collobert, R.3
  • 11
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, pp. 187-207, 1999
    • (1999) Speech Communication , vol.27 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    de Cheveigné, A.3
  • 13
    • 84908519225
    • CheapTrick, a spectral envelope estimator for high-quality speech synthesis
    • M. Morise, "CheapTrick, a spectral envelope estimator for high-quality speech synthesis," Speech Communication, vol. 67, pp. 1-7, 2015
    • (2015) Speech Communication , vol.67 , pp. 1-7
    • Morise, M.1
  • 14
    • 84937621994
    • Error evaluation of an F0-adaptive spectral envelope estimator in robustness against the additive noise and F0 error
    • M. Morise, "Error evaluation of an F0-adaptive spectral envelope estimator in robustness against the additive noise and F0 error," IEICE Transactions on Information and Systems, vol. E98-D, no. 7, pp. 1405-1408, 2015
    • (2015) IEICE Transactions on Information and Systems , vol.E98-D , Issue.7 , pp. 1405-1408
    • Morise, M.1
  • 15
    • 84867593213
    • Autoencoder bottleneck features using deep belief networks
    • T. N. Sainath, B. Kingsbury, and B. Ramabhadran, "Autoencoder bottleneck features using deep belief networks," Proceedings of ICASSP, pp. 4153-4156, 2012
    • (2012) Proceedings of ICASSP , pp. 4153-4156
    • Sainath, T.N.1    Kingsbury, B.2    Ramabhadran, B.3
  • 16
    • 84890482429
    • Extracting deep bottleneck features using stacked auto-encoders
    • J. Gehring, Y. Miao, F. Metze, and A. Waibel, "Extracting deep bottleneck features using stacked auto-encoders," Proceedings of ICASSP, pp. 3377-3381, 2013
    • (2013) Proceedings of ICASSP , pp. 3377-3381
    • Gehring, J.1    Miao, Y.2    Metze, F.3    Waibel, A.4
  • 19
    • 84905259759
    • Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition
    • X. Feng, Y. Zhang, and J. Glass, "Speech feature denoising and dereverberation via deep autoencoders for noisy reverberant speech recognition," Proceedings of ICASSP, pp. 1778-1782, 2014
    • (2014) Proceedings of ICASSP , pp. 1778-1782
    • Feng, X.1    Zhang, Y.2    Glass, J.3
  • 21
    • 84906262433
    • Speech enhancement based on deep denoising autoencoder
    • X. Lu, Y. Tsao, S. Matsuda, and C. Hori, "Speech enhancement based on deep denoising autoencoder," Proceedings of Interspeech, pp. 436-440, 2013
    • (2013) Proceedings of Interspeech , pp. 436-440
    • Lu, X.1    Tsao, Y.2    Matsuda, S.3    Hori, C.4
  • 22
    • 78049412607
    • An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech
    • R. Vishnubhotla, S. Fernandez, and B. Ramabhadran, "An autoencoder neural-network based low-dimensionality approach to excitation modeling for HMM-based text-to-speech," Proceedings of ICASSP, pp. 4614-4617, 2010
    • (2010) Proceedings of ICASSP , pp. 4614-4617
    • Vishnubhotla, R.1    Fernandez, S.2    Ramabhadran, B.3
  • 23
    • 84910068090
    • Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
    • T. Raitio, A. Suni, L. Juvela, M. Vainio, and P. Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort," Proceedings of Interspeech, pp. 1969-1973, 2014
    • (2014) Proceedings of Interspeech , pp. 1969-1973
    • Raitio, T.1    Suni, A.2    Juvela, L.3    Vainio, M.4    Alku, P.5
  • 24
    • 84973366354
    • A deep learning approach to data-driven parameterizations for statistical parametric speech synthesis
    • abs/1409.8558
    • P. K. Muthukumar and A. Black, "A deep learning approach to data-driven parameterizations for statistical parametric speech synthesis," CoRR, vol. abs/1409.8558, 2014
    • (2014) CoRR
    • Muthukumar, P.K.1    Black, A.2
  • 25
    • 33746600649
    • Reducing the dimensionality of data with neural networks
    • G. E. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504-507, 2006
    • (2006) Science , vol.313 , Issue.5786 , pp. 504-507
    • Hinton, G.E.1    Salakhutdinov, R.2
  • 27
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, pp. 1039-1064, 2009
    • (2009) Speech Communication , vol.51 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 28
    • 33745200051
    • Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "Speech parameter generation algorithm considering global variance for HMM-based speech synthesis," Proceedings of Interspeech 2005, pp. 2801-2804, 2005
    • (2005) Proceedings of Interspeech 2005 , pp. 2801-2804
    • Toda, T.1    Tokuda, K.2
  • 29
    • 79959836077
    • On generating Combilex pronunciations via morphological analysis
    • K. Richmond, R. Clark, and S. Fitt, "On generating Combilex pronunciations via morphological analysis," Proceedings of Interspeech, pp. 1974-1977, 2010
    • (2010) Proceedings of Interspeech , pp. 1974-1977
    • Richmond, K.1    Clark, R.2    Fitt, S.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.