SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2017-August, 2017, Pages 1368-1372

Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system

Authors (4): Juvela, Lauri (a); Bollepalli, Bajibabu (a); Yamagishi, Junichi (b, c); Alku, Paavo (a)

a AALTO UNIVERSITY (Finland)

b NATIONAL INSTITUTE OF INFORMATICS (Japan)

c UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Excitation modeling; Statistical parametric speech synthesis

Indexed keywords

DEEP NEURAL NETWORKS; ERRORS; INVERSE PROBLEMS; SPEECH; SPEECH SYNTHESIS;

EXCITATION MODELING; EXCITATION MODELS; IMPROVE PERFORMANCE; INVERSE FILTERING; PERFECT RECONSTRUCTION; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; SUBJECTIVE QUALITY; TEXT-TO-SPEECH SYSTEM;

SPEECH COMMUNICATION;

EID: 85039167642 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2017-848 Document Type: Conference Paper

Times cited : (8)

References (33)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 84876687945
- Speech synthesis based on hidden Markov models
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

3
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. of ICASSP, May 2013, pp. 7962-7966.
- (2013) Proc. of ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

4
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- Z.-H. Ling, S.-Y. Kang, H. Zen, A. Senior, M. Schuster, X.-J. Qian, H. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," Signal Processing Magazine, IEEE, vol. 32, no. 3, pp. 35-52, May 2015.
- (2015) Signal Processing Magazine, IEEE , vol.32 , Issue.3 , pp. 35-52
- Ling, Z.-H.¹ Kang, S.-Y.² Zen, H.³ Senior, A.⁴ Schuster, M.⁵ Qian, X.-J.⁶ Meng, H.⁷ Deng, L.⁸

5
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Interspeech, 2014, pp. 1964-1968.
- (2014) Interspeech , pp. 1964-1968
- Fan, Y.¹ Qian, Y.² Xie, F.-L.³ Soong, F.K.⁴

6
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on. IEEE, 2015, pp. 4470-4474.
- (2015) Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on , pp. 4470-4474
- Zen, H.¹ Sak, H.²

7
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA, 2001.
- (2001) MAVEBA
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

8
- 85023745327
- An autoregressive recurrent mixture density network for parametric speech synthesis
- X. Wang, S. Takaki, and J. Yamagishi, "An autoregressive recurrent mixture density network for parametric speech synthesis," in Proc. of ICASSP. IEEE, 2017, pp. 4895-4899.
- (2017) Proc. of ICASSP , pp. 4895-4899
- Wang, X.¹ Takaki, S.² Yamagishi, J.³

9
- 85023752230
- Generative adversarial network-based postfilter for statistical parametric speech synthesis
- T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis," in Proc. of ICASSP, March 2017, pp. 4910-4914.
- (2017) Proc. of ICASSP , pp. 4910-4914
- Kaneko, T.¹ Kameoka, H.² Hojo, N.³ Ijima, Y.⁴ Hiramatsu, K.⁵ Kashino, K.⁶

10
- 84946077883
- Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis
- K. Tokuda and H. Zen, "Directly modeling speech waveforms by neural networks for statistical parametric speech synthesis," in Proc. of ICASSP, April 2015, pp. 4215-4219.
- (2015) Proc. of ICASSP , pp. 4215-4219
- Tokuda, K.¹ Zen, H.²

11
- 84973307947
- Directly modeling voiced and unvoiced components in speech waveforms by neural networks
- K. Tokuda and H. Zen, "Directly modeling voiced and unvoiced components in speech waveforms by neural networks," in Proc. of ICASSP, March 2016, pp. 5640-5644.
- (2016) Proc. of ICASSP , pp. 5640-5644

12
- 84973309345
- A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis
- S. Takaki and J. Yamagishi, "A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis," in Proc. of ICASSP, March 2016, pp. 5535-5539.
- (2016) Proc. of ICASSP , pp. 5535-5539
- Takaki, S.¹ Yamagishi, J.²

13
- 85039167413
- Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis
- S. Takaki, H. Kameoka, and J. Yamagishi, "Direct modeling of frequency spectra and waveform generation based on phase recovery for DNN-based speech synthesis," in Interspeech (submitted), 2017.
- (2017) Interspeech
- Takaki, S.¹ Kameoka, H.² Yamagishi, J.³

14
- 85011070895
- WaveNet: A generative model for raw audio
- A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio," Pre-print, 2016, https://arxiv.org/pdf/1609.03499.pdf.
- (2016) WaveNet: A Generative Model for Raw Audio
- Van den Oord, A.¹ Dieleman, S.² Zen, H.³ Simonyan, K.⁴ Vinyals, O.⁵ Graves, A.⁶ Kalchbrenner, N.⁷ Senior, A.⁸ Kavukcuoglu, K.⁹

15
- 85088227413
- SampleRNN: An unconditional end-to-end neural audio generation model
- S. Mehri, K. Kumar, I. Gulrajani, R. Kumar, S. Jain, J. Sotelo, A. C. Courville, and Y. Bengio, "SampleRNN: An unconditional end-to-end neural audio generation model," in ICLR 2017 (submission), 2017. [Online]. Available: http://arxiv.org/abs/1612.07837
- (2017) ICLR 2017
- Mehri, S.¹ Kumar, K.² Gulrajani, I.³ Kumar, R.⁴ Jain, S.⁵ Sotelo, J.⁶ Courville, A.C.⁷ Bengio, Y.⁸

16
- 85122685393
- Char2Wav: End-to-end speech synthesis
- J. Sotelo, S. Mehri, K. Kumar, J. F. Santos, K. Kastner, A. Courville, and Y. Bengio, "Char2Wav: End-to-end speech synthesis," in ICLR 2017 workshop (submission), 2017, https://openreview.net/pdf?id=B1VWyySKx.
- (2017) ICLR 2017 Workshop
- Sotelo, J.¹ Mehri, S.² Kumar, K.³ Santos, J.F.⁴ Kastner, K.⁵ Courville, A.⁶ Bengio, Y.⁷

17
- 85039156048
- Deep voice: Real-time neural text-to-speech
- S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, and M. Shoeybi, "Deep voice: Real-time neural text-to-speech," in ICML 2017 (submission), 2017, https://arxiv.org/pdf/1702.07825.pdf.
- (2017) ICML 2017
- Arik, S.O.¹ Chrzanowski, M.² Coates, A.³ Diamos, G.⁴ Gibiansky, A.⁵ Kang, Y.⁶ Li, X.⁷ Miller, J.⁸ Ng, A.⁹ Raiman, J.¹⁰ Sengupta, S.¹¹ Shoeybi, M.¹²

18
- 84973293681
- High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
- L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network," in Proc. of ICASSP, Mar. 2016, pp. 5120-5124.
- (2016) Proc. of ICASSP , pp. 5120-5124
- Juvela, L.¹ Bollepalli, B.² Airaksinen, M.³ Alku, P.⁴

19
- 84994338062
- GlottDNN - a full-band glottal vocoder for statistical parametric speech synthesis
- M. Airaksinen, B. Bollepalli, L. Juvela, Z. Wu, S. King, and P. Alku, "GlottDNN - a full-band glottal vocoder for statistical parametric speech synthesis," in Proc. of Interspeech, 2016.
- (2016) Proc. of Interspeech
- Airaksinen, M.¹ Bollepalli, B.² Juvela, L.³ Wu, Z.⁴ King, S.⁵ Alku, P.⁶

20
- 84994228803
- Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks
- L. Juvela, X. Wang, S. Takaki, M. Airaksinen, J. Yamagishi, and P. Alku, "Using text and acoustic features in predicting glottal excitation waveforms for parametric speech synthesis with recurrent neural networks," in Proc. of Interspeech, Sep. 2016, pp. 2283-2287.
- (2016) Proc. of Interspeech , pp. 2283-2287
- Juvela, L.¹ Wang, X.² Takaki, S.³ Airaksinen, M.⁴ Yamagishi, J.⁵ Alku, P.⁶

21
- 84994323722
- Flite: A small fast run-time synthesis engine
- A. W. Black and K. A. Lenzo, "Flite: A small fast run-time synthesis engine," in 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis, 2001.
- (2001) 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis
- Black, A.W.¹ Lenzo, K.A.²

22
- 70450180978
- Robust LTS rules with the Combilex speech technology lexicon
- K. Richmond, R. A. Clark, and S. Fitt, "Robust LTS rules with the Combilex speech technology lexicon," in Proc. of Interspeech, Brighton, September 2009, pp. 1295-1298.
- (2009) Proc. of Interspeech, Brighton , pp. 1295-1298
- Richmond, K.¹ Clark, R.A.² Fitt, S.³

23
- 85133720638
- The HMM-based speech synthesis system version 2.0
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. W. Black, and K. Tokuda, "The HMM-based speech synthesis system version 2.0," in Proc. of ISCA SSW6, Bonn, Germany, August 2007, pp. 294-299.
- (2007) Proc. of ISCA SSW6 , pp. 294-299
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.W.⁶ Tokuda, K.⁷

24
- 84898074254
- Quasi closed phase glottal inverse filtering analysis with weighted linear prediction
- M. Airaksinen, T. Raitio, B. Story, and P. Alku, "Quasi closed phase glottal inverse filtering analysis with weighted linear prediction," Audio, Speech, and Language Processing, IEEE/ACM Transactions on, vol. 22, no. 3, pp. 596-607, March 2014.
- (2014) Audio, Speech, and Language Processing, IEEE/ACM Transactions on , vol.22 , Issue.3 , pp. 596-607
- Airaksinen, M.¹ Raitio, T.² Story, B.³ Alku, P.⁴

25
- 0001455934
- A robust algorithm for pitch tracking (RAPT)
- D. Talkin, "A robust algorithm for pitch tracking (RAPT)," Speech Coding and Synthesis, pp. 495-518, 1995.
- (1995) Speech Coding and Synthesis , pp. 495-518
- Talkin, D.¹

26
- 85039159309
- D. Talkin, "REAPER: Robust Epoch And Pitch EstimatoR," https://github.com/google/REAPER, 2015.
- (2015) REAPER: Robust Epoch and Pitch EstimatoR

27
- 77957744515
- HMM-based speech synthesis utilizing glottal inverse filtering
- T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 153-165, January 2011.
- (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.1 , pp. 153-165
- Raitio, T.¹ Suni, A.² Yamagishi, J.³ Pulakka, H.⁴ Nurminen, J.⁵ Vainio, M.⁶ Alku, P.⁷

28
- 84865718521
- Comparison of formant enhancement methods for HMM-based speech synthesis
- T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "Comparison of formant enhancement methods for HMM-based speech synthesis," in SSW, 2010, pp. 334-339.
- (2010) SSW , pp. 334-339
- Raitio, T.¹ Suni, A.² Pulakka, H.³ Vainio, M.⁴ Alku, P.⁵

29
- 0029254163
- Non-parametric techniques for pitch-scale and time-scale modification of speech
- E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech," Speech Communication, vol. 16, no. 2, pp. 175-205, 1995.
- (1995) Speech Communication , vol.16 , Issue.2 , pp. 175-205
- Moulines, E.¹ Laroche, J.²

30
- 85039167524
- ETSI TS 126 090: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions
- "ETSI TS 126 090: Mandatory Speech Codec speech processing functions; Adaptive Multi-Rate (AMR) speech codec; Transcoding functions," European Telecommunications Standards Institute, Technical specification, 2016.
- (2016) European Telecommunications Standards Institute, Technical Specification

31
- 84930639546
- Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit
- F. Weninger, "Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit," Journal of Machine Learning Research, vol. 16, pp. 547-551, 2015.
- (2015) Journal of Machine Learning Research , vol.16 , pp. 547-551
- Weninger, F.¹

32
- 85039147615
- CrowdFlower Inc. (2017) Crowd-sourcing platform. [Online]. Available: https://www.crowdflower.com/
- (2017) Crowd-sourcing Platform

33
- 85012061519
- (2017) EF English proficiency index. [Online]. Available: http://www.ef.com/epi/
- (2017) EF English Proficiency Index

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.