SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2017-August, 2017, Pages 3394-3398

Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis

(3) Bollepalli, Bajibabuᵃ; Juvela, Lauriᵃ; Alku, Paavoᵃ

ᵃ Aalto University (Finland)

Author keywords

DNN; GAN; Glottal source modelling; TTS

Indexed keywords

ADDITIVE NOISE; DEEP NEURAL NETWORKS; SPEECH; SPEECH SYNTHESIS; STOCHASTIC MODELS; STOCHASTIC SYSTEMS; VOCODERS;

ADVERSARIAL NETWORKS; CONDITIONAL AVERAGE; DATA DISTRIBUTION; EXCITATION MODELS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; STOCHASTIC COMPONENT; STOCHASTIC VARIATION; WAVEFORM MODELING;

SPEECH COMMUNICATION;

EID: 85039171110 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2017-1288 Document Type: Conference Paper

Times cited : (47)

References (38)

1
- 84876687945
- Speech synthesis based on hidden Markov models
- May
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models, " Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

2
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis, " Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

3
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- May
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks, " in Proc. of ICASSP, May 2013, pp. 7962-7966.
- (2013) Proc. of ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

4
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Y. Fan, Y. Qian, F.-L. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks." in Interspeech, 2014, pp. 1964-1968.
- (2014) Interspeech , pp. 1964-1968
- Fan, Y.¹ Qian, Y.² Xie, F.-L.³ Soong, F.K.⁴

5
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, " in Proc. of ICASSP. IEEE, 2015, pp. 4470-4474.
- (2015) Proc. of ICASSP. IEEE , pp. 4470-4474
- Zen, H.¹ Sak, H.²

6
- 84978086501
- Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training
- Z. Wu and S. King, "Improving trajectory modelling for DNN-based speech synthesis by using stacked bottleneck features and minimum generation error training, " IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 7, pp. 1255-1265, 2016.
- (2016) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.24 , Issue.7 , pp. 1255-1265
- Wu, Z.¹ King, S.²

7
- 85023745327
- An autoregressive recurrent mixture density network for parametric speech synthesis
- X. Wang, S. Takaki, and J. Yamagishi, "An autoregressive recurrent mixture density network for parametric speech synthesis, " in Proc. of ICASSP. IEEE, 2017, pp. 4895-4899.
- (2017) Proc. of ICASSP. IEEE , pp. 4895-4899
- Wang, X.¹ Takaki, S.² Yamagishi, J.³

8
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. De Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, " Speech communication, vol. 27, no. 3, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigne, A.³

9
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, " in MAVEBA, 2001.
- (2001) MAVEBA
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

10
- 77957744515
- HMM-based speech synthesis utilizing glottal inverse filtering
- January
- T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering, " IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 153-165, January 2011.
- (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.1 , pp. 153-165
- Raitio, T.¹ Suni, A.² Yamagishi, J.³ Pulakka, H.⁴ Nurminen, J.⁵ Vainio, M.⁶ Alku, P.⁷

11
- 84994338062
- GlottDNN-A full-band glottal vocoder for statistical parametric speech synthesis
- M. Airaksinen, B. Bollepalli, L. Juvela, Z. Wu, S. King, and P. Alku, "GlottDNN-a full-band glottal vocoder for statistical parametric speech synthesis, " in Proc. of Interspeech, 2016.
- (2016) Proc. of Interspeech
- Airaksinen, M.¹ Bollepalli, B.² Juvela, L.³ Wu, Z.⁴ King, S.⁵ Alku, P.⁶

12
- 0035127703
- Applying the harmonic plus noise model in concatenative speech synthesis
- Y. Stylianou, "Applying the harmonic plus noise model in concatenative speech synthesis, " IEEE Transactions on Speech and Audio Processing, vol. 9, no. 1, pp. 21-29, 2001.
- (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.1 , pp. 21-29
- Stylianou, Y.¹

13
- 84897865577
- Harmonics plus noise model based vocoder for statistical parametric speech synthesis
- April
- D. Erro, I. Sainz, E. Navas, and I. Hernaez, "Harmonics plus noise model based vocoder for statistical parametric speech synthesis, " IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 184-194, April 2014.
- (2014) IEEE Journal of Selected Topics in Signal Processing , vol.8 , Issue.2 , pp. 184-194
- Erro, D.¹ Sainz, I.² Navas, E.³ Hernaez, I.⁴

14
- 0026881384
- Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
- Eurospeech '91
- P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering, " Speech Communication, vol. 11, no. 2-3, pp. 109-118, 1992, Eurospeech '91.
- (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 109-118
- Alku, P.¹

15
- 84973293681
- High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
- Mar
- L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network, " in Proc. of ICASSP, Mar. 2016, pp. 5120-5124.
- (2016) Proc. of ICASSP , pp. 5120-5124
- Juvela, L.¹ Bollepalli, B.² Airaksinen, M.³ Alku, P.⁴

16
- 85017575778
- Glottal vocoding with frequency-warped time-weighted linear prediction
- April
- M. Airaksinen, B. Bollepalli, J. Pohjalainen, and P. Alku, "Glottal vocoding with frequency-warped time-weighted linear prediction, " IEEE Signal Processing Letters, vol. 24, no. 4, pp. 446-450, April 2017.
- (2017) IEEE Signal Processing Letters , vol.24 , Issue.4 , pp. 446-450
- Airaksinen, M.¹ Bollepalli, B.² Pohjalainen, J.³ Alku, P.⁴

17
- 84910068090
- Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
- September
- T. Raitio, A. Suni, L. Juvela, M. Vainio, and P. Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort, " in Proc. of Interspeech, Singapore, September 2014, pp. 1969-1973.
- (2014) Proc. of Interspeech, Singapore , pp. 1969-1973
- Raitio, T.¹ Suni, A.² Juvela, L.³ Vainio, M.⁴ Alku, P.⁵

18
- 84937849144
- Generative adversarial nets
- I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets, " in Advances in neural information processing systems, 2014, pp. 2672-2680.
- (2014) Advances in Neural Information Processing Systems , pp. 2672-2680
- Goodfellow, I.¹ Pouget-Abadie, J.² Mirza, M.³ Xu, B.⁴ Warde-Farley, D.⁵ Ozair, S.⁶ Courville, A.⁷ Bengio, Y.⁸

19
- 85017259342
- WaveNet: A generative model for raw audio
- A. van den Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, and K. Kavukcuoglu, "WaveNet: A generative model for raw audio, " Pre-print, 2016, https://arxiv.org/pdf/1609.03499.pdf.
- (2016) Pre-print
- Van den Oord, A.¹ Dieleman, S.² Zen, H.³ Simonyan, K.⁴ Vinyals, O.⁵ Graves, A.⁶ Kalchbrenner, N.⁷ Senior, A.⁸ Kavukcuoglu, K.⁹

20
- 85039156048
- Deep voice: Real-time neural text-to-speech
- submission
- S. O. Arik, M. Chrzanowski, A. Coates, G. Diamos, A. Gibiansky, Y. Kang, X. Li, J. Miller, A. Ng, J. Raiman, S. Sengupta, and M. Shoeybi, "Deep voice: Real-time neural text-to-speech, " in ICML 2017 (submission), 2017, https://arxiv.org/pdf/1702.07825.pdf.
- (2017) ICML 2017
- Arik, S.O.¹ Chrzanowski, M.² Coates, A.³ Diamos, G.⁴ Gibiansky, A.⁵ Kang, Y.⁶ Li, X.⁷ Miller, J.⁸ Ng, A.⁹ Raiman, J.¹⁰ Sengupta, S.¹¹ Shoeybi, M.¹²

21
- 85039165322
- Generative model-based text-to-speech synthesis
- H. Zen, "Generative model-based text-to-speech synthesis, " 2017, invited talk given at CBMM workshop on speech representation, perception and recognition.
- 2017, Invited Talk Given at CBMM Workshop on Speech Representation, Perception and Recognition
- Zen, H.¹

22
- 84902548165
- Glottal source processing: From analysis to applications
- T. Drugman, P. Alku, A. Alwan, and B. Yegnanarayana, "Glottal source processing: From analysis to applications, " Computer Speech & Language, vol. 28, no. 5, pp. 1117-1138, 2014.
- (2014) Computer Speech & Language , vol.28 , Issue.5 , pp. 1117-1138
- Drugman, T.¹ Alku, P.² Alwan, A.³ Yegnanarayana, B.⁴

23
- 84856294347
- Glottal inverse filtering analysis of human voice production-A review of estimation and parameterization methods of the glottal excitation and their applications. (invited article)
- P. Alku, "Glottal inverse filtering analysis of human voice production-a review of estimation and parameterization methods of the glottal excitation and their applications. (invited article), " Sadhana-Academy Proceedings in Engineering Sciences, vol. 36, no. 5, pp. 623-650, 2011.
- (2011) Sadhana-Academy Proceedings in Engineering Sciences , vol.36 , Issue.5 , pp. 623-650
- Alku, P.¹

24
- 85023752230
- Generative adversarial network-based postfilter for statistical parametric speech synthesis
- March
- T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, and K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis, " in Proc. of ICASSP, March 2017, pp. 4910-4914.
- (2017) Proc. of ICASSP , pp. 4910-4914
- Kaneko, T.¹ Kameoka, H.² Hojo, N.³ Ijima, Y.⁴ Hiramatsu, K.⁵ Kashino, K.⁶

25
- 85023772724
- Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis
- New Orleans, USA
- Y. Saito, S. Takamichi, and H. Saruwatari, "Training algorithm to deceive anti-spoofing verification for DNN-based speech synthesis, " in ICASSP, New Orleans, USA, 2017, pp. 4900-4904.
- (2017) ICASSP , pp. 4900-4904
- Saito, Y.¹ Takamichi, S.² Saruwatari, H.³

26
- 84978298377
- A. Radford, L. Metz, and S. Chintala, "Unsupervised representation learning with deep convolutional generative adversarial networks, " arXiv preprint arXiv:1511.06434, 2015.
- (2015) Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
- Radford, A.¹ Metz, L.² Chintala, S.³

27
- 84987947153
- M. Mirza and S. Osindero, "Conditional generative adversarial nets, " arXiv preprint arXiv:1411.1784, 2014.
- (2014) Conditional Generative Adversarial Nets
- Mirza, M.¹ Osindero, S.²

28
- 85039158168
- S.-W. Fu, Y. Tsao, X. Lu, and H. Kawai, "Raw waveform-based speech enhancement by fully convolutional networks, " arXiv preprint arXiv:1703.02205, 2017.
- (2017) Raw Waveform-based Speech Enhancement by Fully Convolutional Networks
- Fu, S.-W.¹ Tsao, Y.² Lu, X.³ Kawai, H.⁴

29
- 85039149742
- X. Mao, Q. Li, H. Xie, R. Y. Lau, and Z. Wang, "Least squares generative adversarial networks, " arXiv preprint arXiv:1611.04076v2, 2017.
- (2017) Least Squares Generative Adversarial Networks
- Mao, X.¹ Li, Q.² Xie, H.³ Lau, R.Y.⁴ Wang, Z.⁵

30
- 84994323722
- Flite: A small fast run-time synthesis engine
- A.W. Black and K. A. Lenzo, "Flite: a small fast run-time synthesis engine, " in 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis, 2001.
- (2001) 4th ISCA Tutorial and Research Workshop (ITRW) on Speech Synthesis
- Black, A.W.¹ Lenzo, K.A.²

31
- 70450180978
- Robust LTS rules with the Combilex speech technology lexicon
- September
- K. Richmond, R. A. Clark, and S. Fitt, "Robust LTS rules with the Combilex speech technology lexicon, " in Proc. of Interspeech, Brighton, September 2009, pp. 1295-1298.
- (2009) Proc. of Interspeech, Brighton , pp. 1295-1298
- Richmond, K.¹ Clark, R.A.² Fitt, S.³

32
- 85133439657
- An introduction of trajectory model into HMM-based speech synthesis
- H. Zen, K. Tokuda, and T. Kitamura, "An introduction of trajectory model into HMM-based speech synthesis, " in Fifth ISCA Workshop on Speech Synthesis, 2004.
- (2004) Fifth ISCA Workshop on Speech Synthesis
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

33
- 85039167642
- Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system
- L. Juvela, B. Bollepalli, J. Yamagishi, and P. Alku, "Reducing mismatch in training of DNN-based glottal excitation models in a statistical parametric text-to-speech system, " in "Submitted to Interspeech", 2017.
- (2017) Submitted to Interspeech
- Juvela, L.¹ Bollepalli, B.² Yamagishi, J.³ Alku, P.⁴

34
- 84964923476
- S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift, " arXiv preprint arXiv:1502.03167, 2015.
- (2015) Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Ioffe, S.¹ Szegedy, C.²

35
- 0029254163
- Non-parametric techniques for pitch-scale and time-scale modification of speech
- E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech, " Speech communication, vol. 16, no. 2, pp. 175-205, 1995.
- (1995) Speech Communication , vol.16 , Issue.2 , pp. 175-205
- Moulines, E.¹ Laroche, J.²

36
- 0003450846
- P.800: Methods for subjective determination of transmission quality
- ITU-T Recommendation P.800, "Methods for subjective determination of transmission quality, " International Telecommunication Union, 1996.
- (1996) International Telecommunication Union
- ITU-T

37
- 85039147615
- CrowdFlower Inc
- CrowdFlower Inc. (2017) Crowd-sourcing platform. [Online]. Available: https://www.crowdflower.com/
- (2017) Crowd-sourcing Platform

38
- 85039153956
- (2017) EF English Proficiency Index. [Online]. Available: http://www.ef.com/epi/
- (2017) EF English Proficiency Index. [Online]

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.