SCOPUS Information Search Platform

IEEE/ACM Transactions on Audio Speech and Language Processing

Volume 26, Issue 1, 2018, Pages 84-96

Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks

Authors (3): Saito, Yuki; Takamichi, Shinnosuke; Saruwatari, Hiroshi

Author keywords

deep neural networks; generative adversarial networks; over-smoothing; statistical parametric speech synthesis; text-to-speech synthesis; voice conversion

Indexed keywords

DEEP NEURAL NETWORKS; SPEECH PROCESSING; SPEECH SYNTHESIS;

ADVERSARIAL NETWORKS; MINIMUM GENERATION ERRORS; OVER-SMOOTHING; QUALITY DEGRADATION; SPECTRAL PARAMETERS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; TRAINING ALGORITHMS; VOICE CONVERSION;

SPEECH;

EID: 85031781820 PISSN: 23299290 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2017.2761547 Document Type: Article

Times cited : (214)

References (53)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, A. Black, "Statistical parametric speech synthesis, " Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

2
- 0023756465
- Speech synthesis by rule using an optimal selection of non-uniform synthesis units
- New York, NY, USA Apr.
- Y. Sagisaka, "Speech synthesis by rule using an optimal selection of non-uniform synthesis units, " in Proc. Int. Conf. Acoust., Speech, Signal Process., New York, NY, USA, Apr. 1988, pp. 679-682.
- (1988) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 679-682
- Sagisaka, Y.¹

3
- 0032026483
- Continuous probabilistic transform for voice conversion
- Mar.
- Y. Stylianou, O. Cappé, E. Moulines, "Continuous probabilistic transform for voice conversion, " IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
- (1998) IEEE Trans. Speech Audio Process. , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappé, O.² Moulines, E.³

4
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- May
- Z.-H. Ling, et al., "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends, " IEEE Signal Process. Mag., vol. 32, no. 3, pp. 35-52, May 2015.
- (2015) IEEE Signal Process. Mag. , vol.32 , Issue.3 , pp. 35-52
- Ling, Z.-H.¹

5
- 84876687945
- Speech synthesis based on hidden Markov models
- Apr.
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, K. Oura, "Speech synthesis based on hidden Markov models, " Proc. IEEE, vol. 101, no. 5, pp. 1234-1252, Apr. 2013.
- (2013) Proc. IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

6
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- Nov.
- T. Toda, A. W. Black, K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory, " IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

7
- 33846429403
- Minimum generation error training for HMM based speech synthesis
- Toulouse, France May
- Y. J. Wu and R. H. Wang, "Minimum generation error training for HMM-based speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Toulouse, France, May 2006, pp. 89-92.
- (2006) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 89-92
- Wu, Y.J.¹ Wang, R.H.²

8
- 84978086501
- Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training
- Jul.
- Z. Wu and S. King, "Improving trajectory modeling for DNN-based speech synthesis by using stacked bottleneck features and minimum trajectory error training, " IEEE Trans. Audio, Speech, Lang. Process., vol. 24, no. 7, pp. 1255-1265, Jul. 2016.
- (2016) IEEE Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.7 , pp. 1255-1265
- Wu, Z.¹ King, S.²

9
- 84994361374
- The voice conversion challenge 2016
- San Francisco, CA, USA, Sep.
- T. Toda, et al., "The voice conversion challenge 2016, " in Proc. INTERSPEECH, San Francisco, CA, USA, Sep. 2016, pp. 1632-1636.
- (2016) Proc. INTERSPEECH , pp. 1632-1636
- Toda, T.¹

10
- 84994234512
- Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis
- San Francisco, CA, USA, Sep.
- Y. Ijima, T. Asami, H. Mizuno, "Objective evaluation using association between dimensions within spectral features for statistical parametric speech synthesis, " in Proc. INTERSPEECH, San Francisco, CA, USA, Sep. 2016, pp. 337-341.
- (2016) Proc. INTERSPEECH , pp. 337-341
- Ijima, Y.¹ Asami, T.² Mizuno, H.³

11
- 84878387899
- Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP
- Portland, OR, USA, Sep.
- Y. Ohtani, M. Tamura, M. Morita, T. Kagoshima, M. Akamine, "Histogram-based spectral equalization for HMM-based speech synthesis using mel-LSP, " in Proc. INTERSPEECH, Portland, OR, USA, Sep. 2012, pp. 1155-1158.
- (2012) Proc. INTERSPEECH , pp. 1155-1158
- Ohtani, Y.¹ Tamura, M.² Morita, M.³ Kagoshima, T.⁴ Akamine, M.⁵

12
- 84962834006
- Postfilters to modify the modulation spectrum for statistical parametric speech synthesis
- Apr.
- S. Takamichi, T. Toda, A. W. Black, G. Neubig, S. Sakti, S. Nakamura, "Postfilters to modify the modulation spectrum for statistical parametric speech synthesis, " IEEE Trans. Audio, Speech, Lang. Process., vol. 24, no. 4, pp. 755-767, Apr. 2016.
- (2016) IEEE Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.4 , pp. 755-767
- Takamichi, S.¹ Toda, T.² Black, A.W.³ Neubig, G.⁴ Sakti, S.⁵ Nakamura, S.⁶

13
- 84946033919
- Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion
- Brisbane, QLD, Australia, Apr.
- S. Takamichi, T. Toda, A. W. Black, S. Nakamura, "Modulation spectrum-constrained trajectory training algorithm for GMM-based voice conversion, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Brisbane, QLD, Australia, Apr. 2015, pp. 4859-4863.
- (2015) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 4859-4863
- Takamichi, S.¹ Toda, T.² Black, A.W.³ Nakamura, S.⁴

14
- 84973375140
- Trajectory training considering global variance for speech synthesis based on neural networks
- Shanghai, China, Mar.
- K. Hashimoto, K. Oura, Y. Nankaku, K. Tokuda, "Trajectory training considering global variance for speech synthesis based on neural networks, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Shanghai, China, Mar. 2016, pp. 5600-5604.
- (2016) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 5600-5604
- Hashimoto, K.¹ Oura, K.² Nankaku, Y.³ Tokuda, K.⁴

15
- 84910088495
- Analysis of spectral enhancement using global variance in HMM-based speech synthesis
- MAX Atria, Singapore, May
- T. Nose and A. Ito, "Analysis of spectral enhancement using global variance in HMM-based speech synthesis, " in Proc. INTERSPEECH, MAX Atria, Singapore, May 2014, pp. 2917-2921.
- (2014) Proc. INTERSPEECH , pp. 2917-2921
- Nose, T.¹ Ito, A.²

16
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- Vancouver, BC, Canada, May
- H. Zen, A. Senior, M. Schuster, "Statistical parametric speech synthesis using deep neural networks, " in Proc. Int. Conf. Acoust., Speech, Signal Process, Vancouver, BC, Canada, May 2013, pp. 7962-7966.
- (2013) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

17
- 84937849144
- Generative adversarial nets
- I. Goodfellow, et al., "Generative adversarial nets, " in Proc. Int. Conf. Neural Inf. Process. Syst., 2014, pp. 2672-2680.
- (2014) Proc. Int. Conf. Neural Inf. Process. Syst. , pp. 2672-2680
- Goodfellow, I.¹

18
- 84962901047
- Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, human performance
- Apr.
- Z. Wu, et al., "Anti-spoofing for text-independent speaker verification: An initial database, comparison of countermeasures, human performance, " IEEE Trans. Audio, Speech, Lang. Process., vol. 24, no. 4, pp. 768-783, Apr. 2016.
- (2016) IEEE Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.4 , pp. 768-783
- Wu, Z.¹

19
- 84959178048
- Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 Challenge
- Dresden, Germany, Sep.
- N. Chen, Y. Qian, H. Dinkel, B. Chen, K. Yu, "Robust deep feature for spoofing detection - the SJTU system for ASVspoof 2015 Challenge, " in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2097-2101.
- (2015) Proc. INTERSPEECH , pp. 2097-2101
- Chen, N.¹ Qian, Y.² Dinkel, H.³ Chen, B.⁴ Yu, K.⁵

20
- 85008023596
- Continuous F0 modeling for HMM based statistical parametric speech synthesis
- Jul.
- K. Yu and S. Young, "Continuous F0 modeling for HMM based statistical parametric speech synthesis, " IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1071-1079, Jul. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.5 , pp. 1071-1079
- Yu, K.¹ Young, S.²

21
- 85064210846
- Durational variability in speech and the rhythm class hypothesis
- Berlin, Germany: Mouton de Gruyter
- E. Grabe and E. L. Low, "Durational variability in speech and the rhythm class hypothesis, " in Papers in Laboratory Phonology 7. Berlin, Germany: Mouton de Gruyter, 2002, pp. 515-546.
- (2002) Papers in Laboratory Phonology , vol.7 , pp. 515-546
- Grabe, E.¹ Low, E.L.²

22
- 85018914753
- f-GAN: Training generative neural samplers using variational divergence minimization
- Dec.
- S. Nowozin, B. Cseke, R. Tomioka, "f-GAN: Training generative neural samplers using variational divergence minimization, " in Proc. Int. Conf. Neural Inf. Process. Syst., Dec. 2016, pp. 271-279.
- (2016) Proc. Int. Conf. Neural Inf. Process. Syst. , pp. 271-279
- Nowozin, S.¹ Cseke, B.² Tomioka, R.³

23
- 85035363407
- Wasserstein GAN
- M. Arjovsky, S. Chintala, L. Bottou, "Wasserstein GAN, " in Proc. 34th Int. Conf. Mach. Learn., PMLR 70, 2017, pp. 214-223.
- (2017) Proc. 34th Int. Conf. Mach. Learn., PMLR , vol.70 , pp. 214-223
- Arjovsky, M.¹ Chintala, S.² Bottou, L.³

24
- 85041908569
- Least squares generative adversarial networks
- X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, S. P. Smolley, "Least squares generative adversarial networks, " IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 2794-2802.
- (2017) IEEE Int. Conf. Comput. Vision (ICCV) , pp. 2794-2802
- Mao, X.¹ Li, Q.² Xie, H.³ Lau, R.Y.K.⁴ Wang, Z.⁵ Smolley, S.P.⁶

25
- 0001093042
- Algorithms for non-negative matrix factorization
- D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization, " in Proc. Int. Conf. Neural Inf. Process. Syst., 2000, pp. 556-562.
- (2000) Proc. Int. Conf. Neural Inf. Process. Syst. , pp. 556-562
- Lee, D.D.¹ Seung, H.S.²

26
- 33847655586
- A generalized divergence measure for nonnegative matrix factorization
- Mar.
- R. Kompass, "A generalized divergence measure for nonnegative matrix factorization, " Neural Comput., vol. 19, no. 3, pp. 780-791, Mar. 2007.
- (2007) Neural Comput. , vol.19 , Issue.3 , pp. 780-791
- Kompass, R.¹

27
- 33748099812
- Information theory and statistics: A tutorial
- I. Csiszár and P. C. Shields, "Information theory and statistics: A tutorial, " Found. Trends Commun. Inf. Theory, vol. 1, no. 4, pp. 417-518, 2004.
- (2004) Found. Trends Commun. Inf. Theory , vol.1 , Issue.4 , pp. 417-518
- Csiszár, I.¹ Shields, P.C.²

28
- 37349070894
- New York NY USA: Springer-Verlag
- C. Villani, Optimal Transport: Old and New. New York, NY, USA: Springer-Verlag, 2009.
- (2009) Optimal Transport: Old and New
- Villani, C.¹

29
- 84959090360
- Multi-task learning deep neural networks for speech feature denoising
- Dresden, Germany, Sep.
- B. Huang, D. Ke, H. Zheng, B. Xu, Y. Xu, K. Su, "Multi-task learning deep neural networks for speech feature denoising, " in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2464-2468.
- (2015) Proc. INTERSPEECH , pp. 2464-2468
- Huang, B.¹ Ke, D.² Zheng, H.³ Xu, B.⁴ Xu, Y.⁵ Su, K.⁶

30
- 84998636515
- Generative adversarial text-to-image synthesis
- S. Reed, Z. Akata, X. Yan, L. Logeswaran, B. Schiele, H. Lee, "Generative adversarial text-to-image synthesis, " in Proc. Int. Conf. Mach. Learn., 2016, pp. 1060-1069.
- (2016) Proc. Int. Conf. Mach. Learn. , pp. 1060-1069
- Reed, S.¹ Akata, Z.² Yan, X.³ Logeswaran, L.⁴ Schiele, B.⁵ Lee, H.⁶

31
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- Brisbane, QLD, Australia, Apr.
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Brisbane, QLD, Australia, Apr. 2015, pp. 4470-4474.
- (2015) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 4470-4474
- Zen, H.¹ Sak, H.²

32
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks, " Science, vol. 313, no. 5786, pp. 504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

33
- 84984985889
- 'Why should I trust you?': Explaining the predictions of any classifier
- San Francisco, CA, USA, Aug.
- M. Tulio Ribeiro, S. Singh, C. Guestrin, "'Why should I trust you?': Explaining the predictions of any classifier, " in Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining, San Francisco, CA, USA, Aug. 2016, pp. 1135-1144.
- (2016) Proc. 22nd ACM SIGKDD Int. Conf. Knowl. Discovery Data Mining , pp. 1135-1144
- Tulio Ribeiro, M.¹ Singh, S.² Guestrin, C.³

34
- 84970016114
- Generative moment matching networks
- Y. Li, K. Swersky, R. Zemel, "Generative moment matching networks, " in Proc. 32nd Int. Conf. Mach. Learn., 2015, pp. 1718-1727.
- (2015) Proc. 32nd Int. Conf. Mach. Learn. , pp. 1718-1727
- Li, Y.¹ Swersky, K.² Zemel, R.³

35
- 85019013147
- arXiv:1701.00160
- I. Goodfellow, "NIPS 2016 tutorial: Generative adversarial networks, " arXiv:1701.00160, 2017.
- (2017) NIPS 2016 Tutorial: Generative Adversarial Networks
- Goodfellow, I.¹

36
- 83755163018
- Detecting novel associations in large data sets
- D. N. Reshef, et al., "Detecting novel associations in large data sets, " Science, vol. 334, no. 6062, pp. 1518-1524, 2011.
- (2011) Science , vol.334 , Issue.6062 , pp. 1518-1524
- Reshef, D.N.¹

37
- 84862294866
- Deep sparse rectifier neural networks
- Lauderdale, FL, USA, Apr.
- X. Glorot, A. Bordes, Y. Bengio, "Deep sparse rectifier neural networks, " in Proc. 14th Int. Conf. Artif. Intell. Statist., Lauderdale, FL, USA, Apr. 2011, pp. 315-323.
- (2011) Proc. 14th Int. Conf. Artif. Intell. Statist. , pp. 315-323
- Glorot, X.¹ Bordes, A.² Bengio, Y.³

38
- 84901764355
- A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation
- Jun.
- K. Tanaka, T. Toda, G. Neubig, S. Sakti, S. Nakamura, "A hybrid approach to electrolaryngeal speech enhancement based on noise reduction and statistical excitation generation, " IEICE Trans. Inf. Syst., vol. E97-D, no. 6, pp. 1429-1437, Jun. 2014.
- (2014) IEICE Trans. Inf. Syst. , vol.E97-D , Issue.6 , pp. 1429-1437
- Tanaka, K.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

39
- 84910030421
- Statistical parametric speech synthesis using weighted multi-distribution deep belief network
- Max Atria, Singapore, Sep.
- S. Kang and H. Meng, "Statistical parametric speech synthesis using weighted multi-distribution deep belief network, " in Proc. INTERSPEECH, Max Atria, Singapore, Sep. 2014, pp. 1959-1963.
- (2014) Proc. INTERSPEECH , pp. 1959-1963
- Kang, S.¹ Meng, H.²

40
- 85040306596
- StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks
- H. Zhang, et al., "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks, " IEEE Int. Conf. Comput. Vision (ICCV), 2017, pp. 5907-5915.
- (2017) IEEE Int. Conf. Comput. Vision (ICCV) , pp. 5907-5915
- Zhang, H.¹

41
- 84950159800
- Modeling f0 trajectories in hierarchically structured deep neural networks
- X. Yin, et al., "Modeling f0 trajectories in hierarchically structured deep neural networks, " Speech Commun., vol. 76, pp. 82-92, 2016.
- (2016) Speech Commun. , vol.76 , pp. 82-92
- Yin, X.¹

42
- 85023752230
- Generative adversarial network-based postfilter for statistical parametric speech synthesis
- New Orleans, LA, USA, Mar.
- T. Kaneko, H. Kameoka, N. Hojo, Y. Ijima, K. Hiramatsu, K. Kashino, "Generative adversarial network-based postfilter for statistical parametric speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process., New Orleans, LA, USA, Mar. 2017, pp. 4910-4914.
- (2017) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 4910-4914
- Kaneko, T.¹ Kameoka, H.² Hojo, N.³ Ijima, Y.⁴ Hiramatsu, K.⁵ Kashino, K.⁶

43
- 84994314564
- Fast, compact, high quality LSTM-RNN based statistical parametric speech synthesizer for mobile devices
- San Francisco, CA, USA, Sep.
- H. Zen, Y. Agiomyrgiannakis, N. Egberts, F. Henderson, P. Szczepaniak, "Fast, compact, high quality LSTM-RNN based statistical parametric speech synthesizer for mobile devices, " in Proc. INTERSPEECH, San Francisco, CA, USA, Sep. 2016, pp. 2273-2277.
- (2016) Proc. INTERSPEECH , pp. 2273-2277
- Zen, H.¹ Agiomyrgiannakis, Y.² Egberts, N.³ Henderson, F.⁴ Szczepaniak, P.⁵

44
- 84973307947
- Directly modeling voiced and unvoiced components in speech waveforms by neural networks
- Shanghai, China, Mar.
- K. Tokuda and H. Zen, "Directly modeling voiced and unvoiced components in speech waveforms by neural networks, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Shanghai, China, Mar. 2016, pp. 5640-5644.
- (2016) Proc. Int. Conf. Acoust., Speech, Signal Process. , pp. 5640-5644
- Tokuda, K.¹ Zen, H.²

45
- 85011070895
- arXiv:1609.03499
- A. Oord, et al., "WaveNet: A generative model for raw audio, " arXiv:1609.03499, 2016.
- (2016) WaveNet: A Generative Model for Raw Audio
- Oord, A.¹

46
- 6644226630
- A large-scale Japanese speech database
- Kobe, Japan Nov.
- Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, H. Kawahara, "A large-scale Japanese speech database, " in Proc. Int. Conf. Spoken Lang. Process., Kobe, Japan, Nov. 1990, pp. 1089-1092.
- (1990) Proc. Int. Conf. Spoken Lang. Process. , pp. 1089-1092
- Sagisaka, Y.¹ Takeda, K.² Abe, M.³ Katagiri, S.⁴ Umeda, T.⁵ Kawahara, H.⁶

47
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- Firentze, Italy Sep.
- H. Kawahara, J. Estill, O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, " in Proc. Int. Workshop Models Anal. Vocal Emissions Biomed. Appl., Firentze, Italy, Sep. 2001, pp. 1-6.
- (2001) Proc. Int. Workshop Models Anal. Vocal Emissions Biomed. Appl. , pp. 1-6
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

48
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Pittsburgh, PA, USA Sep.
- Y. Ohtani, T. Toda, H. Saruwatari, K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation, " in Proc. INTERSPEECH, Pittsburgh, PA, USA, Sep. 2006, pp. 2266-2269.
- (2006) Proc. INTERSPEECH , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

49
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- Apr.
- H. Kawahara, I. Masuda-Katsuse, A. D. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, " Speech Commun., vol. 27, no. 3/4, pp. 187-207, Apr. 1999.
- (1999) Speech Commun. , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigne, A.D.³

50
- 84994252904
- The NAIST text-to-speech system for the Blizzard Challenge 2015
- Berlin, Germany, Sep.
- S. Takamichi, K. Kobayashi, K. Tanaka, T. Toda, S. Nakamura, "The NAIST text-to-speech system for the Blizzard Challenge 2015, " in Proc. Blizzard Challenge Workshop, Berlin, Germany, Sep. 2015.
- (2015) Proc. Blizzard Challenge Workshop
- Takamichi, S.¹ Kobayashi, K.² Tanaka, K.³ Toda, T.⁴ Nakamura, S.⁵

51
- 84878390910
- Implementation of computationally efficient real-time voice conversion
- Portland, OR, USA, Sep.
- T. Toda, T. Muramatsu, H. Banno, "Implementation of computationally efficient real-time voice conversion, " in Proc. INTERSPEECH, Portland, OR, USA, Sep. 2012, pp. 94-97.
- (2012) Proc. INTERSPEECH , pp. 94-97
- Toda, T.¹ Muramatsu, T.² Banno, H.³

52
- 84959126237
- A comparison of features for synthetic speech detection
- Dresden, Germany, Sep.
- M. Sahidullah, T. Kinnunen, C. Hanilçi, "A comparison of features for synthetic speech detection, " in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 2087-2091.
- (2015) Proc. INTERSPEECH , pp. 2087-2091
- Sahidullah, M.¹ Kinnunen, T.² Hanilçi, C.³

53
- 85039171110
- Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis
- Stockholm, Sweden, Aug.
- B. Bollepalli, L. Juvela, P. Alku, "Generative adversarial network-based glottal waveform model for statistical parametric speech synthesis, " in Proc. INTERSPEECH, Stockholm, Sweden, Aug. 2017, pp. 3394-3398.
- (2017) Proc. INTERSPEECH , pp. 3394-3398
- Bollepalli, B.¹ Juvela, L.² Alku, P.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.