SCOPUS Information Retrieval Platform

IEEE/ACM Transactions on Audio, Speech, and Language Processing

Volume 24, Issue 4, 2016, Pages 755-767

Postfilters to modify the modulation spectrum for statistical parametric speech synthesis

(6) Takamichi, Shinnosuke a; Toda, Tomoki b; Black, Alan W. c; Neubig, Graham a; Sakti, Sakriani a; Nakamura, Satoshi a

a NARA INSTITUTE OF SCIENCE AND TECHNOLOGY (Japan)

b NAGOYA UNIVERSITY (Japan)

c Carnegie Mellon University ^* (United States)

Author keywords

Clustergen; Global variance; GMM based voice conversion; HMM based text to speech; Modulation spectrum; Over smoothing; Post filter; Statistical parametric speech synthesis

Indexed keywords

COMMUNICATION CHANNELS (INFORMATION THEORY); GAUSSIAN DISTRIBUTION; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MODULATION; PLASMA DIAGNOSTICS; SPEECH PROCESSING; SPEECH SYNTHESIS; STATISTICS; TRAJECTORIES; TRELLIS CODES;

CLUSTERGEN; GLOBAL VARIANCE; MODULATION SPECTRUM; OVER-SMOOTHING; POSTFILTERS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; TEXT TO SPEECH; VOICE CONVERSION;

SPEECH;

EID: 84962834006 PISSN: 23299290 EISSN: None Source Type: Journal
DOI: 10.1109/TASLP.2016.2522655 Document Type: Article

Times cited : (68)

References (66)

1
- 84855906479
- Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction
- J. Yamagishi, C. Veaux, S. King, and S. Renals, "Speech synthesis technologies for individuals with vocal disabilities: Voice banking and reconstruction, " Acoust. Sci. Technol., vol. 33, pp. 1-5, 2012.
- (2012) Acoust. Sci. Technol. , vol.33 , pp. 1-5
- Yamagishi, J.¹ Veaux, C.² King, S.³ Renals, S.⁴

2
- 84905252904
- An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement
- Florence, Italy, May
- K. Tanaka, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, "An evaluation of excitation feature prediction in a hybrid approach to electrolaryngeal speech enhancement, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, Italy, May 2014, pp. 4521-4525.
- (2014) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 4521-4525
- Tanaka, K.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

3
- 84874403435
- Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system
- Hollywood, CA, USA, Nov
- H. Doi, T. Toda, T. Nakano, M. Goto, and S. Nakamura, "Singing voice conversion method based on many-to-many eigenvoice conversion and training data generation using a singing-to-singing synthesis system, " in Proc. Asia Pac. Signal Inf. Process. Assoc. Annu. Summit Conf. (APSIPA ASC), Hollywood, CA, USA, Nov. 2012, pp. 1-6.
- (2012) Proc. Asia Pac. Signal Inf. Process. Assoc. Annu. Summit Conf. (APSIPA ASC) , pp. 1-6
- Doi, H.¹ Toda, T.² Nakano, T.³ Goto, M.⁴ Nakamura, S.⁵

4
- 84865743435
- Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation
- Florence, Italy, Aug
- N. Hattori, T. Toda, H. Kawai, H. Saruwatari, and K. Shikano, "Speaker-adaptive speech synthesis based on eigenvoice conversion and language-dependent prosodic conversion in speech-to-speech translation, " in Proc. INTERSPEECH, Florence, Italy, Aug. 2011, pp. 2769-2772.
- (2011) Proc. INTERSPEECH , pp. 2769-2772
- Hattori, N.¹ Toda, T.² Kawai, H.³ Saruwatari, H.⁴ Shikano, K.⁵

5
- 84905280452
- Automatic discovery of a phonetic inventory for unwritten languages for statistical speech synthesis
- Florence, Italy, May
- P. K. Muthukumar and A. W. Black, "Automatic discovery of a phonetic inventory for unwritten languages for statistical speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process., Florence, Italy, May 2014, pp. 2594-2598.
- (2014) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 2594-2598
- Muthukumar, P.K.¹ Black, A.W.²

6
- 85047398116
- Text to speech in new languages without a standardized orthography
- Barcelona, Spain, Aug
- S. Sitaram, G. Anumanchipalli, J. Chiu, A. Parlikar, and A. W. Black, "Text to speech in new languages without a standardized orthography, " in Proc. 8th Speech Synth. Workshop, Barcelona, Spain, Aug. 2013, pp. 95-100.
- (2013) Proc. 8th Speech Synth. Workshop , pp. 95-100
- Sitaram, S.¹ Anumanchipalli, G.² Chiu, J.³ Parlikar, A.⁴ Black, A.W.⁵

7
- 84905223321
- Regression approaches to perceptual age control in singing voice conversion
- Florence, Italy, May
- K. Kobayashi et al., "Regression approaches to perceptual age control in singing voice conversion, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, Italy, May 2014, pp. 7954-7958.
- (2014) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 7954-7958
- Kobayashi, K.¹

8
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- May
- Z.-H. Ling et al., "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends, " IEEE Signal Process. Mag., vol. 32, no. 3, pp. 35-52, May 2015.
- (2015) IEEE Signal Process. Mag. , vol.32 , Issue.3 , pp. 35-52
- Ling, Z.-H.¹

9
- 0023756465
- Speech synthesis by rule using an optimal selection of non-uniform synthesis units
- New York, NY, USA, Apr
- Y. Sagisaka, "Speech synthesis by rule using an optimal selection of non-uniform synthesis units, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), New York, NY, USA, Apr. 1988, pp. 679-682.
- (1988) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 679-682
- Sagisaka, Y.¹

10
- 0032026483
- Continuous probabilistic transform for voice conversion
- Mar
- Y. Stylianou, O. Cappe, and E. Moulines, "Continuous probabilistic transform for voice conversion, " IEEE Trans. Speech Audio Process., vol. 6, no. 2, pp. 131-142, Mar. 1998.
- (1998) IEEE Trans. Speech Audio Process , vol.6 , Issue.2 , pp. 131-142
- Stylianou, Y.¹ Cappe, O.² Moulines, E.³

11
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. Black, "Statistical parametric speech synthesis, " Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

12
- 0028996993
- Speech parameter generation from HMM using dynamic features
- Detroit, MI, USA, May
- K. Tokuda, T. Kobayashi, and S. Imai, "Speech parameter generation from HMM using dynamic features, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Detroit, MI, USA, May 1995, pp. 660-663.
- (1995) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

13
- 84876687945
- Speech synthesis based on hidden Markov models
- May
- K. Tokuda, Y. Nankaku, T. Toda, H. Zen, J. Yamagishi, and K. Oura, "Speech synthesis based on hidden Markov models, " Proc. IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
- (2013) Proc. IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

14
- 57749193836
- Voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- Nov
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum likelihood estimation of spectral parameter trajectory, " IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 8, pp. 2222-2235, Nov. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

15
- 44949232373
- CLUSTERGEN: A statistical parametric synthesizer using trajectory modeling
- Pittsburgh, PA, USA, Sep
- A. W. Black, "CLUSTERGEN: A statistical parametric synthesizer using trajectory modeling, " in Proc. INTERSPEECH, Pittsburgh, PA, USA, Sep. 2006.
- (2006) Proc. INTERSPEECH
- Black, A.W.¹

16
- 84897902941
- Statistical parametric speech synthesis based on Gaussian process regression
- Apr
- T. Koriyama, T. Nose, and T. Kobayashi, "Statistical parametric speech synthesis based on Gaussian process regression, " IEEE J. Sel. Topics Signal Process., vol. 8, no. 2, pp. 173-183, Apr. 2014.
- (2014) IEEE J. Sel. Topics Signal Process , vol.8 , Issue.2 , pp. 173-183
- Koriyama, T.¹ Nose, T.² Kobayashi, T.³

17
- 84856141218
- Voice conversion using dynamic kernel partial least squares regression
- Mar
- E. Helander, T. V. H. Silen, and M. Gabbouj, "Voice conversion using dynamic kernel partial least squares regression, " IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 806-817, Mar. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.3 , pp. 806-817
- Helander, E.¹ Silen, T.V.H.² Gabbouj, M.³

18
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- Florence, Italy, May
- H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, Italy, May 2014, pp. 3872-3876.
- (2014) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 3872-3876
- Zen, H.¹ Senior, A.²

19
- 84906280857
- Voice conversion in high-order Eigen space using deep belief nets
- Lyon, France, Aug
- T. Nakashika, R. Takashima, T. Takiguchi, and Y. Ariki, "Voice conversion in high-order Eigen space using deep belief nets, " in Proc. INTERSPEECH, Lyon, France, Aug. 2013, pp. 369-372.
- (2013) Proc. INTERSPEECH , pp. 369-372
- Nakashika, T.¹ Takashima, R.² Takiguchi, T.³ Ariki, Y.⁴

20
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- Atlanta, GA, USA, May
- A. J. Hunt and A. Black, "Unit selection in a concatenative speech synthesis system using a large speech database, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Atlanta, GA, USA, May 1996, pp. 373-376.
- (1996) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 373-376
- Hunt, A.J.¹ Black, A.²

21
- 84988274722
- An investigation of implementation performance analysis of DNN based speech synthesis system
- Brighton, U.K
- K. Oura, H. Zen, Y. Nankaku, A. Lee, and K. Tokuda, "An investigation of implementation performance analysis of DNN based speech synthesis system, " in Proc. INTERSPEECH, Brighton, U.K., 2014, pp. 577-582.
- (2014) Proc. INTERSPEECH , pp. 577-582
- Oura, K.¹ Zen, H.² Nankaku, Y.³ Lee, A.⁴ Tokuda, K.⁵

22
- 33847129573
- Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training
- J. Yamagishi and T. Kobayashi, "Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training, " IEICE Trans. Inf. Syst., vol. E90-D, no. 2, pp. 533-543, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.2 , pp. 533-543
- Yamagishi, J.¹ Kobayashi, T.²

23
- 51449114529
- A style control technique for HMM-based expressive speech synthesis
- T. Nose, J. Yamagishi, T. Masuko, and T. Kobayashi, "A style control technique for HMM-based expressive speech synthesis, " IEICE Trans. Inf. Syst., vol. E90-D, no. 9, pp. 1406-1413, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.9 , pp. 1406-1413
- Nose, T.¹ Yamagishi, J.² Masuko, T.³ Kobayashi, T.⁴

24
- 84878397811
- Exploring rich expressive information from audiobook data using cluster adaptive training
- Portland, OR, USA, Sep
- L. Chen, M. J. F. Gales, L. Chen, K. Chin, K. Knull, and M. Akamine, "Exploring rich expressive information from audiobook data using cluster adaptive training, " in Proc. INTERSPEECH, Portland, OR, USA, Sep. 2012.
- (2012) Proc. INTERSPEECH
- Chen, L.¹ Gales, M.J.F.² Chen, L.³ Chin, K.⁴ Knull, K.⁵ Akamine, M.⁶

25
- 84878419996
- The Blizzard challenge 2011
- Turin, Italy, Sep
- S. King and V. Karaiskos, "The Blizzard challenge 2011, " in Proc. Blizzard Challenge Workshop, Turin, Italy, Sep. 2011.
- (2011) Proc. Blizzard Challenge Workshop
- King, S.¹ Karaiskos, V.²

26
- 70349197715
- Voice transformation: A survey
- Taipei, Taiwan, Apr
- Y. Stylianou, "Voice transformation: A survey, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 3585-3588.
- (2009) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 3585-3588
- Stylianou, Y.¹

27
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. D. Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, " Speech Commun., vol. 27, no. 3-4, pp. 187-207, 1999.
- (1999) Speech Commun. , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² Cheveigne, A.D.³

28
- 85010837662
- An attempt to develop a singing synthesizer by collaborative creation
- Stockholm, Sweden, Aug
- M. Morise, "An attempt to develop a singing synthesizer by collaborative creation, " in Proc. Stockholm Music Acoust. Conf. (SMAC), Stockholm, Sweden, Aug. 2013, pp. 287-292.
- (2013) Proc. Stockholm Music Acoust. Conf. (SMAC) , pp. 287-292
- Morise, M.¹

29
- 84930664922
- Vocaine the vocoder and applications in speech synthesis
- Brisbane, QLD, Australia, Apr
- Y. Agiomyrgiannakis, "Vocaine the vocoder and applications in speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Brisbane, QLD, Australia, Apr. 2015, pp. 4230-4234.
- (2015) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 4230-4234
- Agiomyrgiannakis, Y.¹

30
- 84906279165
- Optimizations and fitting procedures for the Liljencrants-Fant model for statistical parametric speech synthesis
- Vancouver, BC, Canada, May
- P. K. Muthukumar, A. W. Black, and H. T. Bunnell, "Optimizations and fitting procedures for the Liljencrants-Fant model for statistical parametric speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Vancouver, BC, Canada, May 2013, pp. 397-401.
- (2013) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 397-401
- Muthukumar, P.K.¹ Black, A.W.² Bunnell, H.T.³

31
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis, " IEICE Trans., vol. E90-D, no. 5, pp. 816-824, 2007.
- (2007) IEICE Trans , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

32
- 84897862522
- Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis
- Apr
- S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, and S. Nakamura, "Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis, " IEEE J. Sel. Topics Signal Process., vol. 8, no. 2, pp. 239-250, Apr. 2014.
- (2014) IEEE J. Sel. Topics Signal Process , vol.8 , Issue.2 , pp. 239-250
- Takamichi, S.¹ Toda, T.² Shiga, Y.³ Sakti, S.⁴ Neubig, G.⁵ Nakamura, S.⁶

33
- 84897832343
- A parameter generation algorithm using local variance for HMM-based speech synthesis
- Apr
- T. Nose, V. Chunwijitra, and T. Kobayashi, "A parameter generation algorithm using local variance for HMM-based speech synthesis, " IEEE J. Sel. Topics Signal Process., vol. 8, no. 2, pp. 221-228, Apr. 2014.
- (2014) IEEE J. Sel. Topics Signal Process , vol.8 , Issue.2 , pp. 221-228
- Nose, T.¹ Chunwijitra, V.² Kobayashi, T.³

34
- 84890495160
- Fast, low-artifact speech synthesis considering global variance
- Vancouver, BC, Canada, May
- M. Shannon and W. Byrne, "Fast, low-artifact speech synthesis considering global variance, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Vancouver, BC, Canada, May 2013, pp. 7869-7873.
- (2013) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 7869-7873
- Shannon, M.¹ Byrne, W.²

35
- 84878390910
- Implementation of computationally efficient real-time voice conversion
- Portland, OR, USA, Sep
- T. Toda, T. Muramatsu, and H. Banno, "Implementation of computationally efficient real-time voice conversion, " in Proc. INTERSPEECH, Portland, OR, USA, Sep. 2012.
- (2012) Proc. INTERSPEECH
- Toda, T.¹ Muramatsu, T.² Banno, H.³

36
- 0028287770
- Effect of reducing slow temporal modulations on speech reception
- R. Drullman, J. M. Festen, and R. Plomp, "Effect of reducing slow temporal modulations on speech reception, " J. Acoust. Soc. Amer., vol. 95, pp. 2670-2680, 1994.
- (1994) J. Acoust. Soc. Amer. , vol.95 , pp. 2670-2680
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

37
- 70349212558
- Phoneme recognition using spectral envelope and modulation frequency features
- Taipei, Taiwan, Apr
- S. Thomas, S. Ganapathy, and H. Hermansky, "Phoneme recognition using spectral envelope and modulation frequency features, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Taipei, Taiwan, Apr. 2009, pp. 4453-4456.
- (2009) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 4453-4456
- Thomas, S.¹ Ganapathy, S.² Hermansky, H.³

38
- 84959088222
- Reduction of reverberation effects in the MFCC modulation spectrum for improved classification of acoustic signals
- Dresden, Germany, Sep
- S. Gergen, A. Nagathil, and R. Martin, "Reduction of reverberation effects in the MFCC modulation spectrum for improved classification of acoustic signals, " in Proc. INTERSPEECH, Dresden, Germany, Sep. 2015, pp. 1992-1995.
- (2015) Proc. INTERSPEECH , pp. 1992-1995
- Gergen, S.¹ Nagathil, A.² Martin, R.³

39
- 84890543945
- Synthetic speech detection using temporal modulation feature
- Vancouver, BC, Canada, May
- Z. Wu, X. Xiao, E. S. Chng, and H. Li, "Synthetic speech detection using temporal modulation feature, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Vancouver, BC, Canada, May 2013, pp. 7234-7238.
- (2013) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 7234-7238
- Wu, Z.¹ Xiao, X.² Chng, E.S.³ Li, H.⁴

40
- 0027957839
- Effect of temporal envelope smearing on speech perception
- R. Drullman, J. M. Festen, and R. Plomp, "Effect of temporal envelope smearing on speech perception, " J. Acoust. Soc. Amer., vol. 95, pp. 1053-1064, 1994.
- (1994) J. Acoust. Soc. Amer. , vol.95 , pp. 1053-1064
- Drullman, R.¹ Festen, J.M.² Plomp, R.³

41
- 0030369532
- Intelligibility of speech with filtered time trajectories of spectral envelopes
- T. Arai, M. Pavel, H. Hermansky, and C. Avendano, "Intelligibility of speech with filtered time trajectories of spectral envelopes, " in Proc. 4th Int. Conf. Spoken Lang. (ICSLP), 1996, vol. 4, pp. 2490-2493.
- (1996) Proc. 4th Int. Conf. Spoken Lang. (ICSLP) , vol.4 , pp. 2490-2493
- Arai, T.¹ Pavel, M.² Hermansky, H.³ Avendano, C.⁴

42
- 84867211725
- Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory
- Brisbane, QLD, Australia, Sep
- T. Muramatsu, Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Low-delay voice conversion based on maximum likelihood estimation of spectral parameter trajectory, " in Proc. INTERSPEECH, Brisbane, QLD, Australia, Sep. 2008, pp. 1076-1079.
- (2008) Proc. INTERSPEECH , pp. 1076-1079
- Muramatsu, T.¹ Ohtani, Y.² Toda, T.³ Saruwatari, H.⁴ Shikano, K.⁵

43
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- Brisbane, QLD, Australia, Apr
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Brisbane, QLD, Australia, Apr. 2015, pp. 4470-4474.
- (2015) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 4470-4474
- Zen, H.¹ Sak, H.²

44
- 84905234422
- A postfilter to modify modulation spectrum in HMM-based speech synthesis
- Florence, Italy, May
- S. Takamichi, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, "A postfilter to modify modulation spectrum in HMM-based speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, Italy, May 2014, pp. 290-294.
- (2014) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 290-294
- Takamichi, S.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

45
- 84959144982
- Modified modulation spectrum-based post-filter for HMM-based speech synthesis
- Atlanta, GA, USA, Dec
- S. Takamichi, T. Toda, A. W. Black, and S. Nakamura, "Modified modulation spectrum-based post-filter for HMM-based speech synthesis, " in Proc. GlobalSIP, Atlanta, GA, USA, Dec. 2014, pp. 710-714.
- (2014) Proc. GlobalSIP , pp. 710-714
- Takamichi, S.¹ Toda, T.² Black, A.W.³ Nakamura, S.⁴

46
- 84949926049
- Modulation spectrum-based post-filter for GMM-based voice conversion
- Siem Reap, Cambodia, Dec
- S. Takamichi, T. Toda, A. W. Black, and S. Nakamura, "Modulation spectrum-based post-filter for GMM-based voice conversion, " in Proc. Annu. Summit Conf. Asia-Pac. Signal Inf. Process. Assoc. (APSIPA ASC), Siem Reap, Cambodia, Dec. 2014, pp. 1-4.
- (2014) Proc. Annu. Summit Conf. Asia-Pac. Signal Inf. Process. Assoc. (APSIPA ASC) , pp. 1-4
- Takamichi, S.¹ Toda, T.² Black, A.W.³ Nakamura, S.⁴

47
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Budapest, Hungary, Apr
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis, " in Proc. EUROSPEECH, Budapest, Hungary, Apr. 1999, pp. 2347-2350.
- (1999) Proc. EUROSPEECH , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

48
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Istanbul, Turkey, Jun
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Istanbul, Turkey, Jun. 2000, pp. 1315-1318.
- (2000) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

49
- 84989843139
- MDL-based context-dependent subword modeling for speech recognition
- K. Shinoda and T. Watanabe, "MDL-based context-dependent subword modeling for speech recognition, " J. Acoust. Soc. Jpn. (E), vol. 28, no. 3, pp. 140-146, 2007.
- (2007) J. Acoust. Soc. Jpn. (E) , vol.28 , Issue.3 , pp. 140-146
- Shinoda, K.¹ Watanabe, T.²

50
- 0038443474
- Joint acoustic and modulation frequency
- L. Atlas and S. A. Shamma, "Joint acoustic and modulation frequency, " EURASIP J. Appl. Signal Process., vol. 7, pp. 668-675, 2003.
- (2003) EURASIP J. Appl. Signal Process , vol.7 , pp. 668-675
- Atlas, L.¹ Shamma, S.A.²

51
- 84866866142
- A state duration generation algorithm considering global variance for HMM-based speech synthesis
- Xi'an, China
- S. Pan, J. Tao, and Y. Wang, "A state duration generation algorithm considering global variance for HMM-based speech synthesis, " in Proc. Annu. Summit Conf. Asia-Pac. Signal Inf. Process. Assoc. (APSIPA ASC), Xi'an, China, 2011.
- (2011) Proc. Annu. Summit Conf. Asia-Pac. Signal Inf. Process. Assoc. (APSIPA ASC)
- Pan, S.¹ Tao, J.² Wang, Y.³

52
- 85008023596
- Continuous F0 modeling for HMM based statistical parametric speech synthesis
- Jul
- K. Yu and S. Young, "Continuous F0 modeling for HMM based statistical parametric speech synthesis, " IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 5, pp. 1071-1079, Jul. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.5 , pp. 1071-1079
- Yu, K.¹ Young, S.²

53
- 84905244240
- A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion
- Lyon, France, Sep
- K. Tanaka, T. Toda, G. Neubig, S. Sakti, and S. Nakamura, "A hybrid approach to electrolaryngeal speech enhancement based on spectral subtraction and statistical voice conversion, " in Proc. Interspeech, Lyon, France, Sep. 2013, pp. 3067-3071.
- (2013) Proc. Interspeech , pp. 3067-3071
- Tanaka, K.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

54
- 84925160976
- Cambridge, U.K.: Cambridge Univ. Press
- P. Taylor, Text-To-Speech Synthesis. Cambridge, U.K.: Cambridge Univ. Press, 2009.
- (2009) Text-To-Speech Synthesis
- Taylor, P.¹

55
- 84905283795
- A frequency-weighted post-filtering transform for compensation of the over-smoothing effect in HMM-based speech synthesis
- Florence, Italy, May
- F. Eyben and Y. Agiomyrgiannakis, "A frequency-weighted post-filtering transform for compensation of the over-smoothing effect in HMM-based speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Florence, Italy, May 2014, pp. 275-279.
- (2014) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 275-279
- Eyben, F.¹ Agiomyrgiannakis, Y.²

56
- 84946074523
- The effect of neural networks in statistical parametric speech synthesis
- Brisbane, QLD, Australia, Apr
- K. Hashimoto, K. Oura, Y. Nankaku, and K. Tokuda, "The effect of neural networks in statistical parametric speech synthesis, " in Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Brisbane, QLD, Australia, Apr. 2015, pp. 4455-4459.
- (2015) Proc. Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 4455-4459
- Hashimoto, K.¹ Oura, K.² Nankaku, Y.³ Tokuda, K.⁴

57
- 84910100893
- DNN-based stochastic postfilter for HMM-based speech synthesis
- MAX Atria, Singapore, May
- L.-H. Chen, T. Raitio, C.-V. Botinhao, J. Yamagishi, and Z.-H. Ling, "DNN-based stochastic postfilter for HMM-based speech synthesis, " in Proc. INTERSPEECH, MAX Atria, Singapore, May 2014, pp. 1954-1958.
- (2014) Proc. INTERSPEECH , pp. 1954-1958
- Chen, L.-H.¹ Raitio, T.² Botinhao, C.-V.³ Yamagishi, J.⁴ Ling, Z.-H.⁵

58
- 85008525798
- Product of experts for statistical parametric speech synthesis
- Mar
- H. Zen, M. J. F. Gales, Y. Nankaku, and K. Tokuda, "Product of experts for statistical parametric speech synthesis, " IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 794-805, Mar. 2012.
- (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.3 , pp. 794-805
- Zen, H.¹ Gales, M.J.F.² Nankaku, Y.³ Tokuda, K.⁴

59
- 84910088495
- Analysis of spectral enhancement using global variance in HMM-based speech synthesis
- MAX Atria, Singapore, May
- T. Nose and A. Ito, "Analysis of spectral enhancement using global variance in HMM-based speech synthesis, " in Proc. INTERSPEECH, MAX Atria, Singapore, May 2014, pp. 2917-2921.
- (2014) Proc. INTERSPEECH , pp. 2917-2921
- Nose, T.¹ Ito, A.²

60
- 44449177634
- Hidden semi-Markov model based speech synthesis system
- H. Zen, K. Tokuda, T. K. T. Masuko, and T. Kitamura, "Hidden semi-Markov model based speech synthesis system, " IEICE Trans. Inf. Syst., vol. E90-D, no. 5, pp. 825-834, 2007.
- (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.5 , pp. 825-834
- Zen, H.¹ Tokuda, K.² Masuko, T.K.T.³ Kitamura, T.⁴

61
- 6644226630
- A large-scale Japanese speech database
- Kobe, Japan, Nov
- Y. Sagisaka, K. Takeda, M. Abe, S. Katagiri, T. Umeda, and H. Kuwabara, "A large-scale Japanese speech database, " in Proc. Int. Conf. Spoken Lang. (ICSLP'90), Kobe, Japan, Nov. 1990, pp. 1089-1092.
- (1990) Proc. Int. Conf. Spoken Lang. (ICSLP'90) , pp. 1089-1092
- Sagisaka, Y.¹ Takeda, K.² Abe, M.³ Katagiri, S.⁴ Umeda, T.⁵ Kuwabara, H.⁶

62
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- Firenze, Italy, Sep
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT, " in Proc. Int. Workshop Models Anal. Vocal Emissions Biomed. Appl. (MAVEBA), Firenze, Italy, Sep. 2001, pp. 1-6.
- (2001) Proc. Int. Workshop Models Anal. Vocal Emissions Biomed. Appl. (MAVEBA) , pp. 1-6
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

63
- 44949143155
- Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation
- Pittsburgh, PA, USA, Sep
- Y. Ohtani, T. Toda, H. Saruwatari, and K. Shikano, "Maximum likelihood voice conversion based on GMM with STRAIGHT mixed excitation, " in Proc. INTERSPEECH, Pittsburgh, PA, USA, Sep. 2006, pp. 2266-2269.
- (2006) Proc. INTERSPEECH , pp. 2266-2269
- Ohtani, Y.¹ Toda, T.² Saruwatari, H.³ Shikano, K.⁴

64
- 84946750132
- Mel-cepstrum modulation spectrum (MCMS) features for robust ASR
- MAX Atria, Singapore, Nov
- V. Tyagi, I. McCowan, H. Misra, and H. Bourlard, "Mel-cepstrum modulation spectrum (MCMS) features for robust ASR, " in Proc. Automat. Speech Recog. Understand. (ASRU), MAX Atria, Singapore, Nov. 2003, pp. 399-404.
- (2003) Proc. Automat. Speech Recog. Understand. (ASRU) , pp. 399-404
- Tyagi, V.¹ McCowan, I.² Misra, H.³ Bourlard, H.⁴

65
- 84878381939
- Speech Signal Processing Toolkit (SPTK) [Online]. Available: http://sp-tk.sourceforge.net/.
- Speech Signal Processing Toolkit (SPTK)

66
- 84870436666
- Amazon Mechanical Turk [Online]. Available: https://www.mturk.com/.
- Amazon Mechanical Turk

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.