Volume 2016-May, 2016, Pages 5120-5124

High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network

Author keywords

Deep neural network; Glottal inverse filtering; Glottal vocoder; QCP; Statistical parametric speech synthesis


EID: 84973293681     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2016.7472653     Document Type: Conference Paper
Times cited: 35

References (29)
  • 1
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • Takayoshi Yoshimura, Keiichi Tokuda, Takashi Masuko, Takao Kobayashi, and Tadashi Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. of Interspeech, 1999, pp. 2347-2350.
    • (1999) Proc. of Interspeech, pp. 2347-2350
    • Yoshimura, T.; Tokuda, K.; Masuko, T.; Kobayashi, T.; Kitamura, T.
  • 2
    • 67651002140
    • Statistical parametric speech synthesis
    • Heiga Zen, Keiichi Tokuda, and Alan W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Communication, vol. 51, no. 11, pp. 1039-1064
    • Zen, H.; Tokuda, K.; Black, A.W.
  • 3
    • 84876687945
    • Speech synthesis based on hidden Markov models
    • May
    • Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, May 2013.
    • (2013) Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252
    • Tokuda, K.; Nankaku, Y.; Toda, T.; Zen, H.; Yamagishi, J.; Oura, K.
  • 4
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • May
    • Heiga Zen, Andrew Senior, and Mike Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. of ICASSP, May 2013, pp. 7962-7966.
    • (2013) Proc. of ICASSP, pp. 7962-7966
    • Zen, H.; Senior, A.; Schuster, M.
  • 5
    • 85032750981
    • Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
    • May
    • Zhen-Hua Ling, Shi-Yin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen Meng, and Li Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," Signal Processing Magazine, IEEE, vol. 32, no. 3, pp. 35-52, May 2015.
    • (2015) Signal Processing Magazine, IEEE, vol. 32, no. 3, pp. 35-52
    • Ling, Z.; Kang, S.; Zen, H.; Senior, A.; Schuster, M.; Qian, X.; Meng, H.; Deng, L.
  • 6
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • Hideki Kawahara, Ikuyo Masuda-Katsuse, and Alain De Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999.
    • (1999) Speech Communication, vol. 27, no. 3, pp. 187-207
    • Kawahara, H.; Masuda-Katsuse, I.; De Cheveigne, A.
  • 7
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • Hideki Kawahara, Jo Estill, and Osamu Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in MAVEBA, 2001.
    • (2001) MAVEBA
    • Kawahara, H.; Estill, J.; Fujimura, O.
  • 8
    • 0032638660
    • On phase perception in speech
    • Mar
    • Harald Pobloth and W. Bastiaan Kleijn, "On phase perception in speech," in Proc. of ICASSP, Mar 1999, vol. 1, pp. 29-32.
    • (1999) Proc. of ICASSP, vol. 1, pp. 29-32
    • Pobloth, H.; Kleijn, W.B.
  • 9
    • 84959096758
    • Phase perception of the glottal excitation of vocoded speech
    • Dresden, September
    • Tuomo Raitio, Lauri Juvela, Antti Suni, Martti Vainio, and Paavo Alku, "Phase perception of the glottal excitation of vocoded speech," in Proc. of Interspeech, Dresden, September 2015, pp. 254-258.
    • (2015) Proc. of Interspeech, pp. 254-258
    • Raitio, T.; Juvela, L.; Suni, A.; Vainio, M.; Alku, P.
  • 10
    • 84867209230
    • HMM-based Finnish text-to-speech system utilizing glottal inverse filtering
    • Brisbane, Australia, September
    • Tuomo Raitio, Antti Suni, Hannu Pulakka, Martti Vainio, and Paavo Alku, "HMM-based Finnish text-to-speech system utilizing glottal inverse filtering," in Proc. of Interspeech, Brisbane, Australia, September 2008, pp. 1881-1884.
    • (2008) Proc. of Interspeech, pp. 1881-1884
    • Raitio, T.; Suni, A.; Pulakka, H.; Vainio, M.; Alku, P.
  • 12
    • 84865755765
    • The GlottHMM speech synthesis entry for Blizzard Challenge 2010
    • Kyoto, Japan, September
    • Antti Suni, Tuomo Raitio, Martti Vainio, and Paavo Alku, "The GlottHMM speech synthesis entry for Blizzard Challenge 2010," in Blizzard Challenge 2010 Workshop, Kyoto, Japan, September 2010.
    • (2010) Blizzard Challenge 2010 Workshop
    • Suni, A.; Raitio, T.; Vainio, M.; Alku, P.
  • 14
    • 84910068090
    • Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
    • Singapore, September
    • Tuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio, and Paavo Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort," in Proc. of Interspeech, Singapore, September 2014, pp. 1969-1973.
    • (2014) Proc. of Interspeech, pp. 1969-1973
    • Raitio, T.; Suni, A.; Juvela, L.; Vainio, M.; Alku, P.
  • 15
    • 84890547237
    • Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise
    • March
    • Tuomo Raitio, Antti Suni, Martti Vainio, and Paavo Alku, "Synthesis and perception of breathy, normal, and Lombard speech in the presence of noise," Computer Speech & Language, vol. 28, no. 2, pp. 648-664, March 2014.
    • (2014) Computer Speech & Language, vol. 28, no. 2, pp. 648-664
    • Raitio, T.; Suni, A.; Vainio, M.; Alku, P.
  • 18
    • 0026881384
    • Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
    • Eurospeech '91
    • Paavo Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Communication, vol. 11, no. 2-3, pp. 109-118, 1992, Eurospeech '91.
    • (1992) Speech Communication, vol. 11, no. 2-3, pp. 109-118
    • Alku, P.
  • 19
    • 84890448428
    • The GlottHMM entry for Blizzard Challenge 2011: Utilizing source unit selection in HMM-based speech synthesis for improved excitation generation
    • Turin, Italy, September
    • Antti Suni, Tuomo Raitio, Martti Vainio, and Paavo Alku, "The GlottHMM entry for Blizzard Challenge 2011: Utilizing source unit selection in HMM-based speech synthesis for improved excitation generation," in Blizzard Challenge 2011 Workshop, Turin, Italy, September 2011.
    • (2011) Blizzard Challenge 2011 Workshop
    • Suni, A.; Raitio, T.; Vainio, M.; Alku, P.
  • 20
    • 84856294347
    • Glottal inverse filtering analysis of human voice production - A review of estimation and parameterization methods of the glottal excitation and their applications (invited article)
    • Paavo Alku, "Glottal inverse filtering analysis of human voice production - A review of estimation and parameterization methods of the glottal excitation and their applications (invited article)," Sadhana-Academy Proceedings in Engineering Sciences, vol. 36, no. 5, pp. 623-650, 2011.
    • (2011) Sadhana-Academy Proceedings in Engineering Sciences, vol. 36, no. 5, pp. 623-650
    • Alku, P.
  • 21
    • 0016495091
    • Linear prediction: A tutorial review
    • Apr
    • John Makhoul, "Linear prediction: A tutorial review," Proceedings of the IEEE, vol. 63, no. 4, pp. 561-580, Apr 1975.
    • (1975) Proceedings of the IEEE, vol. 63, no. 4, pp. 561-580
    • Makhoul, J.
  • 23
    • 70450198169
    • Glottal closure and opening instant detection from speech signals
    • Thomas Drugman and Thierry Dutoit, "Glottal closure and opening instant detection from speech signals," in Proc. of Interspeech, 2009, pp. 2891-2894.
    • (2009) Proc. of Interspeech, pp. 2891-2894
    • Drugman, T.; Dutoit, T.
  • 28
    • 85133720638
    • The HMM-based speech synthesis system version 2.0
    • Bonn, Germany, August
    • Heiga Zen, Takashi Nose, Junichi Yamagishi, Shinji Sako, Takashi Masuko, Alan W. Black, and Keiichi Tokuda, "The HMM-based speech synthesis system version 2.0," in Proc. of ISCA SSW6, Bonn, Germany, August 2007, pp. 294-299.
    • (2007) Proc. of ISCA SSW6, pp. 294-299
    • Zen, H.; Nose, T.; Yamagishi, J.; Sako, S.; Masuko, T.; Black, A.W.; Tokuda, K.
  • 29
    • 84973276298
    • Methods for subjective determination of transmission quality
    • P.800, ITU-T SG12, Geneva, Switzerland, Aug.
    • "Methods for Subjective Determination of Transmission Quality," Recommendation P.800, ITU-T SG12, Geneva, Switzerland, Aug. 1996.
    • (1996) Recommendation P.800


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.