Volume 08-12-September-2016, 2016, Pages 2473-2477

GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis

Author keywords

Deep neural network; Glottal inverse filtering; Speech synthesis; Vocoder

Indexed keywords

INVERSE PROBLEMS; SPEECH; SPEECH COMMUNICATION; SPEECH PROCESSING; SPEECH SYNTHESIS;

EID: 84994338062     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: 10.21437/Interspeech.2016-342     Document Type: Conference Paper
Times cited : (36)

References (25)
  • 1
    • 67651002140
    • Review: Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Review: Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 2
    • 84966398940
    • Optimising selection of units from speech databases for concatenative synthesis
    • A. W. Black and N. Campbell, "Optimising selection of units from speech databases for concatenative synthesis," in Proc. Eurospeech, 1995, pp. 581-584.
    • (1995) Proc. Eurospeech , pp. 581-584
    • Black, A.W.1    Campbell, N.2
  • 3
    • 85032750981
    • Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
    • Z. H. Ling, S. Y. Kang, H. Zen, A. Senior, M. Schuster, X. J. Qian, H. M. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 35-52, 2015.
    • (2015) IEEE Signal Processing Magazine , vol.32 , Issue.3 , pp. 35-52
    • Ling, Z.H.1    Kang, S.Y.2    Zen, H.3    Senior, A.4    Schuster, M.5    Qian, X.J.6    Meng, H.M.7    Deng, L.8
  • 4
    • 84946045510
    • Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, 2015, pp. 4470-4474.
    • (2015) Proc. ICASSP , pp. 4470-4474
    • Zen, H.1    Sak, H.2
  • 5
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.
    • (1999) Speech Communication , vol.27 , Issue.3-4 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 6
    • 84874199000
    • Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
    • H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in Proc. MAVEBA, 2001.
    • (2001) Proc. MAVEBA
    • Kawahara, H.1    Estill, J.2    Fujimura, O.3
  • 8
    • 84897865577
    • Harmonics plus noise model based vocoder for statistical parametric speech synthesis
    • D. Erro, I. Sainz, E. Navas, and I. Hernaez, "Harmonics plus noise model based vocoder for statistical parametric speech synthesis," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 184-194, 2014.
    • (2014) IEEE Journal of Selected Topics in Signal Processing , vol.8 , Issue.2 , pp. 184-194
    • Erro, D.1    Sainz, I.2    Navas, E.3    Hernaez, I.4
  • 10
    • 80051962869
    • The Lombard effect
    • S. A. Zollinger and H. Brumm, "The Lombard effect," Current Biology, vol. 21, no. 16, pp. R614-R615, 2011.
    • (2011) Current Biology , vol.21 , Issue.16 , pp. R614-R615
    • Zollinger, S.A.1    Brumm, H.2
  • 11
    • 80051650578
    • Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis
    • T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis," in Proc. ICASSP, 2011.
    • (2011) Proc. ICASSP
    • Raitio, T.1    Suni, A.2    Pulakka, H.3    Vainio, M.4    Alku, P.5
  • 12
    • 84890448428
    • The GlottHMM entry for Blizzard Challenge 2011: Utilizing source unit selection in HMM-based speech synthesis for improved excitation generation
    • A. Suni, T. Raitio, M. Vainio, and P. Alku, "The GlottHMM entry for Blizzard Challenge 2011: Utilizing source unit selection in HMM-based speech synthesis for improved excitation generation," in Blizzard Challenge 2011 Workshop, 2011.
    • (2011) Blizzard Challenge 2011 Workshop
    • Suni, A.1    Raitio, T.2    Vainio, M.3    Alku, P.4
  • 13
    • 84910068090
    • Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
    • T. Raitio, A. Suni, L. Juvela, M. Vainio, and P. Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort," in Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • Raitio, T.1    Suni, A.2    Juvela, L.3    Vainio, M.4    Alku, P.5
  • 15
    • 84973293681
    • High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
    • L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network," in Proc. ICASSP, 2016.
    • (2016) Proc. ICASSP
    • Juvela, L.1    Bollepalli, B.2    Airaksinen, M.3    Alku, P.4
  • 17
    • 84946033275
    • Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
    • Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. ICASSP. IEEE, 2015, pp. 4460-4464.
    • (2015) Proc. ICASSP , pp. 4460-4464
    • Wu, Z.1    Valentini-Botinhao, C.2    Watts, O.3    King, S.4
  • 18
    • 0026881384
    • Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
    • P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Communication, vol. 11, no. 2-3, pp. 109-118, 1992.
    • (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 109-118
    • Alku, P.1
  • 19
    • 0019176143
    • A filter family designed for use in quadrature mirror filter banks
    • J. Johnston, "A filter family designed for use in quadrature mirror filter banks," in Proc. ICASSP, vol. 5, 1980, pp. 291-294.
    • (1980) Proc. ICASSP , vol.5 , pp. 291-294
    • Johnston, J.1
  • 20
    • 84968482180
    • Polynomial roots from companion matrix eigenvalues
    • A. Edelman and H. Murakami, "Polynomial roots from companion matrix eigenvalues," Math. Comp, vol. 64, pp. 763-776, 1995.
    • (1995) Math. Comp , vol.64 , pp. 763-776
    • Edelman, A.1    Murakami, H.2
  • 21
    • 84878394171
    • Wideband parametric speech synthesis using warped linear prediction
    • T. Raitio, A. Suni, M. Vainio, and P. Alku, "Wideband parametric speech synthesis using warped linear prediction," in Proc. Interspeech, 2012.
    • (2012) Proc. Interspeech
    • Raitio, T.1    Suni, A.2    Vainio, M.3    Alku, P.4
  • 22
    • 0027560122
    • Robust signal selection for linear prediction analysis of voiced speech
    • C. Ma, Y. Kamp, and L. Willems, "Robust signal selection for linear prediction analysis of voiced speech," Speech Communication, vol. 12, no. 1, pp. 69-81, 1993.
    • (1993) Speech Communication , vol.12 , Issue.1 , pp. 69-81
    • Ma, C.1    Kamp, Y.2    Willems, L.3
  • 25
    • 84914102477
    • BeaqleJS: HTML5 and JavaScript based Framework for the Subjective Evaluation of Audio Quality
    • S. Kraft and U. Zölzer, "BeaqleJS: HTML5 and JavaScript based Framework for the Subjective Evaluation of Audio Quality," in Linux Audio Conference, 2014.
    • (2014) Linux Audio Conference
    • Kraft, S.1    Zölzer, U.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.