SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 2473-2477

GlottDNN - A full-band glottal vocoder for statistical parametric speech synthesis

Authors (6): Airaksinen, Manu (a); Bollepalli, Bajibabu (a); Juvela, Lauri (a); Wu, Zhizheng (b); King, Simon (b); Alku, Paavo (a)

a AALTO UNIVERSITY (Finland)

b UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Deep neural network; Glottal inverse filtering; Speech synthesis; Vocoder

Indexed keywords

INVERSE PROBLEMS; SPEECH; SPEECH COMMUNICATION; SPEECH PROCESSING; SPEECH SYNTHESIS;

DEEP NEURAL NETWORKS; INVERSE FILTERING; NEW APPROACHES; PARAMETERIZING; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; SUBJECTIVE LISTENING TEST; TEXT TO SPEECH; VOCAL-TRACTS;

VOCODERS;

EID: 84994338062; PISSN: 2308-457X; EISSN: 1990-9772; Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-342; Document Type: Conference Paper

Times cited: 36

References (25)

1
- 67651002140
- Review: Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Review: Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 84966398940
- Optimising selection of units from speech databases for concatenative synthesis
- A. W. Black and N. Campbell, "Optimising selection of units from speech databases for concatenative synthesis," in Proc. Eurospeech, 1995, pp. 581-584.
- (1995) Proc. Eurospeech , pp. 581-584
- Black, A.W.¹ Campbell, N.²

3
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- Z. H. Ling, S. Y. Kang, H. Zen, A. Senior, M. Schuster, X. J. Qian, H. M. Meng, and L. Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 35-52, 2015.
- (2015) IEEE Signal Processing Magazine , vol.32 , Issue.3 , pp. 35-52
- Ling, Z.H.¹ Kang, S.Y.² Zen, H.³ Senior, A.⁴ Schuster, M.⁵ Qian, X.J.⁶ Meng, H.M.⁷ Deng, L.⁸

4
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, 2015, pp. 4470-4474.
- (2015) Proc. ICASSP , pp. 4470-4474
- Zen, H.¹ Sak, H.²

5
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3-4, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3-4 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

6
- 84874199000
- Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT
- H. Kawahara, J. Estill, and O. Fujimura, "Aperiodicity extraction and control using mixed mode excitation and group delay manipulation for a high quality speech analysis, modification and synthesis system STRAIGHT," in Proc. MAVEBA, 2001.
- (2001) Proc. MAVEBA
- Kawahara, H.¹ Estill, J.² Fujimura, O.³

7
- 77957744515
- HMM-based speech synthesis utilizing glottal inverse filtering
- T. Raitio, A. Suni, J. Yamagishi, H. Pulakka, J. Nurminen, M. Vainio, and P. Alku, "HMM-based speech synthesis utilizing glottal inverse filtering," IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 1, pp. 153-165, 2011.
- (2011) IEEE Transactions on Audio, Speech, and Language Processing , vol.19 , Issue.1 , pp. 153-165
- Raitio, T.¹ Suni, A.² Yamagishi, J.³ Pulakka, H.⁴ Nurminen, J.⁵ Vainio, M.⁶ Alku, P.⁷

8
- 84897865577
- Harmonics plus noise model based vocoder for statistical parametric speech synthesis
- D. Erro, I. Sainz, E. Navas, and I. Hernaez, "Harmonics plus noise model based vocoder for statistical parametric speech synthesis," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 184-194, 2014.
- (2014) IEEE Journal of Selected Topics in Signal Processing , vol.8 , Issue.2 , pp. 184-194
- Erro, D.¹ Sainz, I.² Navas, E.³ Hernaez, I.⁴

9
- 0003425258
- Digital Processing of Speech Signals
- L. Rabiner and R. Schafer, Digital Processing of Speech Signals, ser. Prentice-Hall signal processing series. Prentice-Hall, 1978.
- (1978) Digital Processing of Speech Signals
- Rabiner, L.¹ Schafer, R.²

10
- 80051962869
- The Lombard effect
- S. A. Zollinger and H. Brumm, "The Lombard effect," Current Biology, vol. 21, no. 16, pp. R614-R615, 2011.
- (2011) Current Biology , vol.21 , Issue.16 , pp. R614-R615
- Zollinger, S.A.¹ Brumm, H.²

11
- 80051650578
- Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis
- T. Raitio, A. Suni, H. Pulakka, M. Vainio, and P. Alku, "Utilizing glottal source pulse library for generating improved excitation signal for HMM-based speech synthesis," in Proc. ICASSP, 2011.
- (2011) Proc. ICASSP
- Raitio, T.¹ Suni, A.² Pulakka, H.³ Vainio, M.⁴ Alku, P.⁵

12
- 84890448428
- The GlottHMM entry for Blizzard Challenge 2011: Utilizing source unit selection in HMM-based speech synthesis for improved excitation generation
- A. Suni, T. Raitio, M. Vainio, and P. Alku, "The GlottHMM entry for Blizzard Challenge 2011: Utilizing source unit selection in HMM-based speech synthesis for improved excitation generation," in Blizzard Challenge 2011 Workshop, 2011.
- (2011) Blizzard Challenge 2011 Workshop
- Suni, A.¹ Raitio, T.² Vainio, M.³ Alku, P.⁴

13
- 84910068090
- Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort
- T. Raitio, A. Suni, L. Juvela, M. Vainio, and P. Alku, "Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort," in Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Raitio, T.¹ Suni, A.² Juvela, L.³ Vainio, M.⁴ Alku, P.⁵

14
- 84865755765
- The GlottHMM speech synthesis entry for Blizzard Challenge 2010
- A. Suni, T. Raitio, M. Vainio, and P. Alku, "The GlottHMM speech synthesis entry for Blizzard Challenge 2010," in Blizzard Challenge 2010 Workshop, 2010.
- (2010) Blizzard Challenge 2010 Workshop
- Suni, A.¹ Raitio, T.² Vainio, M.³ Alku, P.⁴

15
- 84973293681
- High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network
- L. Juvela, B. Bollepalli, M. Airaksinen, and P. Alku, "High-pitched excitation generation for glottal vocoding in statistical parametric speech synthesis using a deep neural network," in Proc. ICASSP, 2016.
- (2016) Proc. ICASSP
- Juvela, L.¹ Bollepalli, B.² Airaksinen, M.³ Alku, P.⁴

16
- 84898074254
- Quasi closed phase glottal inverse filtering analysis with weighted linear prediction
- M. Airaksinen, T. Raitio, B. Story, and P. Alku, "Quasi closed phase glottal inverse filtering analysis with weighted linear prediction," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 3, pp. 596-607, 2014.
- (2014) IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol.22 , Issue.3 , pp. 596-607
- Airaksinen, M.¹ Raitio, T.² Story, B.³ Alku, P.⁴

17
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. ICASSP. IEEE, 2015, pp. 4460-4464.
- (2015) Proc. ICASSP , pp. 4460-4464
- Wu, Z.¹ Valentini-Botinhao, C.² Watts, O.³ King, S.⁴

18
- 0026881384
- Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering
- P. Alku, "Glottal wave analysis with pitch synchronous iterative adaptive inverse filtering," Speech Communication, vol. 11, no. 2-3, pp. 109-118, 1992.
- (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 109-118
- Alku, P.¹

19
- 0019176143
- A filter family designed for use in quadrature mirror filter banks
- J. Johnston, "A filter family designed for use in quadrature mirror filter banks," in Proc. ICASSP, vol. 5, 1980, pp. 291-294.
- (1980) Proc. ICASSP , vol.5 , pp. 291-294
- Johnston, J.¹

20
- 84968482180
- Polynomial roots from companion matrix eigenvalues
- A. Edelman and H. Murakami, "Polynomial roots from companion matrix eigenvalues," Math. Comp, vol. 64, pp. 763-776, 1995.
- (1995) Math. Comp , vol.64 , pp. 763-776
- Edelman, A.¹ Murakami, H.²

21
- 84878394171
- Wideband parametric speech synthesis using warped linear prediction
- T. Raitio, A. Suni, M. Vainio, and P. Alku, "Wideband parametric speech synthesis using warped linear prediction," in Proc. Interspeech, 2012.
- (2012) Proc. Interspeech
- Raitio, T.¹ Suni, A.² Vainio, M.³ Alku, P.⁴

22
- 0027560122
- Robust signal selection for linear prediction analysis of voiced speech
- C. Ma, Y. Kamp, and L. Willems, "Robust signal selection for linear prediction analysis of voiced speech," Speech Communication, vol. 12, no. 1, pp. 69-81, 1993.
- (1993) Speech Communication , vol.12 , Issue.1 , pp. 69-81
- Ma, C.¹ Kamp, Y.² Willems, L.³

23
- 0003450846
- Methods for subjective determination of transmission quality
- ITU, "Methods for subjective determination of transmission quality," in International Telecommunication Union, Recommendation ITU-T P.800, 1996.
- (1996) Methods for Subjective Determination of Transmission Quality
- ITU¹

24
- 84994377328
- Hurricane natural speech corpus
- M. Cooke, C. Mayo, and C. Valentini-Botinhao, "Hurricane natural speech corpus," 2013, LISTA Consortium. [Online]. Available: doi:10.7488/ds/140
- (2013) LISTA Consortium
- Cooke, M.¹ Mayo, C.² Valentini-Botinhao, C.³

25
- 84914102477
- BeaqleJS: HTML5 and JavaScript based Framework for the Subjective Evaluation of Audio Quality
- S. Kraft and U. Zölzer, "BeaqleJS: HTML5 and JavaScript based Framework for the Subjective Evaluation of Audio Quality," in Linux Audio Conference, 2014.
- (2014) Linux Audio Conference
- Kraft, S.¹ Zölzer, U.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.