SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2016-May, 2016, Pages 5130-5134

Robust TTS duration modelling using DNNs

(6) Henter, Gustav Ejeᵃ; Ronanki, Srikanthᵃ; Watts, Oliverᵃ; Wester, Mirjamᵃ; Wu, Zhizhengᵃ; King, Simonᵃ

ᵃ University of Edinburgh (United Kingdom)

Author keywords

duration modelling; robust statistics; speech synthesis

EID: 84973395039 PISSN: 15206149 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2016.7472655 Document Type: Conference Paper

Times cited (37)

References (35)

1
- 84904680338
- The Blizzard Challenge 2013
- S. King and V. Karaiskos, "The Blizzard Challenge 2013," in Proc. Blizzard Chall. Workshop, 2013.
- (2013) Proc. Blizzard Chall. Workshop
- King, S.¹ Karaiskos, V.²

2
- 84867223798
- Robustness of HMM-based speech synthesis
- J. Yamagishi, Z.-H. Ling, and S. King, "Robustness of HMM-based speech synthesis," in Proc. Interspeech, pp. 581-584, 2008.
- (2008) Proc. Interspeech, pp. 581-584
- Yamagishi, J.¹ Ling, Z.-H.² King, S.³

3
- 84897869648
- Noise in HMM-based speech synthesis adaptation: Analysis, evaluation methods and experiments
- R. Karhila, U. Remes, and M. Kurimo, "Noise in HMM-based speech synthesis adaptation: Analysis, evaluation methods and experiments," IEEE J. Sel. Top. Signal Process., vol. 8, no. 2, pp. 285-295, 2014.
- (2014) IEEE J. Sel. Top. Signal Process., vol. 8, Issue 2, pp. 285-295
- Karhila, R.¹ Remes, U.² Kurimo, M.³

4
- 0023407575
- Review of text-to-speech conversion for English
- D. H. Klatt, "Review of text-to-speech conversion for English," J. Acoust. Soc. Am., vol. 82, no. 3, pp. 737-793, 1987.
- (1987) J. Acoust. Soc. Am., vol. 82, Issue 3, pp. 737-793
- Klatt, D.H.¹

5
- 0034854347
- Joint prosody prediction and unit selection for concatenative speech synthesis
- I. Bulyko and M. Ostendorf, "Joint prosody prediction and unit selection for concatenative speech synthesis," in Proc. ICASSP, vol. 2, pp. 781-784, 2001.
- (2001) Proc. ICASSP, vol. 2, pp. 781-784
- Bulyko, I.¹ Ostendorf, M.²

6
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Commun., vol. 51, Issue 11, pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

7
- 84856237844
- An introduction to statistical parametric speech synthesis
- S. King, "An introduction to statistical parametric speech synthesis," Sadhana, vol. 36, no. 5, pp. 837-852, 2011.
- (2011) Sadhana, vol. 36, Issue 5, pp. 837-852
- King, S.¹

8
- 85133720638
- The HMM-based speech synthesis system (HTS) version 2.0
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in Proc. SSW6, pp. 294-299, 2007.
- (2007) Proc. SSW6, pp. 294-299
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.⁶ Tokuda, K.⁷

9
- 0024610919
- A tutorial on hidden Markov models and selected applications in speech recognition
- L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
- (1989) Proc. IEEE, vol. 77, Issue 2, pp. 257-286
- Rabiner, L.R.¹

10
- 0003571976
- The HTK Book (for HTK Version 3.4)
- S. Young, G. Evermann, M. Gales, T. Hain, D. Kershaw, X. Liu, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.4), 2006.
- (2006) The HTK Book (for HTK Version 3.4)
- Young, S.¹ Evermann, G.² Gales, M.³ Hain, T.⁴ Kershaw, D.⁵ Liu, X.⁶ Moore, G.⁷ Odell, J.⁸ Ollason, D.⁹ Povey, D.¹⁰ Valtchev, V.¹¹ Woodland, P.¹²

11
- 0004262735
- Robust Statistics
- P. J. Huber, Robust Statistics. New York, NY: Springer, 2nd ed., 2011.
- (2011) Robust Statistics
- Huber, P.J.¹

12
- 0003841907
- Robust Statistics: The Approach Based on Influence Functions
- F. R. Hampel, E. M. Ronchetti, P. J. Rousseeuw, and W. A. Stahel, Robust Statistics: The Approach Based on Influence Functions. New York, NY: John Wiley & Sons, 1986.
- (1986) Robust Statistics: The Approach Based on Influence Functions
- Hampel, F.R.¹ Ronchetti, E.M.² Rousseeuw, P.J.³ Stahel, W.A.⁴

13
- 84910028520
- Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
- G. E. Henter, T. Merritt, M. Shannon, C. Mayo, and S. King, "Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech," in Proc. Interspeech, pp. 1504-1508, 2014.
- (2014) Proc. Interspeech, pp. 1504-1508
- Henter, G.E.¹ Merritt, T.² Shannon, M.³ Mayo, C.⁴ King, S.⁵

14
- 84857550885
- Robust full-waveform inversion using the Student's t-distribution
- A. Y. Aravkin, T. van Leeuwen, and F. J. Herrmann, "Robust full-waveform inversion using the Student's t-distribution," in SEG Tech. Program Expand. Abstr., vol. 30, pp. 2669-2673, 2011.
- (2011) SEG Tech. Program Expand. Abstr., vol. 30, pp. 2669-2673
- Aravkin, A.Y.¹ Van Leeuwen, T.² Herrmann, F.J.³

15
- 84973373484
- Prosodic analysis of Indian languages and its application to text to speech synthesis
- K. R. Krishnan, "Prosodic analysis of Indian languages and its application to text to speech synthesis," Master's thesis, Department of Electrical Engineering, IIT Madras, India, July 2015.
- (2015) Prosodic Analysis of Indian Languages and Its Application to Text to Speech Synthesis
- Krishnan, K.R.¹

16
- 84973290641
- Blizzard Challenge 2015: Submission by DONLab, IIT Madras
- A. Prakash, A. Baby, A. S. Shanmugam, J. J. Prakash, N. L. Nishanthi, K. R. Krishnan, V. S. Rupak, and H. A. Murthy, "Blizzard Challenge 2015: Submission by DONLab, IIT Madras," in Proc. Blizzard Chall. Workshop, 2015.
- (2015) Proc. Blizzard Chall. Workshop
- Prakash, A.¹ Baby, A.² Shanmugam, A.S.³ Prakash, J.J.⁴ Nishanthi, N.L.⁵ Krishnan, K.R.⁶ Rupak, V.S.⁷ Murthy, H.A.⁸

17
- 33745224103
- Spontaneous speech: How people really talk and why engineers should care
- E. Shriberg, "Spontaneous speech: How people really talk and why engineers should care," in Proc. Interspeech, pp. 1781-1784, 2005.
- (2005) Proc. Interspeech, pp. 1781-1784
- Shriberg, E.¹

18
- 84904698123
- Phonetic variations: Impact of the communicative situation
- S. Brognaux and T. Drugman, "Phonetic variations: Impact of the communicative situation," in Speech Prosody, 2014.
- (2014) Speech Prosody
- Brognaux, S.¹ Drugman, T.²

19
- 0004113976
- Mixture density networks
- C. M. Bishop, "Mixture density networks," Tech. Rep. NCRG/94/004, Neural Computing Research Group, Aston University, 1994.
- (1994) Mixture Density Networks
- Bishop, C.M.¹

20
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, pp. 7962-7966, 2013.
- (2013) Proc. ICASSP, pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

21
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Wu, Z.¹ Valentini-Botinhao, C.² Watts, O.³ King, S.⁴

22
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Proc. ICASSP, pp. 3844-3848, 2014.
- (2014) Proc. ICASSP, pp. 3844-3848
- Zen, H.¹ Senior, A.²

23
- 0001640740
- Robust and efficient estimation by minimising a density power divergence
- A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones, "Robust and efficient estimation by minimising a density power divergence," Biometrika, vol. 85, no. 3, pp. 549-559, 1998.
- (1998) Biometrika, vol. 85, Issue 3, pp. 549-559
- Basu, A.¹ Harris, I.R.² Hjort, N.L.³ Jones, M.C.⁴

24
- 33645702834
- Robustifying maximum likelihood estimation
- S. Eguchi and Y. Kano, "Robustifying maximum likelihood estimation," Tech. Rep. Research Memo 802, Institute of Statistical Mathematics, Tokyo, Japan, June 2001.
- (2001) Robustifying Maximum Likelihood Estimation
- Eguchi, S.¹ Kano, Y.²

25
- 84973380400
- Emma
- J. Austen and S. Crowther, "Emma," in LibriVox, 2006. http://librivox.org/emma-by-jane-austen-solo/. Accessed 2015-09-24.
- (2006) LibriVox
- Austen, J.¹ Crowther, S.²

26
- 79956282392
- Segmentation of monologues in audio books for building synthetic voices
- K. Prahallad and A. W. Black, "Segmentation of monologues in audio books for building synthetic voices," IEEE T. Audio Speech, vol. 19, no. 5, pp. 1444-1449, 2011.
- (2011) IEEE T. Audio Speech, vol. 19, Issue 5, pp. 1444-1449
- Prahallad, K.¹ Black, A.W.²

27
- 33947674781
- Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis
- K. Prahallad, A. W. Black, and R. Mosur, "Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis," in Proc. ICASSP, vol. 1, pp. I-I, 2006.
- (2006) Proc. ICASSP, vol. 1, pp. I-I
- Prahallad, K.¹ Black, A.W.² Mosur, R.³

28
- 33750915991
- STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds
- H. Kawahara, "STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds," Acoust. Sci. Technol., vol. 27, no. 6, pp. 349-353, 2006.
- (2006) Acoust. Sci. Technol., vol. 27, Issue 6, pp. 349-353
- Kawahara, H.¹

29
- 0001033261
- Robust regression: Asymptotics, conjectures and Monte Carlo
- P. J. Huber, "Robust regression: Asymptotics, conjectures and Monte Carlo," Ann. Stat., pp. 799-821, 1973.
- (1973) Ann. Stat., pp. 799-821
- Huber, P.J.¹

30
- 84856673205
- Theano: A CPU and GPU math compiler in Python
- J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio, "Theano: A CPU and GPU math compiler in Python," in Proc. 9th Python in Science Conf., pp. 3-10, 2010.
- (2010) Proc. 9th Python in Science Conf., pp. 3-10
- Bergstra, J.¹ Breuleux, O.² Bastien, F.³ Lamblin, P.⁴ Pascanu, R.⁵ Desjardins, G.⁶ Turian, J.⁷ Warde-Farley, D.⁸ Bengio, Y.⁹

31
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, vol. 3, pp. 1315-1318, 2000.
- (2000) Proc. ICASSP, vol. 3, pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

32
- 84883051736
- Objective measurement of active speech level
- International Telecommunication Union, Telecommunication Standardization Sector, Geneva, Switzerland, Objective measurement of active speech level, March 2011.
- (2011) International Telecommunication Union, Telecommunication Standardization Sector

33
- 84945179107
- Method for the subjective assessment of intermediate quality level of audio systems
- International Telecommunication Union Radiocommunication Assembly, Geneva, Switzerland, Method for the subjective assessment of intermediate quality level of audio systems, June 2014.
- (2014) International Telecommunication Union Radiocommunication Assembly

34
- 84973406063
- A simple sequentially rejective multiple test procedure
- S. Holm, "A simple sequentially rejective multiple test procedure," Scand. J. Stat., vol. 6, no. 2, pp. 65-70, 1979.
- (1979) Scand. J. Stat., vol. 6, Issue 2, pp. 65-70
- Holm, S.¹

35
- 84910034043
- The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech
- R. Dall, M. Wester, and M. Corley, "The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech," in Proc. Interspeech, pp. 56-60, 2014.
- (2014) Proc. Interspeech, pp. 56-60
- Dall, R.¹ Wester, M.² Corley, M.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.