Volume 2016-May, 2016, Pages 5130-5134

Robust TTS duration modelling using DNNs

Author keywords

duration modelling; robust statistics; Speech synthesis

EID: 84973395039     PISSN: 15206149     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/ICASSP.2016.7472655     Document Type: Conference Paper
Times cited: 37

References (35)
  • 2
    • 84867223798
    • Robustness of HMM-based speech synthesis
    • J. Yamagishi, Z.-H. Ling, and S. King, "Robustness of HMM-based speech synthesis," in Proc. Interspeech, pp. 581-584, 2008.
    • (2008) Proc. Interspeech, pp. 581-584
    • Yamagishi, J.; Ling, Z.-H.; King, S.
  • 3
    • 84897869648
    • Noise in HMM-based speech synthesis adaptation: Analysis, evaluation methods and experiments
    • R. Karhila, U. Remes, and M. Kurimo, "Noise in HMM-based speech synthesis adaptation: Analysis, evaluation methods and experiments," IEEE J. Sel. Top. Signa., vol. 8, no. 2, pp. 285-295, 2014.
    • (2014) IEEE J. Sel. Top. Signa., vol. 8, no. 2, pp. 285-295
    • Karhila, R.; Remes, U.; Kurimo, M.
  • 4
    • 0023407575
    • Review of text-to-speech conversion for English
    • D. H. Klatt, "Review of text-to-speech conversion for English," J. Acoust. Soc. Am., vol. 82, no. 3, pp. 737-793, 1987.
    • (1987) J. Acoust. Soc. Am., vol. 82, no. 3, pp. 737-793
    • Klatt, D.H.
  • 5
    • 0034854347
    • Joint prosody prediction and unit selection for concatenative speech synthesis
    • I. Bulyko and M. Ostendorf, "Joint prosody prediction and unit selection for concatenative speech synthesis," in Proc. ICASSP, vol. 2, pp. 781-784, 2001.
    • (2001) Proc. ICASSP, vol. 2, pp. 781-784
    • Bulyko, I.; Ostendorf, M.
  • 6
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun., vol. 51, no. 11, pp. 1039-1064
    • Zen, H.; Tokuda, K.; Black, A.W.
  • 7
    • 84856237844
    • An introduction to statistical parametric speech synthesis
    • S. King, "An introduction to statistical parametric speech synthesis," Sadhana, vol. 36, no. 5, pp. 837-852, 2011.
    • (2011) Sadhana, vol. 36, no. 5, pp. 837-852
    • King, S.
  • 9
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • L. R. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, 1989.
    • (1989) Proc. IEEE, vol. 77, no. 2, pp. 257-286
    • Rabiner, L.R.
  • 11
    • 0004262735
    • Robust Statistics
    • P. J. Huber, Robust Statistics. New York, NY: Springer, 2nd ed., 2011.
    • (2011) Robust Statistics
    • Huber, P.J.
  • 13
    • 84910028520
    • Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
    • G. E. Henter, T. Merritt, M. Shannon, C. Mayo, and S. King, "Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech," in Proc. Interspeech, pp. 1504-1508, 2014.
    • (2014) Proc. Interspeech, pp. 1504-1508
    • Henter, G.E.; Merritt, T.; Shannon, M.; Mayo, C.; King, S.
  • 14
    • 84857550885
    • Robust full-waveform inversion using the Student's t-distribution
    • A. Y. Aravkin, T. van Leeuwen, and F. J. Herrmann, "Robust full-waveform inversion using the Student's t-distribution," in SEG Tech. Program Expand. Abstr., vol. 30, pp. 2669-2673, 2011.
    • (2011) SEG Tech. Program Expand. Abstr., vol. 30, pp. 2669-2673
    • Aravkin, A.Y.; Van Leeuwen, T.; Herrmann, F.J.
  • 17
    • 33745224103
    • Spontaneous speech: How people really talk and why engineers should care
    • E. Shriberg, "Spontaneous speech: How people really talk and why engineers should care," in Proc. Interspeech, pp. 1781-1784, 2005.
    • (2005) Proc. Interspeech, pp. 1781-1784
    • Shriberg, E.
  • 18
    • 84904698123
    • Phonetic variations: Impact of the communicative situation
    • S. Brognaux and T. Drugman, "Phonetic variations: Impact of the communicative situation," in Speech Prosody, 2014.
    • (2014) Speech Prosody
    • Brognaux, S.; Drugman, T.
  • 19
    • 0004113976
    • Mixture density networks
    • C. M. Bishop, "Mixture density networks," Tech. Rep. NCRG/94/004, Neural Computing Research Group, Aston University, 1994.
    • (1994) Mixture Density Networks
    • Bishop, C.M.
  • 20
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, pp. 7962-7966, 2013.
    • (2013) Proc. ICASSP, pp. 7962-7966
    • Zen, H.; Senior, A.; Schuster, M.
  • 21
    • 84946033275
    • Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
    • Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. ICASSP, 2015.
    • (2015) Proc. ICASSP
    • Wu, Z.; Valentini-Botinhao, C.; Watts, O.; King, S.
  • 22
    • 84905262874
    • Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
    • H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Proc. ICASSP, pp. 3844-3848, 2014.
    • (2014) Proc. ICASSP, pp. 3844-3848
    • Zen, H.; Senior, A.
  • 23
    • 0001640740
    • Robust and efficient estimation by minimising a density power divergence
    • A. Basu, I. R. Harris, N. L. Hjort, and M. C. Jones, "Robust and efficient estimation by minimising a density power divergence," Biometrika, vol. 85, no. 3, pp. 549-559, 1998.
    • (1998) Biometrika, vol. 85, no. 3, pp. 549-559
    • Basu, A.; Harris, I.R.; Hjort, N.L.; Jones, M.C.
  • 24
    • 33645702834
    • Robustifying maximum likelihood estimation
    • S. Eguchi and Y. Kano, "Robustifying maximum likelihood estimation," Tech. Rep. Research Memo 802, Institute of Statistical Mathematics, Tokyo, Japan, June 2001.
    • (2001) Robustifying Maximum Likelihood Estimation
    • Eguchi, S.; Kano, Y.
  • 25
    • 84973380400
    • Emma
    • J. Austen and S. Crowther, "Emma," in LibriVox, 2006. http://librivox.org/emma-by-jane-austen-solo/. Accessed 2015-09-24.
    • (2006)
    • Austen, J.; Crowther, S.
  • 26
    • 79956282392
    • Segmentation of monologues in audio books for building synthetic voices
    • K. Prahallad and A. W. Black, "Segmentation of monologues in audio books for building synthetic voices," IEEE T. Audio Speech, vol. 19, no. 5, pp. 1444-1449, 2011.
    • (2011) IEEE T. Audio Speech, vol. 19, no. 5, pp. 1444-1449
    • Prahallad, K.; Black, A.W.
  • 27
    • 33947674781
    • Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis
    • K. Prahallad, A. W. Black, and R. Mosur, "Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis," in Proc. ICASSP, vol. 1, pp. I-I, 2006.
    • (2006) Proc. ICASSP, vol. 1, pp. I-I
    • Prahallad, K.; Black, A.W.; Mosur, R.
  • 28
    • 33750915991
    • STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds
    • H. Kawahara, "STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds," Acoust. Sci. Technol., vol. 27, no. 6, pp. 349-353, 2006.
    • (2006) Acoust. Sci. Technol., vol. 27, no. 6, pp. 349-353
    • Kawahara, H.
  • 29
    • 0001033261
    • Robust regression: Asymptotics, conjectures and Monte Carlo
    • P. J. Huber, "Robust regression: Asymptotics, conjectures and Monte Carlo," Ann. Stat., pp. 799-821, 1973.
    • (1973) Ann. Stat., pp. 799-821
    • Huber, P.J.
  • 31
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, vol. 3, pp. 1315-1318, 2000.
    • (2000) Proc. ICASSP, vol. 3, pp. 1315-1318
    • Tokuda, K.; Yoshimura, T.; Masuko, T.; Kobayashi, T.; Kitamura, T.
  • 33
    • 84945179107
    • Method for the subjective assessment of intermediate quality level of audio systems
    • International Telecommunication Union Radiocommunication Assembly, Geneva, Switzerland, Method for the subjective assessment of intermediate quality level of audio systems, June 2014.
    • (2014) International Telecommunication Union Radiocommunication Assembly
  • 34
    • 84973406063
    • A simple sequentially rejective multiple test procedure
    • S. Holm, "A simple sequentially rejective multiple test procedure," Scand. J. Stat., vol. 6, no. 2, pp. 65-70, 1979.
    • (1979) Scand. J. Stat., vol. 6, no. 2, pp. 65-70
    • Holm, S.
  • 35
    • 84910034043
    • The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech
    • R. Dall, M. Wester, and M. Corley, "The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech," in Proc. Interspeech, pp. 56-60, 2014.
    • (2014) Proc. Interspeech, pp. 56-60
    • Dall, R.; Wester, M.; Corley, M.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.