Volume 08-12-September-2016, 2016, Pages 2463-2467

A template-based approach for speech synthesis intonation generation using LSTMs

Author keywords

CTC; F0 templates; Intonation modelling; LSTM; Speech synthesis

Indexed keywords

FORECASTING; RECURRENT NEURAL NETWORKS; SPEECH; SPEECH PROCESSING; SPEECH SYNTHESIS

EID: 84994213378     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: 10.21437/Interspeech.2016-96     Document Type: Conference Paper
Times cited: (16)

References (29)
  • 1
    • 84910105608
    • Measuring a decade of progress in text-to-speech
    • S. King, "Measuring a decade of progress in text-to-speech," Loquens, vol. 1, no. 1, 2014.
    • (2014) Loquens , vol.1 , Issue.1
    • King, S.1
  • 3
    • 0033708106
    • Speech parameter generation algorithms for HMM-based speech synthesis
    • K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, pp. 1315-1318.
    • (2000) Proc. ICASSP , pp. 1315-1318
    • Tokuda, K.1    Yoshimura, T.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 4
    • 85009139544
    • Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
    • T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
    • (1999) Proc. Eurospeech , pp. 2347-2350
    • Yoshimura, T.1    Tokuda, K.2    Masuko, T.3    Kobayashi, T.4    Kitamura, T.5
  • 5
    • 85008023596
    • Continuous F0 modeling for HMM based statistical parametric speech synthesis
    • K. Yu and S. Young, "Continuous F0 modeling for HMM based statistical parametric speech synthesis," IEEE T. Audio Speech, vol. 19, no. 5, pp. 1071-1079, 2011.
    • (2011) IEEE T. Audio Speech , vol.19 , Issue.5 , pp. 1071-1079
    • Yu, K.1    Young, S.2
  • 6
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966.
    • (2013) Proc. ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 7
    • 84973359646
    • From HMMs to DNNs: Where do the improvements come from?
    • O. Watts, G. E. Henter, T. Merritt, Z. Wu, and S. King, "From HMMs to DNNs: where do the improvements come from?" in Proc. ICASSP, 2016, pp. 5505-5509.
    • (2016) Proc. ICASSP , pp. 5505-5509
    • Watts, O.1    Henter, G.E.2    Merritt, T.3    Wu, Z.4    King, S.5
  • 8
    • 84946045510
    • Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, 2015, pp. 4470-4474.
    • (2015) Proc. ICASSP , pp. 4470-4474
    • Zen, H.1    Sak, H.2
  • 9
    • 84973355618
    • Investigating gated recurrent networks for speech synthesis
    • Z. Wu and S. King, "Investigating gated recurrent networks for speech synthesis," in Proc. ICASSP, 2016, pp. 5140-5144.
    • (2016) Proc. ICASSP , pp. 5140-5144
    • Wu, Z.1    King, S.2
  • 10
    • 84910068142
    • Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks
    • R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks," in Proc. Interspeech, 2014, pp. 2268-2272.
    • (2014) Proc. Interspeech , pp. 2268-2272
    • Fernandez, R.1    Rendel, A.2    Ramabhadran, B.3    Hoory, R.4
  • 11
    • 84950159800
    • Modeling F0 trajectories in hierarchically structured deep neural networks
    • X. Yin, M. Lei, Y. Qian, F. K. Soong, L. He, Z.-H. Ling, and L.-R. Dai, "Modeling F0 trajectories in hierarchically structured deep neural networks," Speech Commun., vol. 76, pp. 82-92, 2016.
    • (2016) Speech Commun. , vol.76 , pp. 82-92
    • Yin, X.1    Lei, M.2    Qian, Y.3    Soong, F.K.4    He, L.5    Ling, Z.-H.6    Dai, L.-R.7
  • 12
    • 84904608062
    • SLAM: Automatic stylization and labelling of speech melody
    • N. Obin, J. Beliao, C. Veaux, and A. Lacheret, "SLAM: Automatic stylization and labelling of speech melody," in Proc. Speech Prosody, 2014, pp. 246-250.
    • (2014) Proc. Speech Prosody , pp. 246-250
    • Obin, N.1    Beliao, J.2    Veaux, C.3    Lacheret, A.4
  • 13
    • 84982995064
    • JNDSLAM: A SLAM extension for speech synthesis
    • R. Dall and X. Gonzalvo, "JNDSLAM: A SLAM extension for speech synthesis," in Proc. Speech Prosody, 2016, pp. 1024-1028.
    • (2016) Proc. Speech Prosody , pp. 1024-1028
    • Dall, R.1    Gonzalvo, X.2
  • 14
    • 84865748446
    • A statistical phrase/accent model for intonation modeling
    • G. K. Anumanchipalli, L. C. Oliveira, and A. W. Black, "A statistical phrase/accent model for intonation modeling," in Proc. Interspeech, 2011, pp. 1813-1816.
    • (2011) Proc. Interspeech , pp. 1813-1816
    • Anumanchipalli, G.K.1    Oliveira, L.C.2    Black, A.W.3
  • 15
    • 33749259827
    • Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
    • A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proc. ICML, 2006, pp. 369-376.
    • (2006) Proc. ICML , pp. 369-376
    • Graves, A.1    Fernández, S.2    Gomez, F.3    Schmidhuber, J.4
  • 16
    • 84867194192
    • Multilevel parametric-base F0 model for speech synthesis
    • J. Latorre and M. Akamine, "Multilevel parametric-base F0 model for speech synthesis," in Proc. Interspeech, 2008, pp. 2274-2277.
    • (2008) Proc. Interspeech , pp. 2274-2277
    • Latorre, J.1    Akamine, M.2
  • 17
    • 51449117929
    • Modelling and synthesising F0 contours with the discrete cosine transform
    • J. Teutenberg, C. Watson, and P. Riddle, "Modelling and synthesising F0 contours with the discrete cosine transform," in Proc. ICASSP, 2008, pp. 3973-3976.
    • (2008) Proc. ICASSP , pp. 3973-3976
    • Teutenberg, J.1    Watson, C.2    Riddle, P.3
  • 18
    • 85008039410
    • Improved prosody generation by maximizing joint probability of state and longer units
    • Y. Qian, Z. Wu, B. Gao, and F. K. Soong, "Improved prosody generation by maximizing joint probability of state and longer units," IEEE T. Audio Speech, vol. 19, no. 6, pp. 1702-1710, 2011.
    • (2011) IEEE T. Audio Speech , vol.19 , Issue.6 , pp. 1702-1710
    • Qian, Y.1    Wu, Z.2    Gao, B.3    Soong, F.K.4
  • 19
    • 84946045633 scopus 로고    scopus 로고
    • Wavelets for intonation modeling inHMMspeech synthesis
    • A. Suni, D. Aalto, T. Raitio, P. Alku, and M. Vainio, "Wavelets for intonation modeling inHMMspeech synthesis," in Proc. SSW, vol. 8, 2013, pp. 285-290.
    • (2013) Proc. SSW , vol.8 , pp. 285-290
    • Suni, A.1    Aalto, D.2    Raitio, T.3    Alku, P.4    Vainio, M.5
  • 20
    • 84946044619
    • A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform
    • M. S. Ribeiro and R. A. J. Clark, "A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform," in Proc. ICASSP, 2015, pp. 4909-4913.
    • (2015) Proc. ICASSP , pp. 4909-4913
    • Ribeiro, M.S.1    Clark, R.A.J.2
  • 21
    • 84910044428
    • Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree
    • X. Yin, M. Lei, Y. Qian, F. K. Soong, L. He, Z.-H. Ling, and L.-R. Dai, "Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree," in Proc. Interspeech, 2014, pp. 2273-2277.
    • (2014) Proc. Interspeech , pp. 2273-2277
    • Yin, X.1    Lei, M.2    Qian, Y.3    Soong, F.K.4    He, L.5    Ling, Z.-H.6    Dai, L.-R.7
  • 22
    • 0001481529
    • Bark and ERB bilinear transforms
    • J. O. Smith III and J. S. Abel, "Bark and ERB bilinear transforms," IEEE T. Speech Audi. P., vol. 7, no. 6, pp. 697-708, 1999.
    • (1999) IEEE T. Speech Audi. P. , vol.7 , Issue.6 , pp. 697-708
    • Smith, J.O.1    Abel, J.S.2
  • 23
    • 0014129195
    • Hierarchical clustering schemes
    • S. C. Johnson, "Hierarchical clustering schemes," Psychometrika, vol. 32, no. 3, pp. 241-254, 1967.
    • (1967) Psychometrika , vol.32 , Issue.3 , pp. 241-254
    • Johnson, S.C.1
  • 24
    • 33947674781
    • Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis
    • K. Prahallad, A. W. Black, and R. Mosur, "Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis," in Proc. ICASSP, 2006, pp. I-853-I-856.
    • (2006) Proc. ICASSP , pp. I-853-I-856
    • Prahallad, K.1    Black, A.W.2    Mosur, R.3
  • 25
    • 84946033275
    • Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
    • Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. ICASSP, 2015, pp. 4460-4464.
    • (2015) Proc. ICASSP , pp. 4460-4464
    • Wu, Z.1    Valentini-Botinhao, C.2    Watts, O.3    King, S.4
  • 26
    • 33750915991
    • STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds
    • H. Kawahara, "STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds," Acoust. Sci. Technol., vol. 27, no. 6, pp. 349-353, 2006.
    • (2006) Acoust. Sci. Technol. , vol.27 , Issue.6 , pp. 349-353
    • Kawahara, H.1
  • 28
    • 13344250603
    • ITU Recommendation ITU-R BS.1534-3, International Telecommunication Union Radiocommunication Sector, Geneva, Switzerland, October 2015
    • Method for the subjective assessment of intermediate quality levels of coding systems, ITU Recommendation ITU-R BS.1534-3, International Telecommunication Union Radiocommunication Sector, Geneva, Switzerland, October 2015.
    • (2015) Method for the Subjective Assessment of Intermediate Quality Levels of Coding Systems
  • 29
    • 84994267677
    • A simple sequentially rejective multiple test procedure
    • S. Holm, "A simple sequentially rejective multiple test procedure," Scand. J. Stat., vol. 6, no. 2, pp. 65-70, 1979.
    • (1979) Scand. J. Stat. , vol.6 , Issue.2 , pp. 65-70
    • Holm, S.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.