SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 08-12-September-2016, 2016, Pages 2463-2467

A template-based approach for speech synthesis intonation generation using LSTMs

(4) Ronanki, Srikanth (a); Henter, Gustav Eje (a); Wu, Zhizheng (a); King, Simon (a)

(a) University of Edinburgh (United Kingdom)

Author keywords

CTC; F0 templates; Intonation modelling; LSTM; Speech synthesis

Indexed keywords

FORECASTING; RECURRENT NEURAL NETWORKS; SPEECH; SPEECH PROCESSING; SPEECH SYNTHESIS;

F0 TEMPLATES; FUNDAMENTAL FREQUENCIES; INTONATION GENERATIONS; LSTM; RECURRENT NEURAL NETWORK (RNN); SPEECH SYNTHESIS SYSTEM; TEMPLATE-BASED APPROACHES; TEMPORAL CLASSIFICATION;

SPEECH COMMUNICATION;

EID: 84994213378 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: 10.21437/Interspeech.2016-96 Document Type: Conference Paper

Times cited: 16

References (29)

1
- 84910105608
- Measuring a decade of progress in text-to-speech
- S. King, "Measuring a decade of progress in text-to-speech," Loquens, vol. 1, no. 1, 2014.
- (2014) Loquens , vol.1 , Issue.1
- King, S.¹

2
- 85133720638
- The HMM-based speech synthesis system (HTS) version 2.0
- H. Zen, T. Nose, J. Yamagishi, S. Sako, T. Masuko, A. Black, and K. Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0," in Proc. SSW, vol. 6, 2007, pp. 294-299.
- (2007) Proc. SSW , vol.6 , pp. 294-299
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.⁶ Tokuda, K.⁷

3
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000, pp. 1315-1318.
- (2000) Proc. ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

4
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi, and T. Kitamura, "Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis," in Proc. Eurospeech, 1999, pp. 2347-2350.
- (1999) Proc. Eurospeech , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

5
- 85008023596
- Continuous F0 modeling for HMM based statistical parametric speech synthesis
- K. Yu and S. Young, "Continuous F0 modeling for HMM based statistical parametric speech synthesis," IEEE T. Audio Speech, vol. 19, no. 5, pp. 1071-1079, 2011.
- (2011) IEEE T. Audio Speech , vol.19 , Issue.5 , pp. 1071-1079
- Yu, K.¹ Young, S.²

6
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966.
- (2013) Proc. ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

7
- 84973359646
- From HMMs to DNNs: Where do the improvements come from?
- O. Watts, G. E. Henter, T. Merritt, Z. Wu, and S. King, "From HMMs to DNNs: where do the improvements come from?" in Proc. ICASSP, 2016, pp. 5505-5509.
- (2016) Proc. ICASSP , pp. 5505-5509
- Watts, O.¹ Henter, G.E.² Merritt, T.³ Wu, Z.⁴ King, S.⁵

8
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, 2015, pp. 4470-4474.
- (2015) Proc. ICASSP , pp. 4470-4474
- Zen, H.¹ Sak, H.²

9
- 84973355618
- Investigating gated recurrent networks for speech synthesis
- Z. Wu and S. King, "Investigating gated recurrent networks for speech synthesis," in Proc. ICASSP, 2016, pp. 5140-5144.
- (2016) Proc. ICASSP , pp. 5140-5144
- Wu, Z.¹ King, S.²

10
- 84910068142
- Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks
- R. Fernandez, A. Rendel, B. Ramabhadran, and R. Hoory, "Prosody contour prediction with long short-term memory, bidirectional, deep recurrent neural networks," in Proc. Interspeech, 2014, pp. 2268-2272.
- (2014) Proc. Interspeech , pp. 2268-2272
- Fernandez, R.¹ Rendel, A.² Ramabhadran, B.³ Hoory, R.⁴

11
- 84950159800
- Modeling F0 trajectories in hierarchically structured deep neural networks
- X. Yin, M. Lei, Y. Qian, F. K. Soong, L. He, Z.-H. Ling, and L.-R. Dai, "Modeling F0 trajectories in hierarchically structured deep neural networks," Speech Commun., vol. 76, pp. 82-92, 2016.
- (2016) Speech Commun. , vol.76 , pp. 82-92
- Yin, X.¹ Lei, M.² Qian, Y.³ Soong, F.K.⁴ He, L.⁵ Ling, Z.-H.⁶ Dai, L.-R.⁷

12
- 84904608062
- SLAM: Automatic stylization and labelling of speech melody
- N. Obin, J. Beliao, C. Veaux, and A. Lacheret, "SLAM: Automatic stylization and labelling of speech melody," in Proc. Speech Prosody, 2014, pp. 246-250.
- (2014) Proc. Speech Prosody , pp. 246-250
- Obin, N.¹ Beliao, J.² Veaux, C.³ Lacheret, A.⁴

13
- 84982995064
- JNDSLAM: A SLAM extension for speech synthesis
- R. Dall and X. Gonzalvo, "JNDSLAM: A SLAM extension for speech synthesis," in Proc. Speech Prosody, 2016, pp. 1024-1028.
- (2016) Proc. Speech Prosody , pp. 1024-1028
- Dall, R.¹ Gonzalvo, X.²

14
- 84865748446
- A statistical phrase/accent model for intonation modeling
- G. K. Anumanchipalli, L. C. Oliveira, and A. W. Black, "A statistical phrase/accent model for intonation modeling," in Proc. Interspeech, 2011, pp. 1813-1816.
- (2011) Proc. Interspeech , pp. 1813-1816
- Anumanchipalli, G.K.¹ Oliveira, L.C.² Black, A.W.³

15
- 33749259827
- Connectionist temporal classification: Labelling unsegmented sequence data with recurrent neural networks
- A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber, "Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks," in Proc. ICML, 2006, pp. 369-376.
- (2006) Proc. ICML , pp. 369-376
- Graves, A.¹ Fernández, S.² Gomez, F.³ Schmidhuber, J.⁴

16
- 84867194192
- Multilevel parametric-base F0 model for speech synthesis
- J. Latorre and M. Akamine, "Multilevel parametric-base F0 model for speech synthesis," in Proc. Interspeech, 2008, pp. 2274-2277.
- (2008) Proc. Interspeech , pp. 2274-2277
- Latorre, J.¹ Akamine, M.²

17
- 51449117929
- Modelling and synthesising F0 contours with the discrete cosine transform
- J. Teutenberg, C. Watson, and P. Riddle, "Modelling and synthesising F0 contours with the discrete cosine transform," in Proc. ICASSP, 2008, pp. 3973-3976.
- (2008) Proc. ICASSP , pp. 3973-3976
- Teutenberg, J.¹ Watson, C.² Riddle, P.³

18
- 85008039410
- Improved prosody generation by maximizing joint probability of state and longer units
- Y. Qian, Z. Wu, B. Gao, and F. K. Soong, "Improved prosody generation by maximizing joint probability of state and longer units," IEEE T. Audio Speech, vol. 19, no. 6, pp. 1702-1710, 2011.
- (2011) IEEE T. Audio Speech , vol.19 , Issue.6 , pp. 1702-1710
- Qian, Y.¹ Wu, Z.² Gao, B.³ Soong, F.K.⁴

19
- 84946045633
- Wavelets for intonation modeling in HMM speech synthesis
- A. Suni, D. Aalto, T. Raitio, P. Alku, and M. Vainio, "Wavelets for intonation modeling in HMM speech synthesis," in Proc. SSW, vol. 8, 2013, pp. 285-290.
- (2013) Proc. SSW , vol.8 , pp. 285-290
- Suni, A.¹ Aalto, D.² Raitio, T.³ Alku, P.⁴ Vainio, M.⁵

20
- 84946044619
- A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform
- M. S. Ribeiro and R. A. J. Clark, "A multi-level representation of f0 using the continuous wavelet transform and the discrete cosine transform," in Proc. ICASSP, 2015, pp. 4909-4913.
- (2015) Proc. ICASSP , pp. 4909-4913
- Ribeiro, M.S.¹ Clark, R.A.J.²

21
- 84910044428
- Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree
- X. Yin, M. Lei, Y. Qian, F. K. Soong, L. He, Z.-H. Ling, and L.-R. Dai, "Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree," in Proc. Interspeech, 2014, pp. 2273-2277.
- (2014) Proc. Interspeech , pp. 2273-2277
- Yin, X.¹ Lei, M.² Qian, Y.³ Soong, F.K.⁴ He, L.⁵ Ling, Z.-H.⁶ Dai, L.-R.⁷

22
- 0001481529
- Bark and ERB bilinear transforms
- J. O. Smith III and J. S. Abel, "Bark and ERB bilinear transforms," IEEE T. Speech Audi. P., vol. 7, no. 6, pp. 697-708, 1999.
- (1999) IEEE T. Speech Audi. P. , vol.7 , Issue.6 , pp. 697-708
- Smith, J.O.¹ Abel, J.S.²

23
- 0014129195
- Hierarchical clustering schemes
- S. C. Johnson, "Hierarchical clustering schemes," Psychometrika, vol. 32, no. 3, pp. 241-254, 1967.
- (1967) Psychometrika , vol.32 , Issue.3 , pp. 241-254
- Johnson, S.C.¹

24
- 33947674781
- Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis
- K. Prahallad, A. W. Black, and R. Mosur, "Sub-phonetic modeling for capturing pronunciation variations for conversational speech synthesis," in Proc. ICASSP, 2006, pp. I-853-I-856.
- (2006) Proc. ICASSP , pp. I-853-I-856
- Prahallad, K.¹ Black, A.W.² Mosur, R.³

25
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. ICASSP, 2015, pp. 4460-4464.
- (2015) Proc. ICASSP , pp. 4460-4464
- Wu, Z.¹ Valentini-Botinhao, C.² Watts, O.³ King, S.⁴

26
- 33750915991
- STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds
- H. Kawahara, "STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds," Acoust. Sci. Technol., vol. 27, no. 6, pp. 349-353, 2006.
- (2006) Acoust. Sci. Technol. , vol.27 , Issue.6 , pp. 349-353
- Kawahara, H.¹

27
- 84994311603
- Objective measurement of active speech level
- Objective measurement of active speech level, ITU Recommendation ITU-T P.56, International Telecommunication Union, Telecommunication Standardization Sector, Geneva, Switzerland, March 2011.
- (2011) ITU Recommendation ITU-T P.56

28
- 13344250603
- Method for the subjective assessment of intermediate quality levels of coding systems
- Method for the subjective assessment of intermediate quality levels of coding systems, ITU Recommendation ITU-R BS.1534-3, International Telecommunication Union Radiocommunication Sector, Geneva, Switzerland, October 2015.
- (2015) ITU Recommendation ITU-R BS.1534-3

29
- 84994267677
- A simple sequentially rejective multiple test procedure
- S. Holm, "A simple sequentially rejective multiple test procedure," Scand. J. Stat., vol. 6, no. 2, pp. 65-70, 1979.
- (1979) Scand. J. Stat. , vol.6 , Issue.2 , pp. 65-70
- Holm, S.¹

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.