SCOPUS Information Retrieval Platform

Speech Communication

Volume 76, 2016, Pages 82-92

Modeling F0 trajectories in hierarchically structured deep neural networks

(7) Yin, Xiang a,b; Lei, Ming b; Qian, Yao b; Soong, Frank K b; He, Lei b; Ling, Zhen Hua a; Dai, Li Rong a

a UNIVERSITY OF SCIENCE AND TECHNOLOGY OF CHINA (China)

b MICROSOFT (United States)

Author keywords

Deep neural network; Discrete cosine transform; Fundamental frequency; Hidden Markov model; Speech synthesis

Indexed keywords

DISCRETE COSINE TRANSFORMS; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MODEL STRUCTURES; SPEECH SYNTHESIS; SYNTHESIS (CHEMICAL);

CASCADE STRUCTURES; CONTEXTUAL FEATURE; DEEP NEURAL NETWORKS; DISCRETE COSINE TRANSFORM(DCT); FUNDAMENTAL FREQUENCIES; PARALLEL STRUCTURES; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; SUBJECTIVE PERFORMANCE;

COMPLEX NETWORKS;

EID: 84950159800 PISSN: 01676393 EISSN: None Source Type: Journal
DOI: 10.1016/j.specom.2015.10.007 Document Type: Article

Times cited: (26)

References (35)

1
- 84905283451
- New methods in continuous Mandarin speech recognition
- Chen C.J., Gopinath R.A., Monkowski M.D., Picheny M.A., and Shen K. New methods in continuous Mandarin speech recognition Eurospeech 1997 1543 1546
- (1997) Eurospeech , pp. 1543-1546
- Chen, C.J.¹ Gopinath, R.A.² Monkowski, M.D.³ Picheny, M.A.⁴ Shen, K.⁵

2
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Fan Y.C., Qian Y., and Soong F.K. TTS synthesis with bidirectional LSTM based recurrent neural networks Interspeech 2014 1964 1968
- (2014) Interspeech , pp. 1964-1968
- Fan, Y.C.¹ Qian, Y.² Soong, F.K.³

3
- 84910068142
- Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks
- Fernandez R., Rendel A., Ramabhadran B., and Hoory R. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. Interspeech 2014 2268 2272
- (2014) Interspeech , pp. 2268-2272
- Fernandez, R.¹ Rendel, A.² Ramabhadran, B.³ Hoory, R.⁴

4
- 84867218240
- In search of models in speech communication research
- Fujisaki H. In search of models in speech communication research. Interspeech 2008 1 10
- (2008) Interspeech , pp. 1-10
- Fujisaki, H.¹

5
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- Kawahara H., Masuda-Katsuse I., and de Cheveigné A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds Speech Commun. 27 3 1999 187 208
- (1999) Speech Commun. , vol.27 , Issue.3 , pp. 187-208
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

6
- 84867194192
- Multilevel parametric-base F0 model for speech synthesis
- Latorre J., and Akamine M. Multilevel parametric-base F0 model for speech synthesis Proceedings of Interspeech 2008 2274 2277
- (2008) Proceedings of Interspeech , pp. 2274-2277
- Latorre, J.¹ Akamine, M.²

7
- 79959844205
- A hierarchical F0 modeling method for HMM-based speech synthesis
- Lei M., Wu Y.J., Soong F.K., Ling Z.H., and Dai L.R. A hierarchical F0 modeling method for HMM-based speech synthesis. Interspeech 2010 2170 2173
- (2010) Interspeech , pp. 2170-2173
- Lei, M.¹ Wu, Y.J.² Soong, F.K.³ Ling, Z.H.⁴ Dai, L.R.⁵

8
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Ling Z.H., Li D., and Yu D. Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis IEEE Trans. Audio, Speech Language Proc. 21 10 2013 2129 2139
- (2013) IEEE Trans. Audio, Speech Language Proc. , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.H.¹ Li, D.² Yu, D.³

9
- 67650851754
- USTC system for Blizzard Challenge 2006: An improved HMM-based speech synthesis method
- Ling Z.-H., Wu Y.J., Wang Y.P., Qin L., and Wang R.H. USTC system for Blizzard Challenge 2006: an improved HMM-based speech synthesis method Proceedings of Blizzard Challenge Workshop 2006
- (2006) Proceedings of Blizzard Challenge Workshop
- Ling, Z.-H.¹ Wu, Y.J.² Wang, Y.P.³ Qin, L.⁴ Wang, R.H.⁵

10
- 0018983824
- A fast cosine transform in one and two dimensions
- Makhoul J. A fast cosine transform in one and two dimensions IEEE Trans. Acoustics, Speech Signal Proc. 28 1 1980 27 34
- (1980) IEEE Trans. Acoustics, Speech Signal Proc. , vol.28 , Issue.1 , pp. 27-34
- Makhoul, J.¹

11
- 84865714286
- Stylization and trajectory modelling of short and long term speech prosody variations
- Obin N., Lacheret A., and Rodet X. Stylization and trajectory modelling of short and long term speech prosody variations Proceedings of Interspeech 2011 2029 2032
- (2011) Proceedings of Interspeech , pp. 2029-2032
- Obin, N.¹ Lacheret, A.² Rodet, X.³

12
- 84905251808
- On the training aspects of deep neural network (DNN) for parametric TTS synthesis
- Qian Y., Fan Y.C., Hu W.-P., and Soong F.K. On the training aspects of deep neural network (DNN) for parametric TTS synthesis Proceedings of ICASSP 2014 3857 3861
- (2014) Proceedings of ICASSP , pp. 3857-3861
- Qian, Y.¹ Fan, Y.C.² Hu, W.-P.³ Soong, F.K.⁴

13
- 84867200235
- Generating natural F0 trajectory with additive trees
- Qian Y., Liang H., and Soong F.K. Generating natural F0 trajectory with additive trees. Proceedings of Interspeech 2008 2126 2129
- (2008) Proceedings of Interspeech , pp. 2126-2129
- Qian, Y.¹ Liang, H.² Soong, F.K.³

14
- 85008039410
- Improved prosody generation by maximizing joint probability of state and longer units
- Qian Y., Wu Z., Gao B., and Soong F.K. Improved prosody generation by maximizing joint probability of state and longer units IEEE Trans. Audio, Speech, Language Proc. 19 6 2011 1702 1710
- (2011) IEEE Trans. Audio, Speech, Language Proc. , vol.19 , Issue.6 , pp. 1702-1710
- Qian, Y.¹ Wu, Z.² Gao, B.³ Soong, F.K.⁴

15
- 0033906251
- MDL-based context-dependent sub-word modeling for speech recognition
- Shinoda K., and Watanabe T. MDL-based context-dependent sub-word modeling for speech recognition J. Acoust. Soc. Jpn(E) 21 2 2000 79 86
- (2000) J. Acoust. Soc. Jpn(E) , vol.21 , Issue.2 , pp. 79-86
- Shinoda, K.¹ Watanabe, T.²

16
- 0001455934
- A robust algorithm for pitch tracking (RAPT)
- Talkin D. A robust algorithm for pitch tracking (RAPT) Speech Coding Synthesis 1995 495 518
- (1995) Speech Coding Synthesis , pp. 495-518
- Talkin, D.¹

17
- 51449117929
- Modelling and synthesising F0 contours with the discrete cosine transform
- Teutenberg J., Watson C., and Riddle P. Modelling and synthesising F0 contours with the discrete cosine transform Proceedings of ICASSP 2008 3973 3976
- (2008) Proceedings of ICASSP , pp. 3973-3976
- Teutenberg, J.¹ Watson, C.² Riddle, P.³

18
- 33846410497
- Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- Toda T., and Tokuda K. Speech parameter generation algorithm considering global variance for HMM-based speech synthesis Proceedings of Eurospeech 2005 1315 1318
- (2005) Proceedings of Eurospeech , pp. 1315-1318
- Toda, T.¹ Tokuda, K.²

19
- 0028996993
- Speech parameter generation from HMM using dynamic features
- Tokuda K., Kobayashi T., and Imai S. Speech parameter generation from HMM using dynamic features Proceedings of ICASSP 1995 660 663
- (1995) Proceedings of ICASSP , pp. 660-663
- Tokuda, K.¹ Kobayashi, T.² Imai, S.³

20
- 0032678076
- Hidden Markov models based on multi-space probability distribution for pitch pattern modeling
- Tokuda K., Masuko T., Miyazaki N., and Kobayashi T. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling Proceedings of ICASSP 1999 229 232
- (1999) Proceedings of ICASSP , pp. 229-232
- Tokuda, K.¹ Masuko, T.² Miyazaki, N.³ Kobayashi, T.⁴

21
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Tokuda K., Yoshimura T., Masuko T., Kobayashi T., and Kitamura T. Speech parameter generation algorithms for HMM-based speech synthesis Proceedings of ICASSP 2000 1315 1318
- (2000) Proceedings of ICASSP , pp. 1315-1318
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

22
- 60849084576
- Multi-layer F0 modeling for HMM-based speech synthesis
- Wang C.C., Ling Z.H., Zhang B.F., and Dai L.R. Multi-layer F0 modeling for HMM-based speech synthesis Proceedings of ISCSLP 2008 129 132
- (2008) Proceedings of ISCSLP , pp. 129-132
- Wang, C.C.¹ Ling, Z.H.² Zhang, B.F.³ Dai, L.R.⁴

23
- 0001556085
- ToBI: A standard for labeling English prosody
- Wightman C., Price P., Pierrehumbert J., and Hirschberg J. ToBI: a standard for labeling English prosody Proceedings of ICSLP 1992 12 16
- (1992) Proceedings of ICSLP , pp. 12-16
- Wightman, C.¹ Price, P.² Pierrehumbert, J.³ Hirschberg, J.⁴

24
- 84867589421
- Modeling pitch trajectory by hierarchical HMM with minimum generation error training
- Wu Y.J., and Soong F.K. Modeling pitch trajectory by hierarchical HMM with minimum generation error training Proceedings of ICASSP 2012 4017 4020 10.1109/ICASSP.2012.6288799
- (2012) Proceedings of ICASSP , pp. 4017-4020
- Wu, Y.J.¹ Soong, F.K.²

25
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Wu Y.J., and Wang R.H. Minimum generation error training for HMM-based speech synthesis Proceedings of ICASSP 2006 89 92 10.1109/ICASSP.2006.1659964
- (2006) Proceedings of ICASSP , pp. 89-92
- Wu, Y.J.¹ Wang, R.H.²

26
- 34547517493
- Full HMM training for minimizing generation error in synthesis
- Wu Y.J., Wang R.H., and Soong F.K. Full HMM training for minimizing generation error in synthesis ICASSP 2007 517 520
- (2007) ICASSP , pp. 517-520
- Wu, Y.J.¹ Wang, R.H.² Soong, F.K.³

27
- 60849112575
- Modeling and generating tone contour with phrase intonation for Mandarin Chinese speech
- Wu Z., Qian Y., Soong F.K., and Zhang B. Modeling and generating tone contour with phrase intonation for Mandarin Chinese speech Proceedings of ISCSLP 2008 1 4 10.1109/CHINSL.2008.ECP.42
- (2008) Proceedings of ISCSLP , pp. 1-4
- Wu, Z.¹ Qian, Y.² Soong, F.K.³ Zhang, B.⁴

28
- 84910044428
- Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree
- Yin X., Lei M., Qian Y., Soong F.-K., He L., Ling Z.H., and Dai L.R. Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree Proceedings of Interspeech 2014 2273 2277
- (2014) Proceedings of Interspeech , pp. 2273-2277
- Yin, X.¹ Lei, M.² Qian, Y.³ Soong, F.-K.⁴ He, L.⁵ Ling, Z.H.⁶ Dai, L.R.⁷

29
- 85009139544
- Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis
- Yoshimura T., Tokuda K., Masuko T., Kobayashi T., and Kitamura T. Simultaneous modeling of spectrum, pitch and duration in HMM-based speech synthesis Proceedings of Eurospeech 6 1999 2347 2350
- (1999) Proceedings of Eurospeech , vol.6 , pp. 2347-2350
- Yoshimura, T.¹ Tokuda, K.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

30
- 70450161503
- Context-dependent additive log F0 model for HMM-based speech synthesis
- Zen H., and Braunschweiler N. Context-dependent additive log F0 model for HMM-based speech synthesis. Proceedings of Interspeech 2009 2091 2094
- (2009) Proceedings of Interspeech , pp. 2091-2094
- Zen, H.¹ Braunschweiler, N.²

31
- 84950118067
- Statistical parametric speech synthesis based on recurrent neural networks
- Zen H., Sak H., Graves A., and Senior A. Statistical parametric speech synthesis based on recurrent neural networks Proceedings of Conference on UKSpeech 2014
- (2014) Proceedings of Conference on UKSpeech
- Zen, H.¹ Sak, H.² Graves, A.³ Senior, A.⁴

32
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- Zen H., Senior A., and Schuster M. Statistical parametric speech synthesis using deep neural networks Proceedings of ICASSP 2013 7962 7966
- (2013) Proceedings of ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

33
- 33846405723
- Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
- Zen H., Toda T., Nakamura M., and Tokuda K. Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005 IEICE Trans. Inf. Syst. E90-D 1 2007 325 333
- (2007) IEICE Trans. Inf. Syst. , vol.E90D , Issue.1 , pp. 325-333
- Zen, H.¹ Toda, T.² Nakamura, M.³ Tokuda, K.⁴

34
- 67651002140
- Statistical parametric speech synthesis
- Zen H., Tokuda K., and Black A. Statistical parametric speech synthesis Speech Commun. 51 2009 1039 1064 10.1016/j.specom.2009.04.004
- (2009) Speech Commun. , vol.51 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.³

35
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
- Zen H., Tokuda K., and Kitamura T. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences Comput. Speech Language 21 1 2007 153 173
- (2007) Comput. Speech Language , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.