



Volume 76, 2016, Pages 82-92

Modeling F0 trajectories in hierarchically structured deep neural networks

Author keywords

Deep neural network; Discrete cosine transform; Fundamental frequency; Hidden Markov model; Speech synthesis

Indexed keywords

DISCRETE COSINE TRANSFORMS; HIDDEN MARKOV MODELS; MARKOV PROCESSES; MODEL STRUCTURES; SPEECH SYNTHESIS; SYNTHESIS (CHEMICAL)

EID: 84950159800     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2015.10.007     Document Type: Article
Times cited: 26

References (35)
  • 2
    • 84910047819
    • TTS synthesis with bidirectional LSTM based recurrent neural networks
    • Fan Y.C., Qian Y., and Soong F.K. TTS synthesis with bidirectional LSTM based recurrent neural networks Interspeech 2014 1964 1968
    • (2014) Interspeech , pp. 1964-1968
    • Fan, Y.C.1    Qian, Y.2    Soong, F.K.3
  • 3
    • 84910068142
    • Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks
    • Fernandez R., Rendel A., Ramabhadran B., and Hoory R. Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks. Interspeech 2014 2268 2272
    • (2014) Interspeech , pp. 2268-2272
    • Fernandez, R.1    Rendel, A.2    Ramabhadran, B.3    Hoory, R.4
  • 4
    • 84867218240
    • In search of models in speech communication research
    • Fujisaki H. In search of models in speech communication research. Interspeech 2008 1 10
    • (2008) Interspeech , pp. 1-10
    • Fujisaki, H.1
  • 5
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • Kawahara H., Masuda-Katsuse I., and de Cheveigné A. Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds Speech Commun. 27 3 1999 187 208
    • (1999) Speech Commun. , vol.27 , Issue.3 , pp. 187-208
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 6
    • 84867194192
    • Multilevel parametric-base F0 model for speech synthesis
    • Latorre J., and Akamine M. Multilevel parametric-base F0 model for speech synthesis Proceedings of Interspeech 2008 2274 2277
    • (2008) Proceedings of Interspeech , pp. 2274-2277
    • Latorre, J.1    Akamine, M.2
  • 7
    • 79959844205
    • A hierarchical F0 modeling method for HMM-based speech synthesis
    • Lei M., Wu Y.J., Soong F.K., Ling Z.H., and Dai L.R. A hierarchical F0 modeling method for HMM-based speech synthesis. Interspeech 2010 2170 2173
    • (2010) Interspeech , pp. 2170-2173
    • Lei, M.1    Wu, Y.J.2    Soong, F.K.3    Ling, Z.H.4    Dai, L.R.5
  • 8
    • 84901237776
    • Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
    • Ling Z.H., Deng L., and Yu D. Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis IEEE Trans. Audio, Speech Language Proc. 21 10 2013 2129 2139
    • (2013) IEEE Trans. Audio, Speech Language Proc. , vol.21 , Issue.10 , pp. 2129-2139
    • Ling, Z.H.1    Deng, L.2    Yu, D.3
  • 10
    • 0018983824
    • A fast cosine transform in one and two dimensions
    • Makhoul J. A fast cosine transform in one and two dimensions IEEE Trans. Acoustics, Speech Signal Proc. 28 1 1980 27 34
    • (1980) IEEE Trans. Acoustics, Speech Signal Proc. , vol.28 , Issue.1 , pp. 27-34
    • Makhoul, J.1
  • 11
    • 84865714286
    • Stylization and trajectory modelling of short and long term speech prosody variations
    • Obin N., Lacheret A., and Rodet X. Stylization and trajectory modelling of short and long term speech prosody variations Proceedings of Interspeech 2011 2029 2032
    • (2011) Proceedings of Interspeech , pp. 2029-2032
    • Obin, N.1    Lacheret, A.2    Rodet, X.3
  • 12
    • 84905251808
    • On the training aspects of deep neural network (DNN) for parametric TTS synthesis
    • Qian Y., Fan Y.C., Hu W.-P., and Soong F.K. On the training aspects of deep neural network (DNN) for parametric TTS synthesis Proceedings of ICASSP 2014 3857 3861
    • (2014) Proceedings of ICASSP , pp. 3857-3861
    • Qian, Y.1    Fan, Y.C.2    Hu, W.-P.3    Soong, F.K.4
  • 13
    • 84867200235
    • Generating natural F0 trajectory with additive trees
    • Qian Y., Liang H., and Soong F.K. Generating natural F0 trajectory with additive trees. Proceedings of Interspeech 2008 2126 2129
    • (2008) Proceedings of Interspeech , pp. 2126-2129
    • Qian, Y.1    Liang, H.2    Soong, F.K.3
  • 14
    • 85008039410
    • Improved prosody generation by maximizing joint probability of state and longer units
    • Qian Y., Wu Z., Gao B., and Soong F.K. Improved prosody generation by maximizing joint probability of state and longer units IEEE Trans. Audio, Speech, Language Proc. 19 6 2011 1702 1710
    • (2011) IEEE Trans. Audio, Speech, Language Proc. , vol.19 , Issue.6 , pp. 1702-1710
    • Qian, Y.1    Wu, Z.2    Gao, B.3    Soong, F.K.4
  • 15
    • 0033906251
    • MDL-based context-dependent sub-word modeling for speech recognition
    • Shinoda K., and Watanabe T. MDL-based context-dependent sub-word modeling for speech recognition J. Acoust. Soc. Jpn(E) 21 2 2000 79 86
    • (2000) J. Acoust. Soc. Jpn(E) , vol.21 , Issue.2 , pp. 79-86
    • Shinoda, K.1    Watanabe, T.2
  • 16
    • 0001455934
    • A robust algorithm for pitch tracking (RAPT)
    • Talkin D. A robust algorithm for pitch tracking (RAPT) Speech Coding Synthesis 1995 495 518
    • (1995) Speech Coding Synthesis , pp. 495-518
    • Talkin, D.1
  • 17
    • 51449117929
    • Modelling and synthesising F0 contours with the discrete cosine transform
    • Teutenberg J., Watson C., and Riddle P. Modelling and synthesising F0 contours with the discrete cosine transform Proceedings of ICASSP 2008 3973 3976
    • (2008) Proceedings of ICASSP , pp. 3973-3976
    • Teutenberg, J.1    Watson, C.2    Riddle, P.3
  • 18
    • 33846410497
    • Speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • Toda T., and Tokuda K. Speech parameter generation algorithm considering global variance for HMM-based speech synthesis Proceedings of Eurospeech 2005 1315 1318
    • (2005) Proceedings of Eurospeech , pp. 1315-1318
    • Toda, T.1    Tokuda, K.2
  • 19
    • 0028996993
    • Speech parameter generation from HMM using dynamic features
    • Tokuda K., Kobayashi T., and Imai S. Speech parameter generation from HMM using dynamic features Proceedings of ICASSP 1995 660 663
    • (1995) Proceedings of ICASSP , pp. 660-663
    • Tokuda, K.1    Kobayashi, T.2    Imai, S.3
  • 20
    • 0032678076
    • Hidden Markov models based on multi-space probability distribution for pitch pattern modeling
    • Tokuda K., Masuko T., Miyazaki N., and Kobayashi T. Hidden Markov models based on multi-space probability distribution for pitch pattern modeling Proceedings of ICASSP 1999 229 232
    • (1999) Proceedings of ICASSP , pp. 229-232
    • Tokuda, K.1    Masuko, T.2    Miyazaki, N.3    Kobayashi, T.4
  • 24
    • 84867589421
    • Modeling pitch trajectory by hierarchical HMM with minimum generation error training
    • Wu Y.J., and Soong F.K. Modeling pitch trajectory by hierarchical HMM with minimum generation error training Proceedings of ICASSP 2012 4017 4020 10.1109/ICASSP.2012.6288799
    • (2012) Proceedings of ICASSP , pp. 4017-4020
    • Wu, Y.J.1    Soong, F.K.2
  • 25
    • 33846429403
    • Minimum generation error training for HMM-based speech synthesis
    • Wu Y.J., and Wang R.H. Minimum generation error training for HMM-based speech synthesis Proceedings of ICASSP 2006 89 92 10.1109/ICASSP.2006.1659964
    • (2006) Proceedings of ICASSP , pp. 89-92
    • Wu, Y.J.1    Wang, R.H.2
  • 26
    • 34547517493
    • Full HMM training for minimizing generation error in synthesis
    • Wu Y.J., Wang R.H., and Soong F.K. Full HMM training for minimizing generation error in synthesis ICASSP 2007 517 520
    • (2007) ICASSP , pp. 517-520
    • Wu, Y.J.1    Wang, R.H.2    Soong, F.K.3
  • 27
    • 60849112575
    • Modeling and generating tone contour with phrase intonation for Mandarin Chinese speech
    • Wu Z., Qian Y., Soong F.K., and Zhang B. Modeling and generating tone contour with phrase intonation for Mandarin Chinese speech Proceedings of ISCSLP 2008 1 4 10.1109/CHINSL.2008.ECP.42
    • (2008) Proceedings of ISCSLP , pp. 1-4
    • Wu, Z.1    Qian, Y.2    Soong, F.K.3    Zhang, B.4
  • 28
    • 84910044428
    • Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree
    • Yin X., Lei M., Qian Y., Soong F.-K., He L., Ling Z.H., and Dai L.R. Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree Proceedings of Interspeech 2014 2273 2277
    • (2014) Proceedings of Interspeech , pp. 2273-2277
    • Yin, X.1    Lei, M.2    Qian, Y.3    Soong, F.-K.4    He, L.5    Ling, Z.H.6    Dai, L.R.7
  • 30
    • 70450161503
    • Context-dependent additive log F0 model for HMM-based speech synthesis
    • Zen H., and Braunschweiler N. Context-dependent additive log F0 model for HMM-based speech synthesis. Proceedings of Interspeech 2009 2091 2094
    • (2009) Proceedings of Interspeech , pp. 2091-2094
    • Zen, H.1    Braunschweiler, N.2
  • 32
    • 84890490547
    • Statistical parametric speech synthesis using deep neural networks
    • Zen H., Senior A., and Schuster M. Statistical parametric speech synthesis using deep neural networks Proceedings of ICASSP 2013 7962 7966
    • (2013) Proceedings of ICASSP , pp. 7962-7966
    • Zen, H.1    Senior, A.2    Schuster, M.3
  • 33
    • 33846405723
    • Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005
    • Zen H., Toda T., Nakamura M., and Tokuda K. Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005 IEICE Trans. Inf. Syst. E90-D 1 2007 325 333
    • (2007) IEICE Trans. Inf. Syst. , vol.E90-D , Issue.1 , pp. 325-333
    • Zen, H.1    Toda, T.2    Nakamura, M.3    Tokuda, K.4
  • 34
    • 67651002140
    • Statistical parametric speech synthesis
    • Zen H., Tokuda K., and Black A. Statistical parametric speech synthesis Speech Commun. 51 2009 1039 1064 10.1016/j.specom.2009.04.004
    • (2009) Speech Commun. , vol.51 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.3
  • 35
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
    • Zen H., Tokuda K., and Kitamura T. Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences Comput. Speech Language 21 1 2007 153 173
    • (2007) Comput. Speech Language , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.