



Volume 24, Issue 7, 2016, Pages 1255-1265

Improving Trajectory Modelling for DNN-Based Speech Synthesis by Using Stacked Bottleneck Features and Minimum Generation Error Training

Author keywords

acoustic modelling; bottleneck; deep neural network; Minimum generation error; Speech synthesis

Indexed keywords

COMPUTATIONAL LINGUISTICS; LINGUISTICS; RECURRENT NEURAL NETWORKS; SPEECH SYNTHESIS

EID: 84978086501     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2016.2551865     Document Type: Article
Times cited : (37)

References (49)
  • 1
    • 67651002140
    • Statistical parametric speech synthesis
    • H. Zen, K. Tokuda, A. W. Black, "Statistical parametric speech synthesis," Speech Commun., vol. 51, no. 11, pp. 1039-1064, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.11 , pp. 1039-1064
    • Zen, H.1    Tokuda, K.2    Black, A.W.3
  • 2
    • 84910105608
    • Measuring a decade of progress in text-to-speech
    • S. King, "Measuring a decade of progress in text-to-speech," Loquens, vol. 1, no. 1, 2014.
    • (2014) Loquens , vol.1 , Issue.1
    • King, S.1
  • 3
    • 0029765811
    • Unit selection in a concatenative speech synthesis system using a large speech database
    • A. J. Hunt and A. W. Black, "Unit selection in a concatenative speech synthesis system using a large speech database," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., vol. 1, 1996, pp. 373-376.
    • (1996) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , vol.1 , pp. 373-376
    • Hunt, A.J.1    Black, A.W.2
  • 5
    • 33749573927
    • Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
    • H. Zen, K. Tokuda, T. Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Comput. Speech Language, vol. 21, no. 1, pp. 153-173, 2007.
    • (2007) Comput. Speech Language , vol.21 , Issue.1 , pp. 153-173
    • Zen, H.1    Tokuda, K.2    Kitamura, T.3
  • 6
    • 38549096029
    • A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
    • T. Toda and K. Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Trans. Inform. Syst., vol. 90, no. 5, pp. 816-824, 2007.
    • (2007) IEICE Trans. Inform. Syst. , vol.90 , Issue.5 , pp. 816-824
    • Toda, T.1    Tokuda, K.2
  • 8
    • 84946042252
    • Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech
    • T. Merritt, J. Latorre, S. King, "Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4220-4224.
    • (2015) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 4220-4224
    • Merritt, T.1    Latorre, J.2    King, S.3
  • 9
    • 85032751458
    • Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
    • G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
    • (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 82-97
    • Hinton, G.1
  • 10
    • 85032750981
    • Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
    • Z.-H. Ling et al., "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends," IEEE Signal Process. Mag., vol. 32, no. 3, pp. 35-52, May 2015.
    • (2015) IEEE Signal Process. Mag. , vol.32 , Issue.3 , pp. 35-52
    • Ling, Z.-H.1
  • 12
    • 0027246817
    • Speech synthesis with artificial neural networks
    • T. Weijters and J. Thole, "Speech synthesis with artificial neural networks," in Proc. Int. Conf. Neural Netw., 1993, pp. 1764-1769.
    • (1993) Proc. Int. Conf. Neural Netw. , pp. 1764-1769
    • Weijters, T.1    Thole, J.2
  • 14
    • 84976450820
    • Speech synthesis using artificial neural networks trained on cepstral coefficients
    • C. Tuerk and T. Robinson, "Speech synthesis using artificial neural networks trained on cepstral coefficients," in Proc. Eur. Conf. Speech Commun. Technol., 1993, pp. 4-7.
    • (1993) Proc. Eur. Conf. Speech Commun. Technol. , pp. 4-7
    • Tuerk, C.1    Robinson, T.2
  • 15
    • 0012235593
    • A neural-network-based model of segmental duration for speech synthesis
    • M. Riedi, "A neural-network-based model of segmental duration for speech synthesis," in Proc. Eur. Conf. Speech Commun. Technol., 1995, pp. 599-602.
    • (1995) Proc. Eur. Conf. Speech Commun. Technol. , pp. 599-602
    • Riedi, M.1
  • 16
    • 84901237776
    • Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
    • Z.-H. Ling, L. Deng, D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2129-2139, Oct. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.10 , pp. 2129-2139
    • Ling, Z.-H.1    Deng, L.2    Yu, D.3
  • 18
    • 84910030421
    • Statistical parametric speech synthesis using weighted multi-distribution deep belief network
    • S. Kang and H. Meng, "Statistical parametric speech synthesis using weighted multi-distribution deep belief network," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2014, pp. 1959-1963.
    • (2014) Proc. Annu. Conf. Int. Speech Commun. Assoc. , pp. 1959-1963
    • Kang, S.1    Meng, H.2
  • 19
    • 84905262874
    • Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
    • H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2014, pp. 3844-3848.
    • (2014) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 3844-3848
    • Zen, H.1    Senior, A.2
  • 23
    • 84929157442
    • Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
    • H. Lu, S. King, O. Watts, "Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis," in Proc. 8th Int. Speech Commun. Assoc. Speech Synthesis Workshop, 2013, pp. 281-285.
    • (2013) Proc. 8th Int. Speech Commun. Assoc. Speech Synthesis Workshop , pp. 281-285
    • Lu, H.1    King, S.2    Watts, O.3
  • 27
    • 84978101616
    • A reading list of recent advances in speech synthesis
    • S. King, "A reading list of recent advances in speech synthesis," in Proc. 18th Int. Congr. Phon. Sci., 2015.
    • (2015) Proc. 18th Int. Congr. Phon. Sci.
    • King, S.1
  • 31
    • 84946045510
    • Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
    • H. Zen and H. Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4470-4474.
    • (2015) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 4470-4474
    • Zen, H.1    Sak, H.2
  • 34
    • 84959135757
    • Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
    • Z. Wu and S. King, "Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 309-313.
    • (2015) Proc. Annu. Conf. Int. Speech Commun. Assoc. , pp. 309-313
    • Wu, Z.1    King, S.2
  • 35
    • 84959172579
    • Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis
    • Y. Fan, Y. Qian, F. K. Soong, L. He, "Sequence generation error (SGE) minimization based deep neural networks training for text-to-speech synthesis," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 864-868.
    • (2015) Proc. Annu. Conf. Int. Speech Commun. Assoc. , pp. 864-868
    • Fan, Y.1    Qian, Y.2    Soong, F.K.3    He, L.4
  • 38
    • 0022471098
    • Learning representations by back-propagating errors
    • D. E. Rumelhart, G. E. Hinton, R. J. Williams, "Learning representations by back-propagating errors," Nature, vol. 323, no. 6088, pp. 533-536, 1986.
    • (1986) Nature , vol.323 , Issue.6088 , pp. 533-536
    • Rumelhart, D.E.1    Hinton, G.E.2    Williams, R.J.3
  • 43
    • 84865785753
    • Improved bottleneck features using pretrained deep neural networks
    • D. Yu and M. L. Seltzer, "Improved bottleneck features using pretrained deep neural networks," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2011, pp. 237-240.
    • (2011) Proc. Annu. Conf. Int. Speech Commun. Assoc. , pp. 237-240
    • Yu, D.1    Seltzer, M.L.2
  • 45
    • 0032673049
    • Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
    • H. Kawahara, I. Masuda-Katsuse, A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Commun., vol. 27, no. 3, pp. 187-207, 1999.
    • (1999) Speech Commun. , vol.27 , Issue.3 , pp. 187-207
    • Kawahara, H.1    Masuda-Katsuse, I.2    De Cheveigné, A.3
  • 46
    • 84894152556
    • The voice bank corpus: Design, collection and data analysis of a large regional accent speech database
    • C. Veaux, J. Yamagishi, S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database," in Proc. Int. Conf. Oriental COCOSDA, 2013, pp. 1-4.
    • (2013) Proc. Int. Conf. Oriental COCOSDA , pp. 1-4
    • Veaux, C.1    Yamagishi, J.2    King, S.3
  • 47
    • 0012330750
    • The design for the Wall Street Journal-based CSR corpus
    • D. B. Paul and J. M. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proc. Workshop Speech Natural Lang., 1992, pp. 357-362.
    • (1992) Proc. Workshop Speech Natural Lang. , pp. 357-362
    • Paul, D.B.1    Baker, J.M.2


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.