SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

2017, Pages 4895-4899

An autoregressive recurrent mixture density network for parametric speech synthesis

Authors (3): Wang, Xin (a, b); Takaki, Shinji (a); Yamagishi, Junichi (a, b, c)

a NATIONAL INSTITUTE OF INFORMATICS (Japan)

b GRADUATE UNIVERSITY FOR ADVANCED STUDIES (Japan)

c UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

Autoregressive model; Mixture density network; Recurrent neural network; Speech synthesis

EID: 85023745327
PISSN: 15206149
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2017.7953087
Document Type: Conference Paper

Times cited: 68

References (27)

1
- 84876687945
- Speech synthesis based on hidden Markov models
- Keiichi Tokuda, Yoshihiko Nankaku, Tomoki Toda, Heiga Zen, Junichi Yamagishi, and Keiichiro Oura, "Speech synthesis based on hidden Markov models," Proceedings of the IEEE, vol. 101, no. 5, pp. 1234-1252, 2013.
- (2013) Proceedings of the IEEE , vol.101 , Issue.5 , pp. 1234-1252
- Tokuda, K.¹ Nankaku, Y.² Toda, T.³ Zen, H.⁴ Yamagishi, J.⁵ Oura, K.⁶

2
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- Heiga Zen, Alan Senior, and Martin Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. ICASSP, 2013, pp. 7962-7966.
- (2013) Proc. ICASSP , pp. 7962-7966
- Zen, H.¹ Senior, A.² Schuster, M.³

3
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Yuchen Fan, Yao Qian, Feilong Xie, and Frank K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Proc. INTERSPEECH, 2014, pp. 1964-1968.
- (2014) Proc. INTERSPEECH , pp. 1964-1968
- Fan, Y.¹ Qian, Y.² Xie, F.³ Soong, F.K.⁴

4
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Zhen-Hua Ling, Li Deng, and Dong Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2129-2139, 2013.
- (2013) IEEE Transactions on Audio, Speech, and Language Processing , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

5
- 84973309345
- A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis
- Shinji Takaki and Junichi Yamagishi, "A deep auto-encoder based low-dimensional feature extraction from FFT spectral envelopes for statistical parametric speech synthesis," in Proc. ICASSP, 2016, pp. 5535-5539.
- (2016) Proc. ICASSP , pp. 5535-5539
- Takaki, S.¹ Yamagishi, J.²

6
- 0004113976
- Mixture density networks
- Christopher M. Bishop, "Mixture Density Networks," Tech. Rep., Aston University, 2004, http://eprints.aston.ac.uk/373/.
- (2004) Tech. Rep., Aston University
- Bishop, C.M.¹

7
- 0003391392
- Optimum Signal Processing: An Introduction
- Sophocles J. Orfanidis, Optimum Signal Processing: An Introduction, Macmillan publishing company, 1988.
- (1988) Optimum Signal Processing: An Introduction
- Orfanidis, S.J.¹

8
- 84872190545
- Autoregressive models for statistical parametric speech synthesis
- Matt Shannon, Heiga Zen, and William Byrne, "Autoregressive models for statistical parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 3, pp. 587-597, 2013.
- (2013) IEEE Transactions on Audio, Speech, and Language Processing , vol.21 , Issue.3 , pp. 587-597
- Shannon, M.¹ Zen, H.² Byrne, W.³

9
- 84867625378
- Autoregressive HMM speech synthesis
- Carl Quillen, "Autoregressive HMM speech synthesis," in Proc. ICASSP, 2012, pp. 4021-4024.
- (2012) Proc. ICASSP , pp. 4021-4024
- Quillen, C.¹

10
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- Tomoki Toda and Keiichi Tokuda, "A speech parameter generation algorithm considering global variance for HMM-based speech synthesis," IEICE Transactions on Information and Systems, vol. 90, no. 5, pp. 816-824, 2007.
- (2007) IEICE Transactions on Information and Systems , vol.90 , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

11
- 70349284484
- Supervised Sequence Labelling with Recurrent Neural Networks
- Alex Graves, Supervised Sequence Labelling with Recurrent Neural Networks, Ph.D. thesis, Technische Universität München, 2008.
- (2008) Supervised Sequence Labelling with Recurrent Neural Networks
- Graves, A.¹

12
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- Heiga Zen and Andrew Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Proc. ICASSP, 2014, pp. 3844-3848.
- (2014) Proc. ICASSP , pp. 3844-3848
- Zen, H.¹ Senior, A.²

13
- 84898948282
- Better generative models for sequential data problems: Bidirectional recurrent mixture density networks
- Mike Schuster, "Better generative models for sequential data problems: Bidirectional recurrent mixture density networks," in Proc. NIPS, 1999, pp. 589-595.
- (1999) Proc. NIPS , pp. 589-595
- Schuster, M.¹

14
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
- Heiga Zen, Keiichi Tokuda, and Tadashi Kitamura, "Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences," Computer Speech & Language, vol. 21, no. 1, pp. 153-173, 2007.
- (2007) Computer Speech & Language , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

15
- 84973375140
- Trajectory training considering global variance for speech synthesis based on neural networks
- Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, and Keiichi Tokuda, "Trajectory training considering global variance for speech synthesis based on neural networks," in Proc. ICASSP, 2016, pp. 5600-5604.
- (2016) Proc. ICASSP , pp. 5600-5604
- Hashimoto, K.¹ Oura, K.² Nankaku, Y.³ Tokuda, K.⁴

16
- 79959847165
- A formulation of the autoregressive HMM for speech synthesis
- Matt Shannon and William Byrne, "A formulation of the autoregressive HMM for speech synthesis," Tech. Rep., University of Cambridge, CUED/F-INFENG/TR.629, 2009.
- (2009) Tech. Rep., University of Cambridge
- Shannon, M.¹ Byrne, W.²

17
- 84946045510
- Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis
- Heiga Zen and Haşim Sak, "Unidirectional long short-term memory recurrent neural network with recurrent output layer for low-latency speech synthesis," in Proc. ICASSP, 2015, pp. 4470-4474.
- (2015) Proc. ICASSP , pp. 4470-4474
- Zen, H.¹ Sak, H.²

18
- 84878419996
- The Blizzard Challenge 2011
- Simon King and Vasilis Karaiskos, "The Blizzard Challenge 2011," in Proc. Blizzard Challenge 2011, 2011, pp. 1-10.
- (2011) Proc. Blizzard Challenge 2011 , pp. 1-10
- King, S.¹ Karaiskos, V.²

19
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- Hideki Kawahara, Ikuyo Masuda-Katsuse, and Alain de Cheveigne, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigne, A.³

20
- 84994313714
- HTS Working Group, "The English TTS system Flite+HTS engine," 2014.
- (2014) The English TTS System Flite+HTS Engine

21
- 84930639546
- Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit
- Felix Weninger, Johannes Bergmann, and Björn Schuller, "Introducing CURRENNT: The Munich open-source CUDA recurrent neural network toolkit," The Journal of Machine Learning Research, vol. 16, no. 1, pp. 547-551, 2015.
- (2015) The Journal of Machine Learning Research , vol.16 , Issue.1 , pp. 547-551
- Weninger, F.¹ Bergmann, J.² Schuller, B.³

22
- 0015360527
- Digital inverse filtering: A new tool for formant trajectory estimation
- John D. Markel, "Digital inverse filtering: A new tool for formant trajectory estimation," IEEE Transactions on Audio and Electroacoustics, vol. 20, no. 2, pp. 129-137, Jun 1972.
- (1972) IEEE Transactions on Audio and Electroacoustics , vol.20 , Issue.2 , pp. 129-137
- Markel, J.D.¹

23
- 77956356059
- Guidelines for ToBI labelling
- Mary E. Beckman and Gayle Ayers, "Guidelines for ToBI labelling," The OSU Research Foundation, vol. 3, 1997.
- (1997) The OSU Research Foundation , vol.3
- Beckman, M.E.¹ Ayers, G.²

24
- 13344250603
- Method for the subjective assessment of intermediate quality level of coding systems
- "Method for the subjective assessment of intermediate quality level of coding systems," in ITU-R BS.1534-1, International Telecommunication Union Radiocommunication Assembly, 2003, http://www.itu.int/rec/R-REC-BS.1534.
- (2003) ITU-R BS.1534-1, International Telecommunication Union Radiocommunication Assembly

25
- 84973395039
- Robust TTS duration modelling using DNNs
- Gustav E. Henter, Srikanth Ronanki, Oliver Watts, Mirjam Wester, Zhizheng Wu, and Simon King, "Robust TTS duration modelling using DNNs," in Proc. ICASSP, 2016, pp. 5130-5134.
- (2016) Proc. ICASSP , pp. 5130-5134
- Henter, G.E.¹ Ronanki, S.² Watts, O.³ Wester, M.⁴ Wu, Z.⁵ King, S.⁶

26
- 84994314564
- Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices
- Heiga Zen, Yannis Agiomyrgiannakis, Niels Egberts, Fergus Henderson, and Przemyslaw Szczepaniak, "Fast, compact, and high quality LSTM-RNN based statistical parametric speech synthesizers for mobile devices," in Proc. INTERSPEECH, 2016, pp. 2273-2277.
- (2016) Proc. INTERSPEECH , pp. 2273-2277
- Zen, H.¹ Agiomyrgiannakis, Y.² Egberts, N.³ Henderson, F.⁴ Szczepaniak, P.⁵

27
- 84905234422
- A postfilter to modify the modulation spectrum in HMM-based speech synthesis
- Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, and Satoshi Nakamura, "A postfilter to modify the modulation spectrum in HMM-based speech synthesis," in Proc. ICASSP, 2014, pp. 290-294.
- (2014) Proc. ICASSP , pp. 290-294
- Takamichi, S.¹ Toda, T.² Neubig, G.³ Sakti, S.⁴ Nakamura, S.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.