SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2015-August, 2015, Pages 4460-4464

Deep neural networks employing Multi-Task Learning and stacked bottleneck features for speech synthesis

(4) Wu, Zhizheng (a); Valentini Botinhao, Cassia (a); Watts, Oliver (a); King, Simon (a)

(a) UNIVERSITY OF EDINBURGH (United Kingdom)

Author keywords

acoustic model; bottleneck feature; deep neural network; multi task learning; Speech synthesis

Indexed keywords

AUDIO SIGNAL PROCESSING; COMPLEX NETWORKS; LINGUISTICS; MAPPING; SPEECH COMMUNICATION; SPEECH SYNTHESIS;

ACOUSTIC MODEL; BOTTLENECK FEATURES; LINGUISTIC FEATURES; LISTENING TESTS; MULTITASK LEARNING; PARAMETRIC SYSTEMS; SPEECH ACOUSTICS; SYNTHETIC SPEECH;

DEEP NEURAL NETWORKS;

EID: 84946033275; PISSN: 15206149; EISSN: None; Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2015.7178814; Document Type: Conference Paper

Times cited: 264

References (25)

1
- 67651002140
- Statistical parametric speech synthesis
- Heiga Zen, Keiichi Tokuda, and Alan W Black, Statistical parametric speech synthesis, Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 84910105608
- Measuring a decade of progress in text-to-speech
- Simon King, Measuring a decade of progress in text-to-speech, Loquens, vol. 1, no. 1, 2014
- (2014) Loquens , vol.1 , Issue.1
- King, S.¹

3
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- Andrew J Hunt and Alan W Black, Unit selection in a concatenative speech synthesis system using a large speech database, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 1996
- (1996) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Hunt, A.J.¹ Black, A.W.²

4
- 33846429403
- Minimum generation error training for HMM-based speech synthesis
- Yi-Jian Wu and Ren-Hua Wang, Minimum generation error training for HMM-based speech synthesis, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2006
- (2006) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Wu, Y.-J.¹ Wang, R.-H.²

5
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- Tomoki Toda and Keiichi Tokuda, A speech parameter generation algorithm considering global variance for HMM-based speech synthesis, IEICE Transactions on Information and Systems, vol. 90, no. 5, pp. 816-824, 2007
- (2007) IEICE Transactions on Information and Systems , vol.90 , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

6
- 33749573927
- Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences
- Heiga Zen, Keiichi Tokuda, and Tadashi Kitamura, Reformulating the HMM as a trajectory model by imposing explicit relationships between static and dynamic feature vector sequences, Computer Speech & Language, vol. 21, no. 1, pp. 153-173, 2007
- (2007) Computer Speech & Language , vol.21 , Issue.1 , pp. 153-173
- Zen, H.¹ Tokuda, K.² Kitamura, T.³

7
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- Heiga Zen, Andrew Senior, and Mike Schuster, Statistical parametric speech synthesis using deep neural networks, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013
- (2013) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Zen, H.¹ Senior, A.² Schuster, M.³

8
- 84890527090
- Multidistribution deep belief network for speech synthesis
- Shiyin Kang, Xiaojun Qian, and Helen Meng, Multidistribution deep belief network for speech synthesis, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013
- (2013) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Kang, S.¹ Qian, X.² Meng, H.³

9
- 84901237776
- Modeling spectral envelopes using Restricted Boltzmann Machines and Deep Belief Networks for statistical parametric speech synthesis
- Zhen-Hua Ling, Li Deng, and Dong Yu, Modeling spectral envelopes using Restricted Boltzmann Machines and Deep Belief Networks for statistical parametric speech synthesis, IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2129-2139, 2013
- (2013) IEEE Transactions on Audio, Speech, and Language Processing , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

10
- 84929157442
- Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis
- Heng Lu, Simon King, and Oliver Watts, Combining a vector space representation of linguistic context with a deep neural network for text-to-speech synthesis, in Proc. 8th ISCA Speech Synthesis Workshop (SSW), 2013
- (2013) Proc. 8th ISCA Speech Synthesis Workshop (SSW)
- Lu, H.¹ King, S.² Watts, O.³

11
- 84905251808
- On the training aspects of deep neural network (DNN) for parametric TTS synthesis
- Yao Qian, Yuchen Fan, Wenping Hu, and Frank K Soong, On the training aspects of deep neural network (DNN) for parametric TTS synthesis, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2014
- (2014) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Qian, Y.¹ Fan, Y.² Hu, W.³ Soong, F.K.⁴

12
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, Li Deng, Dong Yu, G.E. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T.N. Sainath, and B. Kingsbury, Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.E.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.N.¹⁰ Kingsbury, B.¹¹

13
- 1942470793
- Multitask learning
- Rich Caruana, Multitask learning, Springer, 1998
- (1998) Multitask Learning
- Caruana, R.¹

14
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Yuchen Fan, Yao Qian, Fenglong Xie, and Frank K. Soong, TTS synthesis with bidirectional LSTM based recurrent neural networks, in Proc. Interspeech, 2014
- (2014) Proc. Interspeech
- Fan, Y.¹ Qian, Y.² Xie, F.³ Soong, F.K.⁴

15
- 84890545600
- Multi-task learning in deep neural networks for improved phoneme recognition
- Michael L Seltzer and Jasha Droppo, Multi-task learning in deep neural networks for improved phoneme recognition, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013
- (2013) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Seltzer, M.L.¹ Droppo, J.²

16
- 56449095373
- A unified architecture for natural language processing: Deep neural networks with multitask learning
- Ronan Collobert and Jason Weston, A unified architecture for natural language processing: Deep neural networks with multitask learning, in Proc. Int. Conf. on Machine Learning (ICML), 2008
- (2008) Proc. Int. Conf. on Machine Learning (ICML)
- Collobert, R.¹ Weston, J.²

17
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- Keiichi Tokuda, Takayoshi Yoshimura, Takashi Masuko, Takao Kobayashi, and Tadashi Kitamura, Speech parameter generation algorithms for HMM-based speech synthesis, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2000
- (2000) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

18
- 84865785753
- Improved bottleneck features using pretrained deep neural networks
- Dong Yu and Michael L Seltzer, Improved bottleneck features using pretrained deep neural networks, in Proc. Interspeech, 2011
- (2011) Proc. Interspeech
- Yu, D.¹ Seltzer, M.L.²

19
- 84867593213
- Auto-encoder bottleneck features using deep belief networks
- Tara N Sainath, Brian Kingsbury, and Bhuvana Ramabhadran, Auto-encoder bottleneck features using deep belief networks, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2012
- (2012) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Sainath, T.N.¹ Kingsbury, B.² Ramabhadran, B.³

20
- 84890482429
- Extracting deep bottleneck features using stacked autoencoders
- Jonas Gehring, Yajie Miao, Florian Metze, and Alex Waibel, Extracting deep bottleneck features using stacked autoencoders, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013
- (2013) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Gehring, J.¹ Miao, Y.² Metze, F.³ Waibel, A.⁴

21
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- Hideki Kawahara, Ikuyo Masuda-Katsuse, and Alain de Cheveigné, Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds, Speech Communication, vol. 27, no. 3, pp. 187-207, 1999
- (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² de Cheveigné, A.³

22
- 78651237606
- Gammatone-like spectrograms
- D. P. W. Ellis, Gammatone-like spectrograms, web resource: http://www.ee.columbia.edu/~dpwe/resources/matlab/gammatonegram/, 2009
- (2009) Gammatone-like Spectrograms
- Ellis, D.P.W.¹

23
- 0345443172
- Glimpsing speech
- Martin Cooke, Glimpsing speech, Journal of Phonetics, vol. 31, pp. 579-584, 2003
- (2003) Journal of Phonetics , vol.31 , pp. 579-584
- Cooke, M.¹

24
- 84857819132
- Theano: A CPU and GPU math expression compiler
- James Bergstra, Olivier Breuleux, Frédéric Bastien, Pascal Lamblin, Razvan Pascanu, Guillaume Desjardins, Joseph Turian, David Warde-Farley, and Yoshua Bengio, Theano: a CPU and GPU math expression compiler, in Proceedings of the Python for Scientific Computing Conference (SciPy), June 2010
- (2010) Proceedings of the Python for Scientific Computing Conference (SciPy)
- Bergstra, J.¹ Breuleux, O.² Bastien, F.³ Lamblin, P.⁴ Pascanu, R.⁵ Desjardins, G.⁶ Turian, J.⁷ Warde-Farley, D.⁸ Bengio, Y.⁹

25
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- Heiga Zen and Andrew Senior, Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis, in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2014
- (2014) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Zen, H.¹ Senior, A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.