SCOPUS Information Search Platform

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

Volume 2016-May, 2016, Pages 5145-5149

Deep neural network-guided unit selection synthesis

(5) Merritt, Thomas (a); Clark, Robert A. J. (a, c); Wu, Zhizheng (a); Yamagishi, Junichi (a, b); King, Simon (a)

a UNIVERSITY OF EDINBURGH (United Kingdom)

b NATIONAL INSTITUTE OF INFORMATICS (Japan)

c GOOGLE INC (United States)

Author keywords

deep neural networks; embedding; hybrid synthesis; speech synthesis; unit selection

EID: 84973402504
PISSN: 15206149
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1109/ICASSP.2016.7472658
Document Type: Conference Paper

Times cited : (46)

References (36)

1
- 84878419996
- The blizzard challenge 2011
- Simon King and Vasilis Karaiskos, "The Blizzard Challenge 2011, " in Proc. Blizzard Challenge, 2011.
- (2011) Proc. Blizzard Challenge
- King, S.¹ Karaiskos, V.²

2
- 84890516589
- The blizzard challenge 2012
- Simon King and Vasilis Karaiskos, "The Blizzard Challenge 2012, " in Proc. Blizzard Challenge, 2012.
- (2012) Proc. Blizzard Challenge
- King, S.¹ Karaiskos, V.²

3
- 84904680338
- The blizzard challenge 2013
- Simon King and Vasilis Karaiskos, "The Blizzard Challenge 2013, " in Proc. Blizzard Challenge, 2013.
- (2013) Proc. Blizzard Challenge
- King, S.¹ Karaiskos, V.²

4
- 84910105608
- Measuring a decade of progress in text-to-speech
- Simon King, "Measuring a decade of progress in text-to-speech, " Loquens, vol. 1, no. 1, 2014.
- (2014) Loquens , vol.1 , Issue.1
- King, S.¹

5
- 85133164767
- Investigating the shortcomings of HMM synthesis
- Thomas Merritt and Simon King, "Investigating the shortcomings of HMM synthesis, " in Proc. 8th ISCA Speech Synthesis Workshop, 2013, pp. 165-170.
- (2013) Proc. 8th ISCA Speech Synthesis Workshop , pp. 165-170
- Merritt, T.¹ King, S.²

6
- 84910070288
- Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis
- Thomas Merritt, Tuomo Raitio, and Simon King, "Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis, " in Proc. Interspeech, 2014, pp. 1509-1513.
- (2014) Proc. Interspeech , pp. 1509-1513
- Merritt, T.¹ Raitio, T.² King, S.³

7
- 84910028520
- Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
- Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, and Simon King, "Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech, " in Proc. Interspeech, 2014, pp. 1504-1508.
- (2014) Proc. Interspeech , pp. 1504-1508
- Henter, G.E.¹ Merritt, T.² Shannon, M.³ Mayo, C.⁴ King, S.⁵

8
- 84946042252
- Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech
- Thomas Merritt, Javier Latorre, and Simon King, "Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech, " in Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Merritt, T.¹ Latorre, J.² King, S.³

9
- 84959122693
- Deep neural network context embeddings for model selection in rich-context HMM synthesis
- Thomas Merritt, Junichi Yamagishi, Zhizheng Wu, Oliver Watts, and Simon King, "Deep neural network context embeddings for model selection in rich-context HMM synthesis, " in Proc. Interspeech, 2015.
- (2015) Proc. Interspeech
- Merritt, T.¹ Yamagishi, J.² Wu, Z.³ Watts, O.⁴ King, S.⁵

10
- 0029765811
- Unit selection in a concatenative speech synthesis system using a large speech database
- Andrew J Hunt and Alan W Black, "Unit selection in a concatenative speech synthesis system using a large speech database, " in Proc. ICASSP, 1996, pp. 373-376.
- (1996) Proc. ICASSP , pp. 373-376
- Hunt, A.J.¹ Black, A.W.²

11
- 85133526552
- Automatically clustering similar units for unit selection in speech synthesis
- Alan W Black and Paul A Taylor, "Automatically clustering similar units for unit selection in speech synthesis, " in Proc. Eurospeech, 1997.
- (1997) Proc. Eurospeech
- Black, A.W.¹ Taylor, P.A.²

12
- 85124698057
- The architecture of the festival speech synthesis system
- Paul Taylor, Alan W Black, and Richard Caley, "The architecture of the festival speech synthesis system, " in The Third ESCA/COCOSDA Workshop on Speech Synthesis, 1998.
- (1998) The Third ESCA/COCOSDA Workshop on Speech Synthesis
- Taylor, P.¹ Black, A.W.² Caley, R.³

13
- 44949153641
- The target cost formulation in unit selection speech synthesis
- Paul Taylor, "The target cost formulation in unit selection speech synthesis, " in Proc. Interspeech, 2006, pp. 2038-2041.
- (2006) Proc. Interspeech , pp. 2038-2041
- Taylor, P.¹

14
- 84871382567
- A unified trajectory tiling approach to high quality speech rendering
- Yao Qian, Frank K Soong, and Zhi-Jie Yan, "A unified trajectory tiling approach to high quality speech rendering, " Audio, Speech, and Language Processing, IEEE Transactions on, vol. 21, no. 2, pp. 280-290, 2013.
- (2013) Audio, Speech, and Language Processing, IEEE Transactions on , vol.21 , Issue.2 , pp. 280-290
- Qian, Y.¹ Soong, F.K.² Yan, Z.³

15
- 84901015944
- The USTC system for Blizzard Challenge 2008
- Zhen-Hua Ling, Heng Lu, Guo-Ping Hu, Li-Rong Dai, and Ren-Hua Wang, "The USTC system for Blizzard Challenge 2008, " in Proc. Blizzard Challenge, 2008.
- (2008) Proc. Blizzard Challenge
- Ling, Z.¹ Lu, H.² Hu, G.³ Dai, L.⁴ Wang, R.⁵

16
- 78049399368
- Rich-context unit selection (RUS) approach to high quality TTS
- Zhi-Jie Yan, Yao Qian, and Frank K Soong, "Rich-context unit selection (RUS) approach to high quality TTS, " in Proc. ICASSP, 2010, pp. 4798-4801.
- (2010) Proc. ICASSP , pp. 4798-4801
- Yan, Z.¹ Qian, Y.² Soong, F.K.³

17
- 84867217260
- Synthesis by generation and concatenation of multiform segments
- Vincent Pollet and Andrew Breen, "Synthesis by generation and concatenation of multiform segments, " in Proc. Interspeech, 2008, pp. 1825-1828.
- (2008) Proc. Interspeech , pp. 1825-1828
- Pollet, V.¹ Breen, A.²

18
- 84865718211
- Uniform speech parameterization for multi-form segment synthesis
- Alexander Sorin, Slava Shechtman, and Vincent Pollet, "Uniform Speech Parameterization for Multi-form Segment Synthesis, " in Proc. Interspeech, 2011, pp. 337-340.
- (2011) Proc. Interspeech , pp. 337-340
- Sorin, A.¹ Shechtman, S.² Pollet, V.³

19
- 84878557723
- Psychoacoustic segment scoring for multi-form speech synthesis
- Alexander Sorin, Slava Shechtman, and Vincent Pollet, "Psychoacoustic Segment Scoring for Multi-Form Speech Synthesis, " in Proc. Interspeech, 2012, pp. 2214-2217.
- (2012) Proc. Interspeech , pp. 2214-2217
- Sorin, A.¹ Shechtman, S.² Pollet, V.³

20
- 84910091105
- Refined intersegment joining in multi-form speech synthesis
- Alexander Sorin, Slava Shechtman, and Vincent Pollet, "Refined Intersegment Joining in Multi-Form Speech Synthesis, " in Proc. Interspeech, 2014, pp. 790-794.
- (2014) Proc. Interspeech , pp. 790-794
- Sorin, A.¹ Shechtman, S.² Pollet, V.³

21
- 84959124410
- Using deep bidirectional recurrent neural networks for prosodic-target prediction in a unit-selection text-to-speech system
- Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, and Ron Hoory, "Using Deep Bidirectional Recurrent Neural Networks for Prosodic-Target Prediction in a Unit-Selection Text-to-Speech System, " in Proc. Interspeech, 2015.
- (2015) Proc. Interspeech
- Fernandez, R.¹ Rendel, A.² Ramabhadran, B.³ Hoory, R.⁴

22
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- Zhizheng Wu, Cassia Valentini-Botinhao, Oliver Watts, and Simon King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis, " in Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Wu, Z.¹ Valentini-Botinhao, C.² Watts, O.³ King, S.⁴

23
- 85032750981
- Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends
- Zhen-Hua Ling, Shi-Yin Kang, Heiga Zen, Andrew Senior, Mike Schuster, Xiao-Jun Qian, Helen M Meng, and Li Deng, "Deep learning for acoustic modeling in parametric speech generation: A systematic review of existing techniques and future trends, " IEEE Signal Processing Magazine, vol. 32, no. 3, pp. 35-52, 2015.
- (2015) IEEE Signal Processing Magazine , vol.32 , Issue.3 , pp. 35-52
- Ling, Z.¹ Kang, S.² Zen, H.³ Senior, A.⁴ Schuster, M.⁵ Qian, X.⁶ Meng, H.M.⁷ Deng, L.⁸

24
- 84973282956
- Acoustic modeling in statistical parametric speech synthesis - from HMM to LSTM-RNN
- Heiga Zen, "Acoustic Modeling in Statistical Parametric Speech Synthesis - From HMM to LSTM-RNN, " in Proc. MLSLP, 2015.
- (2015) Proc. MLSLP
- Zen, H.¹

25
- 0032073761
- An RNN-based prosodic information synthesizer for Mandarin text-to-speech
- Sin-Horng Chen, Shaw-Hwa Hwang, and Yih-Ru Wang, "An RNN-based prosodic information synthesizer for Mandarin text-to-speech, " IEEE Transactions on Speech and Audio Processing, vol. 6, no. 3, pp. 226-239, 1998.
- (1998) IEEE Transactions on Speech and Audio Processing , vol.6 , Issue.3 , pp. 226-239
- Chen, S.¹ Hwang, S.² Wang, Y.³

26
- 70450161678
- Rich context modeling for high quality HMM-based TTS
- Zhi-Jie Yan, Yao Qian, and Frank K Soong, "Rich context modeling for high quality HMM-based TTS, " in Proc. Interspeech, 2009, pp. 1755-1758.
- (2009) Proc. Interspeech , pp. 1755-1758
- Yan, Z.¹ Qian, Y.² Soong, F.K.³

27
- 84973381105
- Festival 2 - build your own general purpose unit selection speech synthesiser
- Robert AJ Clark, Korin Richmond, and Simon King, "Festival 2 - build your own general purpose unit selection speech synthesiser, " in Proc. SSW5, 2004.
- (2004) Proc. SSW5
- Clark, R.A.J.¹ Richmond, K.² King, S.³

28
- 34047123652
- Multisyn: Open-domain unit selection for the festival speech synthesis system
- Robert AJ Clark, Korin Richmond, and Simon King, "Multisyn: Open-domain unit selection for the festival speech synthesis system, " Speech Communication, vol. 49, no. 4, pp. 317-330, 2007.
- (2007) Speech Communication , vol.49 , Issue.4 , pp. 317-330
- Clark, R.A.J.¹ Richmond, K.² King, S.³

29
- 34547516258
- Approximating the Kullback-Leibler divergence between Gaussian mixture models
- John R. Hershey and Peder A. Olsen, "Approximating the Kullback-Leibler divergence between Gaussian mixture models, " in Proc. ICASSP, 2007.
- (2007) Proc. ICASSP
- Hershey, J.R.¹ Olsen, P.A.²

30
- 85133720638
- The HMM-based speech synthesis system (HTS) version 2.0
- Heiga Zen, Takashi Nose, Junichi Yamagishi, Shinji Sako, Takashi Masuko, Alan W. Black, and Keiichi Tokuda, "The HMM-based speech synthesis system (HTS) version 2.0, " in Proc. SSW6, 2007, pp. 294-299.
- (2007) Proc. SSW6 , pp. 294-299
- Zen, H.¹ Nose, T.² Yamagishi, J.³ Sako, S.⁴ Masuko, T.⁵ Black, A.W.⁶ Tokuda, K.⁷

31
- 84959135757
- Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features
- Zhizheng Wu and Simon King, "Minimum trajectory error training for deep neural networks, combined with stacked bottleneck features, " in Proc. Interspeech, 2015.
- (2015) Proc. Interspeech
- Wu, Z.¹ King, S.²

32
- 84994377328
- Hurricane natural speech corpus, [sound]
- Martin Cooke, Catherine Mayo, and Cassia Valentini-Botinhao, "Hurricane natural speech corpus, [sound], " LISTA Consortium, doi: 10.7488/ds/140, 2013.
- (2013) LISTA Consortium
- Cooke, M.¹ Mayo, C.² Valentini-Botinhao, C.³

33
- 84883051736
- Objective measurement of active speech level
- International Telecommunication Union, Telecommunication Standardization Sector, Geneva, Switzerland, Objective measurement of active speech level, March 2011.
- (2011) International Telecommunication Union, Telecommunication Standardization Sector

34
- 84973382729
- Method for the subjective assessment of intermediate quality level of coding systems
- International Telecommunication Union Radiocommunication Assembly, Geneva, Switzerland, Method for the subjective assessment of intermediate quality level of coding systems, March 2003.
- (2003) International Telecommunication Union Radiocommunication Assembly

35
- 84973403758
- Listening test materials for "Deep neural network-guided unit selection synthesis" [dataset]
- Thomas Merritt, Robert A. J. Clark, Zhizheng Wu, Junichi Yamagishi, and Simon King, "Listening test materials for "Deep neural network-guided unit selection synthesis", 2016 [dataset], " University of Edinburgh, The Centre for Speech Technology Research (CSTR), doi: 10.7488/ds/1313.
- (2016) University of Edinburgh, The Centre for Speech Technology Research (CSTR)
- Merritt, T.¹ Clark, R.A.J.² Wu, Z.³ Yamagishi, J.⁴ King, S.⁵

36
- 0002609530
- Optimal coupling of diphones
- Alistair D. Conkie and Stephen Isard, "Optimal coupling of diphones, " in Progress in speech synthesis, pp. 293-304. Springer, 1997.
- (1997) Progress in Speech Synthesis , pp. 293-304
- Conkie, A.D.¹ Isard, S.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.