SCOPUS Information Search Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 2207-2211

Deep neural network context embeddings for model selection in rich-context HMM synthesis

(5) Merritt, Thomas (a); Yamagishi, Junichi (a, b); Wu, Zhizheng (a); Watts, Oliver (a); King, Simon (a)

a UNIVERSITY OF EDINBURGH (United Kingdom)

b NATIONAL INSTITUTE OF INFORMATICS (Japan)

Author keywords

Deep neural networks; Embedding; Hidden Markov model; Rich context; Speech synthesis

Indexed keywords

DECISION TREES; LINGUISTICS; MARKOV PROCESSES; SPEECH COMMUNICATION; SPEECH SYNTHESIS; TREES (MATHEMATICS); TRELLIS CODES;

CONTEXT SYNTHESIS; DEEP NEURAL NETWORKS; EMBEDDING; GUIDE SELECTION; MODEL SELECTION; PARAMETRIC SYNTHESIS; RICH CONTEXT; STATISTICAL PARAMETRIC SPEECH SYNTHESIS;

HIDDEN MARKOV MODELS;

EID: 84959122693 PISSN: 2308457X EISSN: 19909772 Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (6)

References (27)

1
- 84878419996
- The Blizzard Challenge 2010
- S. King and V. Karaiskos, "The Blizzard Challenge 2010," in Proc. Blizzard Challenge, 2010.
- (2010) Proc. Blizzard Challenge
- King, S.¹ Karaiskos, V.²

2
- 84878419996
- The Blizzard Challenge 2011
- S. King and V. Karaiskos, "The Blizzard Challenge 2011," in Proc. Blizzard Challenge, 2011.
- (2011) Proc. Blizzard Challenge
- King, S.¹ Karaiskos, V.²

3
- 84890516589
- The Blizzard Challenge 2012
- S. King and V. Karaiskos, "The Blizzard Challenge 2012," in Proc. Blizzard Challenge, 2012.
- (2012) Proc. Blizzard Challenge
- King, S.¹ Karaiskos, V.²

4
- 84910105608
- Measuring a decade of progress in text-to-speech
- S. King, "Measuring a decade of progress in text-to-speech," Loquens, vol. 1, no. 1, 2014.
- (2014) Loquens , vol.1 , Issue.1
- King, S.¹

5
- 38549096029
- A speech parameter generation algorithm considering global variance for HMM-based speech synthesis
- May
- T. Toda and K. Tokuda, "A Speech Parameter Generation Algorithm Considering Global Variance for HMM-Based Speech Synthesis," IEICE Transactions on Information and Systems, vol. E90-D, no. 5, pp. 816-824, May 2007.
- (2007) IEICE Transactions on Information and Systems , vol.E90-D , Issue.5 , pp. 816-824
- Toda, T.¹ Tokuda, K.²

6
- 84856237844
- An introduction to statistical parametric speech synthesis
- S. King, "An introduction to statistical parametric speech synthesis," Sadhana, vol. 36, pp. 837-852, 2011.
- (2011) Sadhana , vol.36 , pp. 837-852
- King, S.¹

7
- 0033708106
- Speech parameter generation algorithms for HMM-based speech synthesis
- K. Tokuda, T. Yoshimura, T. Masuko, T. Kobayashi, and T. Kitamura, "Speech parameter generation algorithms for HMM-based speech synthesis," in Proc. ICASSP, 2000.
- (2000) Proc. ICASSP
- Tokuda, K.¹ Yoshimura, T.² Masuko, T.³ Kobayashi, T.⁴ Kitamura, T.⁵

8
- 67651002140
- Statistical parametric speech synthesis
- Nov.
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, Nov. 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

9
- 84946042252
- Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech
- T. Merritt, J. Latorre, and S. King, "Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech," in Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Merritt, T.¹ Latorre, J.² King, S.³

10
- 85133164767
- Investigating the shortcomings of HMM synthesis
- T. Merritt and S. King, "Investigating the shortcomings of HMM synthesis," in Proc. 8th ISCA Speech Synthesis Workshop, 2013, pp. 165-170.
- (2013) Proc. 8th ISCA Speech Synthesis Workshop , pp. 165-170
- Merritt, T.¹ King, S.²

11
- 84910070288
- Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis
- T. Merritt, T. Raitio, and S. King, "Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis," in Proc. Interspeech, 2014, pp. 1509-1513.
- (2014) Proc. Interspeech , pp. 1509-1513
- Merritt, T.¹ Raitio, T.² King, S.³

12
- 84910028520
- Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech
- G. E. Henter, T. Merritt, M. Shannon, C. Mayo, and S. King, "Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech," in Proc. Interspeech, 2014, pp. 1504-1508.
- (2014) Proc. Interspeech , pp. 1504-1508
- Henter, G.E.¹ Merritt, T.² Shannon, M.³ Mayo, C.⁴ King, S.⁵

13
- 70450161678
- Rich context modeling for high quality HMM-based TTS
- Z.-J. Yan, Y. Qian, and F. K. Soong, "Rich context modeling for high quality HMM-based TTS," in Proc. Interspeech, 2009, pp. 1755-1758.
- (2009) Proc. Interspeech , pp. 1755-1758
- Yan, Z.-J.¹ Qian, Y.² Soong, F.K.³

14
- 78049399368
- Rich-context unit selection (RUS) approach to high quality TTS
- Z.-J. Yan, Y. Qian, and F. K. Soong, "Rich-context unit selection (RUS) approach to high quality TTS," in Proc. ICASSP, 2010, pp. 4798-4801.
- (2010) Proc. ICASSP , pp. 4798-4801
- Yan, Z.-J.¹ Qian, Y.² Soong, F.K.³

15
- 84878421733
- An evaluation of parameter generation methods with rich context models in HMM-based speech synthesis
- S. Takamichi, T. Toda, Y. Shiga, H. Kawai, S. Sakti, and S. Nakamura, "An Evaluation of Parameter Generation Methods with Rich Context Models in HMM-Based Speech Synthesis," in Proc. Interspeech, 2012, pp. 1139-1142.
- (2012) Proc. Interspeech , pp. 1139-1142
- Takamichi, S.¹ Toda, T.² Shiga, Y.³ Kawai, H.⁴ Sakti, S.⁵ Nakamura, S.⁶

16
- 84897862522
- Parameter generation methods with rich context models for high-quality and flexible text-to-speech synthesis
- S. Takamichi, T. Toda, Y. Shiga, S. Sakti, G. Neubig, and S. Nakamura, "Parameter Generation Methods With Rich Context Models for High-Quality and Flexible Text-To-Speech Synthesis," IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 2, pp. 239-250, 2014.
- (2014) IEEE Journal of Selected Topics in Signal Processing , vol.8 , Issue.2 , pp. 239-250
- Takamichi, S.¹ Toda, T.² Shiga, Y.³ Sakti, S.⁴ Neubig, G.⁵ Nakamura, S.⁶

17
- 51449111086
- A cross-language state mapping approach to bilingual (Mandarin-English) TTS
- H. Liang, Y. Qian, F. K. Soong, and G. Liu, "A cross-language state mapping approach to bilingual (Mandarin-English) TTS," in Proc. ICASSP, 2008, pp. 4641-4644.
- (2008) Proc. ICASSP , pp. 4641-4644
- Liang, H.¹ Qian, Y.² Soong, F.K.³ Liu, G.⁴

18
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in ICASSP, 2015.
- (2015) ICASSP
- Wu, Z.¹ Valentini-Botinhao, C.² Watts, O.³ King, S.⁴

19
- 84910030525
- Word embeddings for speech recognition
- S. Bengio and G. Heigold, "Word Embeddings for Speech Recognition," in Proc. Interspeech, 2014, pp. 1053-1057.
- (2014) Proc. Interspeech , pp. 1053-1057
- Bengio, S.¹ Heigold, G.²

20
- 44949153641
- The target cost formulation in unit selection speech synthesis
- P. Taylor, "The target cost formulation in unit selection speech synthesis," in Proc. Interspeech, 2006, pp. 2038-2041.
- (2006) Proc. Interspeech , pp. 2038-2041
- Taylor, P.¹

21
- 34547516258
- Approximating the Kullback-Leibler divergence between Gaussian mixture models
- J. R. Hershey and P. A. Olsen, "Approximating the Kullback-Leibler divergence between Gaussian mixture models," in Proc. ICASSP, 2007.
- (2007) Proc. ICASSP
- Hershey, J.R.¹ Olsen, P.A.²

22
- 84994377328
- Hurricane natural speech corpus, [sound]
- M. Cooke, C. Mayo, and C. Valentini-Botinhao, "Hurricane natural speech corpus, [sound]," LISTA Consortium, doi: 10.7488/ds/140, 2013.
- (2013) LISTA Consortium
- Cooke, M.¹ Mayo, C.² Valentini-Botinhao, C.³

23
- 33750915991
- STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds
- H. Kawahara, "STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds," Acoust. Sci. Technol., vol. 27, no. 6, pp. 349-353, 2006.
- (2006) Acoust. Sci. Technol , vol.27 , Issue.6 , pp. 349-353
- Kawahara, H.¹

24
- 84883051736
- Objective measurement of active speech level, ITU Recommendation ITU-T P.56, Geneva, Switzerland, March
- Objective measurement of active speech level, ITU Recommendation ITU-T P.56, International Telecommunication Union, Telecommunication Standardization Sector, Geneva, Switzerland, March 2011.
- (2011) International Telecommunication Union, Telecommunication Standardization Sector

25
- 84959114033
- Method for the subjective assessment of intermediate quality level of coding systems, ITU Recommendation ITU-R BS.1534-1, Geneva, Switzerland, March
- Method for the subjective assessment of intermediate quality level of coding systems, ITU Recommendation ITU-R BS.1534-1, International Telecommunication Union Radiocommunication Assembly, Geneva, Switzerland, March 2003.
- (2003) International Telecommunication Union Radiocommunication Assembly

26
- 84959110971
- [dataset] University of Edinburgh, The Centre for Speech Technology Research (CSTR)
- T. Merritt, J. Yamagishi, Z. Wu, O. Watts, and S. King, "Listening test materials for 'Deep neural network context embeddings for model selection in rich-context HMM synthesis'," 2015 [dataset], University of Edinburgh, The Centre for Speech Technology Research (CSTR), doi: 10.7488/ds/256.
- (2015) Listening Test Materials for "Deep Neural Network Context Embeddings for Model Selection in Rich-Context HMM Synthesis"
- Merritt, T.¹ Yamagishi, J.² Wu, Z.³ Watts, O.⁴ King, S.⁵

27
- 84959127221
- Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations
- M. Wester, C. Valentini-Botinhao, and G. E. Henter, "Are we using enough listeners? No! An empirically-supported critique of Interspeech 2014 TTS evaluations," in Proc. Interspeech, 2015.
- (2015) Proc. Interspeech
- Wester, M.¹ Valentini-Botinhao, C.² Henter, G.E.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.