SCOPUS Information Retrieval Platform

Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

Volume 2015-January, 2015, Pages 879-883

A study of speaker adaptation for DNN-based speech synthesis

Authors (5): Wu, Zhizheng (a); Swietojanski, Pawel (a); Veaux, Christophe (a); Renals, Steve (a); King, Simon (a)

(a) University of Edinburgh (United Kingdom)

Author keywords

Acoustic model; Deep neural network; Speaker adaptation; Speech synthesis

Indexed keywords

HIDDEN MARKOV MODELS; LINGUISTICS; MARKOV PROCESSES; SCALES (WEIGHING INSTRUMENTS); SPEECH; SPEECH SYNTHESIS; VECTOR SPACES;

ACOUSTIC MODEL; ADAPTATION TECHNIQUES; DEEP NEURAL NETWORKS; EXPERIMENTAL ANALYSIS; SPEAKER ADAPTATION; SPEAKER CHARACTERISTICS; STATISTICAL PARAMETRIC SPEECH SYNTHESIS; UNIT-SELECTION SPEECH SYNTHESIS;

SPEECH COMMUNICATION;

EID: 84959112868
PISSN: 2308457X
EISSN: 19909772
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 123

References (30)

1
- 67651002140
- Statistical parametric speech synthesis
- H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009.
- (2009) Speech Communication , vol.51 , Issue.11 , pp. 1039-1064
- Zen, H.¹ Tokuda, K.² Black, A.W.³

2
- 84904680338
- The Blizzard Challenge 2013
- S. King and K. Karaiskos, "The Blizzard Challenge 2013," in Blizzard Challenge Workshop, 2013.
- (2013) Blizzard Challenge Workshop
- King, S.¹ Karaiskos, K.²

3
- 0029288633
- Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
- C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech & Language, vol. 9, no. 2, pp. 171-185, 1995.
- (1995) Computer Speech & Language , vol.9 , Issue.2 , pp. 171-185
- Leggetter, C.J.¹ Woodland, P.C.²

4
- 0028419019
- Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
- J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994.
- (1994) IEEE Trans. on Speech and Audio Processing , vol.2 , Issue.2 , pp. 291-298
- Gauvain, J.-L.¹ Lee, C.-H.²

5
- 67650854725
- Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm
- J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech and Language Processing, vol. 17, no. 1, pp. 66-83, 2009.
- (2009) IEEE Trans. Audio, Speech and Language Processing , vol.17 , Issue.1 , pp. 66-83
- Yamagishi, J.¹ Kobayashi, T.² Nakano, Y.³ Ogata, K.⁴ Isogai, J.⁵

6
- 85032751458
- Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
- G. Hinton, L. Deng, D. Yu, G. Dahl, A. Mohamed, N. Jaitly, A. Senior, V. Vanhoucke, P. Nguyen, T. Sainath, and B. Kingsbury, "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Processing Magazine, vol. 29, no. 6, pp. 82-97, 2012.
- (2012) IEEE Signal Processing Magazine , vol.29 , Issue.6 , pp. 82-97
- Hinton, G.¹ Deng, L.² Yu, D.³ Dahl, G.⁴ Mohamed, A.⁵ Jaitly, N.⁶ Senior, A.⁷ Vanhoucke, V.⁸ Nguyen, P.⁹ Sainath, T.¹⁰ Kingsbury, B.¹¹

7
- 84890490547
- Statistical parametric speech synthesis using deep neural networks
- H. Zen, A. Senior, and M. Schuster, "Statistical parametric speech synthesis using deep neural networks," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013.
- (2013) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Zen, H.¹ Senior, A.² Schuster, M.³

8
- 84901237776
- Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis
- Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using restricted Boltzmann machines and deep belief networks for statistical parametric speech synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2129-2139, 2013.
- (2013) IEEE Transactions on Audio, Speech, and Language Processing , vol.21 , Issue.10 , pp. 2129-2139
- Ling, Z.-H.¹ Deng, L.² Yu, D.³

9
- 84890527090
- Multi-distribution deep belief network for speech synthesis
- S. Kang, X. Qian, and H. Meng, "Multi-distribution deep belief network for speech synthesis," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013.
- (2013) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Kang, S.¹ Qian, X.² Meng, H.³

10
- 84905251808
- On the training aspects of deep neural network (DNN) for parametric TTS synthesis
- Y. Qian, Y. Fan, W. Hu, and F. K. Soong, "On the training aspects of deep neural network (DNN) for parametric TTS synthesis," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2014.
- (2014) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Qian, Y.¹ Fan, Y.² Hu, W.³ Soong, F.K.⁴

11
- 84905262874
- Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis
- H. Zen and A. Senior, "Deep mixture density networks for acoustic modeling in statistical parametric speech synthesis," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2014.
- (2014) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Zen, H.¹ Senior, A.²

12
- 84910047819
- TTS synthesis with bidirectional LSTM based recurrent neural networks
- Y. Fan, Y. Qian, F. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Fan, Y.¹ Qian, Y.² Xie, F.³ Soong, F.K.⁴

13
- 84946033275
- Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis
- Z. Wu, C. Valentini-Botinhao, O. Watts, and S. King, "Deep neural networks employing multi-task learning and stacked bottleneck features for speech synthesis," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2015.
- (2015) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Wu, Z.¹ Valentini-Botinhao, C.² Watts, O.³ King, S.⁴

14
- 84946036894
- Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE
- B. Uriá, I. Murray, S. Renals, and C. Valentini-Botinhao, "Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE," in Proc IEEE ICASSP, 2015.
- (2015) Proc IEEE ICASSP
- Uriá, B.¹ Murray, I.² Renals, S.³ Valentini-Botinhao, C.⁴

15
- 84890542079
- KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
- D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2013.
- (2013) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Yu, D.¹ Yao, K.² Su, H.³ Li, G.⁴ Seide, F.⁵

16
- 84893691530
- Speaker adaptation of neural network acoustic models using I-vectors
- G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using I-vectors," in Proc IEEE ASRU, 2013, pp. 55-59.
- (2013) Proc IEEE ASRU , pp. 55-59
- Saon, G.¹ Soltau, H.² Nahamoo, D.³ Picheny, M.⁴

17
- 84910068089
- Adaptation of deep neural network acoustic models using factorised i-vectors
- P. Karanasou, Y. Wang, M. J. Gales, and P. C. Woodland, "Adaptation of deep neural network acoustic models using factorised i-vectors," in Proc. Interspeech, 2014.
- (2014) Proc. Interspeech
- Karanasou, P.¹ Wang, Y.² Gales, M.J.³ Woodland, P.C.⁴

18
- 84921731072
- Speaker adaptation of deep neural network based on discriminant codes
- S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q. Liu, "Speaker adaptation of deep neural network based on discriminant codes," IEEE Trans. Audio, Speech and Language Processing, vol. 22, no. 12, pp. 1713-1725, 2014.
- (2014) IEEE Trans. Audio, Speech and Language Processing , vol.22 , Issue.12 , pp. 1713-1725
- Xue, S.¹ Abdel-Hamid, O.² Jiang, H.³ Dai, L.⁴ Liu, Q.⁵

19
- 84905259145
- I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription
- V. Gupta, P. Kenny, P. Ouellet, and T. Stafylakis, "I-vector-based speaker adaptation of deep neural networks for French broadcast audio transcription," in Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2014.
- (2014) Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)
- Gupta, V.¹ Kenny, P.² Ouellet, P.³ Stafylakis, T.⁴

20
- 84983119674
- Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
- P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in Proc. IEEE Spoken Language Technology Workshop, 2014.
- (2014) Proc. IEEE Spoken Language Technology Workshop
- Swietojanski, P.¹ Renals, S.²

21
- 84959166808
- Preliminary work on speaker adaptation for DNN-based speech synthesis
- Tech. Rep.
- B. Potard, P. Motlicek, and D. Imseng, "Preliminary work on speaker adaptation for DNN-based speech synthesis," Idiap, Tech. Rep., 2015.
- (2015) Idiap
- Potard, B.¹ Motlicek, P.² Imseng, D.³

22
- 79951609039
- Front-end factor analysis for speaker verification
- N. Dehak, P. Kenny, R. Dehak, P. Dumouchel, and P. Ouellet, "Front-end factor analysis for speaker verification," IEEE Trans. Audio, Speech and Language Processing, vol. 19, no. 4, pp. 788-798, 2011.
- (2011) IEEE Trans. Audio, Speech and Language Processing , vol.19 , Issue.4 , pp. 788-798
- Dehak, N.¹ Kenny, P.² Dehak, R.³ Dumouchel, P.⁴ Ouellet, P.⁵

23
- 84865733857
- Analysis of i-vector length normalization in speaker recognition systems
- D. Garcia-Romero and C. Y. Espy-Wilson, "Analysis of i-vector length normalization in speaker recognition systems," in Proc. Interspeech, 2011.
- (2011) Proc. Interspeech
- Garcia-Romero, D.¹ Espy-Wilson, C.Y.²

24
- 84896111913
- ALIZE 3.0 - open source toolkit for state-of-the-art speaker recognition
- A. Larcher, J.-F. Bonastre, B. G. Fauve, K.-A. Lee, C. Lévy, H. Li, J. S. Mason, and J.-Y. Parfait, "ALIZE 3.0 - open source toolkit for state-of-the-art speaker recognition," in Proc. Interspeech, 2013.
- (2013) Proc. Interspeech
- Larcher, A.¹ Bonastre, J.-F.² Fauve, B.G.³ Lee, K.-A.⁴ Lévy, C.⁵ Li, H.⁶ Mason, J.S.⁷ Parfait, J.-Y.⁸

25
- 84946032695
- Differentiable pooling for unsupervised speaker adaptation
- P. Swietojanski and S. Renals, "Differentiable pooling for unsupervised speaker adaptation," in Proc. ICASSP, 2015.
- (2015) Proc. ICASSP
- Swietojanski, P.¹ Renals, S.²

26
- 84906225505
- Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
- O. Abdel-Hamid and H. Jiang, "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition," in Proc. Interspeech. ISCA, pp. 1248-1252.
- Proc. Interspeech. ISCA , pp. 1248-1252
- Abdel-Hamid, O.¹ Jiang, H.²

27
- 57749193836
- Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory
- T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007.
- (2007) IEEE Trans. Audio, Speech and Language Processing , vol.15 , Issue.8 , pp. 2222-2235
- Toda, T.¹ Black, A.W.² Tokuda, K.³

28
- 84894152556
- The voice bank corpus: Design, collection and data analysis of a large regional accent speech database
- C. Veaux, J. Yamagishi, and S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database," in Proc. Int. Conf. Oriental COCOSDA, 2013.
- (2013) Proc. Int. Conf. Oriental COCOSDA
- Veaux, C.¹ Yamagishi, J.² King, S.³

29
- 0032673049
- Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds
- H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999.
- (1999) Speech Communication , vol.27 , Issue.3 , pp. 187-207
- Kawahara, H.¹ Masuda-Katsuse, I.² De Cheveigné, A.³

30
- 78149337911
- CUDAmat: A CUDA-based matrix class for Python
- V. Mnih, "CUDAmat: A CUDA-based matrix class for Python," University of Toronto, Tech. Rep., 2009.
- (2009) University of Toronto, Tech. Rep.
- Mnih, V.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.