Volume 2015-January, 2015, Pages 879-883

A study of speaker adaptation for DNN-based speech synthesis

Author keywords

Acoustic model; Deep neural network; Speaker adaptation; Speech synthesis

Indexed keywords

HIDDEN MARKOV MODELS; LINGUISTICS; MARKOV PROCESSES; SCALES (WEIGHING INSTRUMENTS); SPEECH; SPEECH SYNTHESIS; VECTOR SPACES

EID: 84959112868     PISSN: 2308457X     EISSN: 19909772     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 123

References (30)
  • 1
    • H. Zen, K. Tokuda, and A. W. Black, "Statistical parametric speech synthesis," Speech Communication, vol. 51, no. 11, pp. 1039-1064, 2009. EID: 67651002140
  • 3
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Computer Speech & Language, vol. 9, no. 2, pp. 171-185, 1995. EID: 0029288633
  • 4
    • J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. on Speech and Audio Processing, vol. 2, no. 2, pp. 291-298, 1994. EID: 0028419019
  • 5
    • J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, "Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm," IEEE Trans. Audio, Speech, and Language Processing, vol. 17, no. 1, pp. 66-83, 2009. EID: 67650854725
  • 8
    • Z.-H. Ling, L. Deng, and D. Yu, "Modeling spectral envelopes using Restricted Boltzmann Machines and Deep Belief Networks for statistical parametric speech synthesis," IEEE Trans. Audio, Speech, and Language Processing, vol. 21, no. 10, pp. 2129-2139, 2013. EID: 84901237776
  • 12
    • Y. Fan, Y. Qian, F. Xie, and F. K. Soong, "TTS synthesis with bidirectional LSTM based recurrent neural networks," in Proc. Interspeech, 2014. EID: 84910047819
  • 14
    • B. Uriá, I. Murray, S. Renals, and C. Valentini-Botinhao, "Modelling acoustic feature dependencies with artificial neural networks: Trajectory-RNADE," in Proc. IEEE ICASSP, 2015. EID: 84946036894
  • 16
    • G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. IEEE ASRU, 2013, pp. 55-59. EID: 84893691530
  • 17
    • P. Karanasou, Y. Wang, M. J. Gales, and P. C. Woodland, "Adaptation of deep neural network acoustic models using factorised i-vectors," in Proc. Interspeech, 2014. EID: 84910068089
  • 20
    • P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in Proc. IEEE Spoken Language Technology Workshop, 2014. EID: 84983119674
  • 21
    • B. Potard, P. Motlicek, and D. Imseng, "Preliminary work on speaker adaptation for DNN-based speech synthesis," Idiap, Tech. Rep., 2015. EID: 84959166808
  • 23
    • D. Garcia-Romero and C. Y. Espy-Wilson, "Analysis of i-vector length normalization in speaker recognition systems," in Proc. Interspeech, 2011. EID: 84865733857
  • 25
    • P. Swietojanski and S. Renals, "Differentiable pooling for unsupervised speaker adaptation," in Proc. ICASSP, 2015. EID: 84946032695
  • 26
    • O. Abdel-Hamid and H. Jiang, "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition," in Proc. Interspeech, ISCA, pp. 1248-1252. EID: 84906225505
  • 27
    • T. Toda, A. W. Black, and K. Tokuda, "Voice conversion based on maximum-likelihood estimation of spectral parameter trajectory," IEEE Trans. Audio, Speech, and Language Processing, vol. 15, no. 8, pp. 2222-2235, 2007. EID: 57749193836
  • 28
    • C. Veaux, J. Yamagishi, and S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database," in Proc. Int. Conf. Oriental COCOSDA, 2013. EID: 84894152556
  • 29
    • H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, "Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: Possible role of a repetitive structure in sounds," Speech Communication, vol. 27, no. 3, pp. 187-207, 1999. EID: 0032673049


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.