Volume 23, Issue 11, 2015, Pages 1938-1949

Speaker Adaptive Training of Deep Neural Network Acoustic Models Using I-Vectors

Author keywords

Acoustic modeling; deep neural networks (DNNs); speaker adaptive training (SAT)

Indexed keywords

OBJECT RECOGNITION;

EID: 84938688160     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2015.2457612     Document Type: Article
Times cited: 131

References (56)
  • 1
    • 84055222005
    • Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 2
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE Workshop Autom. Speech Recogn. Understand. (ASRU), 2011, pp. 24-29.
    • (2011) Proc. IEEE Workshop Autom. Speech Recogn. Understand. (ASRU) , pp. 24-29
    • Seide, F.1    Li, G.2    Chen, X.3    Yu, D.4
  • 6
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, no. 2, pp. 171-185, 1995.
    • (1995) Comput. Speech Lang. , vol.9 , Issue.2 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 7
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, no. 2, pp. 75-98, 1998.
    • (1998) Comput. Speech Lang. , vol.12 , Issue.2 , pp. 75-98
    • Gales, M.J.1
  • 8
    • 84893703162
    • Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription
    • H. Liao, E. McDermott, and A. Senior, "Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription," in Proc. IEEE Workshop Autom. Speech Recogn. Understand. (ASRU), 2013, pp. 368-373.
    • (2013) Proc. IEEE Workshop Autom. Speech Recogn. Understand. (ASRU) , pp. 368-373
    • Liao, H.1    McDermott, E.2    Senior, A.3
  • 10
    • 79959849500
    • Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
    • B. Li and K. C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," in Proc. 11th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2010.
    • (2010) Proc. 11th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH)
    • Li, B.1    Sim, K.C.2
  • 18
    • 84921731072
    • Fast adaptation of deep neural network based on discriminant codes for speech recognition
    • S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q. Liu, "Fast adaptation of deep neural network based on discriminant codes for speech recognition," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1713-1725, Dec. 2014.
    • (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.22 , Issue.12 , pp. 1713-1725
    • Xue, S.1    Abdel-Hamid, O.2    Jiang, H.3    Dai, L.4    Liu, Q.5
  • 33
    • 84983119674
    • Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
    • P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in Proc. IEEE Spoken Lang. Technol. Workshop (SLT), 2014.
    • (2014) Proc. IEEE Spoken Lang. Technol. Workshop (SLT)
    • Swietojanski, P.1    Renals, S.2
  • 37
  • 41
    • 84890452886
    • Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
    • O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2013, pp. 7942-7946.
    • (2013) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP) , pp. 7942-7946
    • Abdel-Hamid, O.1    Jiang, H.2
  • 42
    • 84906225505
    • Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
    • O. Abdel-Hamid and H. Jiang, "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition," in Proc. 14th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH), 2013, pp. 1248-1252.
    • (2013) Proc. 14th Annu. Conf. Int. Speech Commun. Assoc. (INTERSPEECH) , pp. 1248-1252
    • Abdel-Hamid, O.1    Jiang, H.2
  • 45
    • 84881054791
    • Hermitian polynomial for speaker adaptation of connectionist speech recognition systems
    • S. M. Siniscalchi, J. Li, and C.-H. Lee, "Hermitian polynomial for speaker adaptation of connectionist speech recognition systems," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2152-2161, Oct. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.10 , pp. 2152-2161
    • Siniscalchi, S.M.1    Li, J.2    Lee, C.-H.3
  • 48
    • 84946076428
    • TED-LIUM: An automatic speech recognition dedicated corpus
    • A. Rousseau, P. Deléglise, and Y. Estève, "TED-LIUM: An automatic speech recognition dedicated corpus," in Proc. LREC, 2012, pp. 125-129.
    • (2012) Proc. LREC , pp. 125-129
    • Rousseau, A.1    Deléglise, P.2    Estève, Y.3
  • 51
    • 84938725977
    • Kaldi+PDNN: Building DNN-based ASR systems with Kaldi and PDNN
    • Y. Miao, "Kaldi+PDNN: Building DNN-based ASR systems with Kaldi and PDNN," arXiv preprint arXiv:1401.6984, 2014.
    • (2014) ArXiv Preprint arXiv:1401.6984
    • Miao, Y.1
  • 52
    • 79551480483
    • Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion
    • P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol, "Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion," J. Mach. Learn. Res., vol. 11, pp. 3371-3408, 2010.
    • (2010) J. Mach. Learn. Res. , vol.11 , pp. 3371-3408
    • Vincent, P.1    Larochelle, H.2    Lajoie, I.3    Bengio, Y.4    Manzagol, P.-A.5
  • 53
    • 84872506495
    • A practical guide to training restricted Boltzmann machines
    • G. E. Hinton, "A practical guide to training restricted Boltzmann machines," in Neural Networks: Tricks of the Trade. New York, NY, USA: Springer, 2012, pp. 599-619.
    • (2012) Neural Networks: Tricks of the Trade , pp. 599-619
    • Hinton, G.E.1


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.