Volume 24, Issue 8, 2016, Pages 1450-1463

Learning hidden unit contributions for unsupervised acoustic model adaptation

Author keywords

[No Author keywords available]

Indexed keywords

ACOUSTIC MODEL ADAPTATION; ADAPTATION TECHNIQUES; FEATURE EXTRACTOR; SPEAKER ADAPTIVE TRAININGS; SPEAKER DEPENDENTS; SPEAKER INDEPENDENTS; UNSUPERVISED ADAPTATION; WORD ERROR RATE REDUCTIONS;

EID: 84976435936     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2016.2560534     Document Type: Article
Times cited : (159)

References (84)
  • 1
    • 85032751458
    • Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
    • Nov
    • G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012
    • (2012) IEEE Signal Process. Mag , vol.29 , Issue.6 , pp. 82-97
    • Hinton, G.1
  • 4
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • F. Seide, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. 12th IEEE Workshop Automatic Speech Recog. Understanding, 2011, pp. 24-29
    • (2011) Proc. 12th IEEE Workshop Automatic Speech Recog Understanding , pp. 24-29
    • Seide, F.1    Chen, X.2    Yu, D.3
  • 5
    • 84055222005
    • Context-dependent pretrained deep neural networks for large-vocabulary speech recognition
    • Jan
    • G. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pretrained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 1, pp. 30-42, Jan. 2012
    • (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 10
    • 84964475976
    • Cambridge University transcription systems for the multi-genre broadcast challenge
    • P. Woodland et al., "Cambridge University transcription systems for the multi-genre broadcast challenge," in Proc. IEEE Workshop Automatic Speech Recog. Understanding, 2015, pp. 639-646
    • (2015) Proc IEEE Workshop Automatic Speech Recog. Understanding , pp. 639-646
    • Woodland, P.1
  • 12
    • 84906225505
    • Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition
    • O. Abdel-Hamid and H. Jiang, "Rapid and effective speaker adaptation of convolutional neural network based models for speech recognition," in Proc. 14th Annu. Conf. Int. Speech Commun. Assoc., 2013, pp. 1248-1252
    • (2013) Proc. 14th Annu. Conf. Int. Speech Commun. Assoc , pp. 1248-1252
    • Abdel-Hamid, O.1    Jiang, H.2
  • 13
    • 84983119674
    • Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models
    • P. Swietojanski and S. Renals, "Learning hidden unit contributions for unsupervised speaker adaptation of neural network acoustic models," in Proc. IEEE Spoken Lang. Technol. Workshop, 2014, pp. 171-176
    • (2014) Proc IEEE Spoken Lang. Technol. Workshop , pp. 171-176
    • Swietojanski, P.1    Renals, S.2
  • 16
    • 35948981862
    • Unleashing the killer corpus: Experiences in creating the multi-everything AMI meeting corpus
    • J. Carletta, "Unleashing the killer corpus: Experiences in creating the multi-everything AMI meeting corpus," Language Resources Eval., vol. 41, no. 2, pp. 181-190, 2007
    • (2007) Language Resources Eval , vol.41 , Issue.2 , pp. 181-190
    • Carletta, J.1
  • 19
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • Apr
    • M. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, pp. 75-98, Apr. 1998
    • (1998) Comput. Speech Lang , vol.12 , pp. 75-98
    • Gales, M.1
  • 21
    • 85008520364
    • Transcribing meetings with the AMIDA systems
    • Feb
    • T. Hain et al., "Transcribing meetings with the AMIDA systems," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 2, pp. 486-498, Feb. 2012
    • (2012) IEEE Trans. Audio, Speech, Lang. Process , vol.20 , Issue.2 , pp. 486-498
    • Hain, T.1
  • 26
    • 79959849500
    • Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
    • B. Li and K. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," in Proc. 11th Annu. Conf. Int. Speech Commun. Assoc., 2010
    • (2010) Proc. 11th Annu. Conf. Int. Speech Commun. Assoc
    • Li, B.1    Sim, K.2
  • 29
    • 84893691530
    • Speaker adaptation of neural network acoustic models using i-vectors
    • G. Saon, H. Soltau, D. Nahamoo, and M. Picheny, "Speaker adaptation of neural network acoustic models using i-vectors," in Proc. IEEE Automatic Speech Recog. Understanding, 2013, pp. 55-59. [Online]. Available: http://dblp.uni-Trier.de/db/conf/asru/asru2013.html#SaonSNP13
    • (2013) Proc IEEE Automatic Speech Recog. Understanding , pp. 55-59
    • Saon, G.1    Soltau, H.2    Nahamoo, D.3    Picheny, M.4
  • 32
    • 84938688160
    • Speaker adaptive training of deep neural network acoustic models using i-vectors
    • Nov
    • Y. Miao, H. Zhang, and F. Metze, "Speaker adaptive training of deep neural network acoustic models using i-vectors," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 11, pp. 1938-1949, Nov. 2015
    • (2015) IEEE/ACM Trans. Audio, Speech, Lang. Process , vol.23 , Issue.11 , pp. 1938-1949
    • Miao, Y.1    Zhang, H.2    Metze, F.3
  • 33
    • 84905269643
    • Using neural network front-ends on far field multiple microphones based speech recognition
    • Y. Liu, P. Zhang, and T. Hain, "Using neural network front-ends on far field multiple microphones based speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2014, pp. 5542-5546
    • (2014) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 5542-5546
    • Liu, Y.1    Zhang, P.2    Hain, T.3
  • 34
    • 84910030053
    • Recnorm: Simultaneous normalisation and classification applied to speech recognition
    • J. Bridle and S. Cox, "Recnorm: Simultaneous normalisation and classification applied to speech recognition," in Proc. Adv. Neural Inf. Process Sys 3, 1990, pp. 234-240. [Online]. Available: http://papers.nips.cc/paper/328-recnorm-simultaneous-normalisationand-classification-applied-To-speech-recognition.pdf
    • (1990) Proc. Adv. Neural Inf. Process Sys , vol.3 , pp. 234-240
    • Bridle, J.1    Cox, S.2
  • 35
    • 84890452886
    • Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
    • O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2013, pp. 4277-4280
    • (2013) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 4277-4280
    • Abdel-Hamid, O.1    Jiang, H.2
  • 36
    • 84921731072
    • Fast adaptation of deep neural network based on discriminant codes for speech recognition
    • Dec
    • S. Xue, O. Abdel-Hamid, J. Hui, L. Dai, and Q. Liu, "Fast adaptation of deep neural network based on discriminant codes for speech recognition," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1713-1725, Dec. 2014
    • (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process , vol.22 , Issue.12 , pp. 1713-1725
    • Xue, S.1    Abdel-Hamid, O.2    Hui, J.3    Dai, L.4    Liu, Q.5
  • 37
    • 84890521103
    • Speaker adaptation of context dependent deep neural networks
    • H. Liao, "Speaker adaptation of context dependent deep neural networks," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2013, pp. 7947-7951
    • (2013) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 7947-7951
    • Liao, H.1
  • 38
    • 84890542079
    • KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
    • D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2013, pp. 7893-7897
    • (2013) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 7893-7897
    • Yu, D.1    Yao, K.2    Su, H.3    Li, G.4    Seide, F.5
  • 39
    • 84959166782
    • Regularized sequence-level deep neural network model adaptation
    • Y. Huang and Y. Gong, "Regularized sequence-level deep neural network model adaptation," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 1081-1085
    • (2015) Proc. Annu. Conf. Int. Speech Commun. Assoc , pp. 1081-1085
    • Huang, Y.1    Gong, Y.2
  • 40
    • 84905229915
    • Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network
    • J. Xue, J. Li, D. Yu, M. Seltzer, and Y. Gong, "Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2014, pp. 6359-6363
    • (2014) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 6359-6363
    • Xue, J.1    Li, J.2    Yu, D.3    Seltzer, M.4    Gong, Y.5
  • 44
    • 84946061232
    • Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data
    • Y. Zhao, J. Li, J. Xue, and Y. Gong, "Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2015, pp. 4310-4314
    • (2015) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 4310-4314
    • Zhao, Y.1    Li, J.2    Xue, J.3    Gong, Y.4
  • 46
    • 84881054791
    • Hermitian polynomial for speaker adaptation of connectionist speech recognition systems
    • Oct
    • S. Siniscalchi, J. Li, and C. Lee, "Hermitian polynomial for speaker adaptation of connectionist speech recognition systems," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2152-2161, Oct. 2013
    • (2013) IEEE Trans. Audio, Speech, Lang. Process , vol.21 , Issue.10 , pp. 2152-2161
    • Siniscalchi, S.1    Li, J.2    Lee, C.3
  • 49
    • 84959095902
    • Structured output layer with auxiliary targets for context-dependent acoustic modelling
    • P. Swietojanski, P. Bell, and S. Renals, "Structured output layer with auxiliary targets for context-dependent acoustic modelling," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 3605-3609
    • (2015) Proc. Annu. Conf. Int. Speech Commun. Assoc , pp. 3605-3609
    • Swietojanski, P.1    Bell, P.2    Renals, S.3
  • 50
    • 84938690750
    • Speaker adaptation of deep neural networks using a hierarchy of output layers
    • R. Price, K. Iso, and K. Shinoda, "Speaker adaptation of deep neural networks using a hierarchy of output layers," in Proc. IEEE Spoken Language Technol. Workshop, 2014, pp. 153-158
    • (2014) Proc IEEE Spoken Language Technol. Workshop , pp. 153-158
    • Price, R.1    Iso, K.2    Shinoda, K.3
  • 51
    • 0024880831
    • Multilayer feedforward networks are universal approximators
    • K. Hornik, M. Stinchcombe, and H. White, "Multilayer feedforward networks are universal approximators," Neural Netw., vol. 2, no. 5, pp. 359-366, 1989
    • (1989) Neural Netw , vol.2 , Issue.5 , pp. 359-366
    • Hornik, K.1    Stinchcombe, M.2    White, H.3
  • 52
    • 0025751820
    • Approximation capabilities of multilayer feedforward networks
    • K. Hornik, "Approximation capabilities of multilayer feedforward networks," Neural Netw., vol. 4, no. 2, pp. 251-257, 1991
    • (1991) Neural Netw , vol.4 , Issue.2 , pp. 251-257
    • Hornik, K.1
  • 53
    • 0027599793
    • Universal approximation bounds for superpositions of a sigmoidal function
    • May
    • A. Barron, "Universal approximation bounds for superpositions of a sigmoidal function," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 930-945, May 1993
    • (1993) IEEE Trans. Inf. Theory , vol.39 , Issue.3 , pp. 930-945
    • Barron, A.1
  • 54
    • 84959174678
    • Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling
    • C. Zhang and P. Woodland, "Parameterised sigmoid and ReLU hidden activation functions for DNN acoustic modelling," in Proc. 16th Annu. Conf. Int. Speech Commun. Assoc., 2015, pp. 3224-3228
    • (2015) Proc. 16th Annu. Conf. Int. Speech Commun. Assoc , pp. 3224-3228
    • Zhang, C.1    Woodland, P.2
  • 55
    • 0035024581
    • Networks with trainable amplitude of activation functions
    • E. Trentin, "Networks with trainable amplitude of activation functions," Neural Netw., vol. 14, pp. 471-493, 2001
    • (2001) Neural Netw , vol.14 , pp. 471-493
    • Trentin, E.1
  • 56
    • 84946054484
    • Multi-basis adaptive neural network for rapid adaptation in speech recognition
    • C. Wu and M. Gales, "Multi-basis adaptive neural network for rapid adaptation in speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2015, pp. 4315-4319
    • (2015) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 4315-4319
    • Wu, C.1    Gales, M.2
  • 64
  • 67
    • 84901999583
    • Convolutional neural networks for distant speech recognition
    • Sep
    • P. Swietojanski, A. Ghoshal, and S. Renals, "Convolutional neural networks for distant speech recognition," IEEE Signal Process. Lett., vol. 21, no. 9, pp. 1120-1124, Sep. 2014
    • (2014) IEEE Signal Process. Lett , vol.21 , Issue.9 , pp. 1120-1124
    • Swietojanski, P.1    Ghoshal, A.2    Renals, S.3
  • 71
    • 0032923221
    • Catastrophic forgetting in connectionist networks: Causes, consequences and solutions
    • R. French, "Catastrophic forgetting in connectionist networks: Causes, consequences and solutions," Trends Cognitive Sci., vol. 3, pp. 128-135, 1999. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/summary?.doi=10.1.1.36.3676
    • (1999) Trends Cognitive Sci , vol.3 , pp. 128-135
    • French, R.1
  • 72
    • 84871614543
    • A novel loss function for the overall risk criterion based discriminative training of HMM models
    • J. Kaiser, B. Horvat, and Z. Kacic, "A novel loss function for the overall risk criterion based discriminative training of HMM models," in Proc. 6th Int. Conf. Spoken Language Process., 2000, pp. 887-890
    • (2000) Proc. 6th Int. Conf. Spoken Language Process , pp. 887-890
    • Kaiser, J.1    Horvat, B.2    Kacic, Z.3
  • 73
    • 70349213445
    • Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
    • B. Kingsbury, "Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2009, pp. 3761-3764
    • (2009) Proc IEEE Int. Conf. Acoust., Speech Signal Process , pp. 3761-3764
    • Kingsbury, B.1
  • 76
    • 84973352080
    • On combining i-vectors and discriminative adaptation methods for unsupervised speaker normalization in DNN acoustic models
    • L. Samarakoon and K. C. Sim, "On combining i-vectors and discriminative adaptation methods for unsupervised speaker normalization in DNN acoustic models," in Proc. IEEE Int. Conf. Acoust., Speech Signal Process., 2016, pp. 5275-5279
    • (2016) Proc IEEE Int. Conf. Acoust., Speech Signal Process. , pp. 5275-5279
    • Samarakoon, L.1    Sim, K.C.2
  • 78
    • 84910084579
    • 2000 NIST evaluation of conversational speech recognition over the telephone: English and Mandarin performance results
    • J. Fiscus, W. Fisher, A. Martin, M. Przybocki, and D. Pallett, "2000 NIST evaluation of conversational speech recognition over the telephone: English and Mandarin performance results," in Proc. Speech Transcription Workshop, 2000
    • (2000) Proc. Speech Transcription Workshop
    • Fiscus, J.1    Fisher, W.2    Martin, A.3    Przybocki, M.4    Pallett, D.5
  • 80
  • 83
    • 84904163933
    • Dropout: A simple way to prevent neural networks from overfitting
    • N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: A simple way to prevent neural networks from overfitting," J. Mach. Learn. Res., vol. 15, pp. 1929-1958, 2014. [Online]. Available: http://jmlr.org/papers/v15/srivastava14a.html
    • (2014) J. Mach. Learn. Res , vol.15 , pp. 1929-1958
    • Srivastava, N.1    Hinton, G.2    Krizhevsky, A.3    Sutskever, I.4    Salakhutdinov, R.5
  • 84
    • 57249084011
    • Visualizing high-dimensional data using t-SNE
    • Nov
    • L. Van der Maaten and G. Hinton, "Visualizing high-dimensional data using t-SNE," J. Mach. Learn. Res., vol. 9, pp. 2579-2605, Nov. 2008
    • (2008) J. Mach. Learn. Res , vol.9 , pp. 2579-2605
    • Van der Maaten, L.1    Hinton, G.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.