Volume 218, 2016, Pages 448-459

A unified approach to transfer learning of deep neural networks with applications to speaker adaptation in automatic speech recognition

Author keywords

Deep neural network; Multi task learning; Speaker adaptation; Transfer learning

Indexed keywords

DEEP LEARNING; DEEP NEURAL NETWORKS; LEARNING SYSTEMS; METADATA; MULTI-TASK LEARNING; NEURAL NETWORKS; SPEECH; TRANSFER LEARNING

EID: 84994096935     PISSN: 09252312     EISSN: 18728286     Source Type: Journal    
DOI: 10.1016/j.neucom.2016.09.018     Document Type: Article
Times cited: 63

References (59)
  • 2
    • 0032140546
    • On stochastic feature and model compensation approaches to robust speech recognition
    • [2] Lee, C.-H., On stochastic feature and model compensation approaches to robust speech recognition. Speech Commun. 25:1–3 (1998), 29–47.
    • (1998) Speech Commun. , vol.25 , Issue.1-3 , pp. 29-47
    • Lee, C.-H.1
  • 3
    • 0035426931
    • Language independent and language adaptive acoustic modeling for speech recognition
    • [3] Schultz, T., Waibel, A., Language independent and language adaptive acoustic modeling for speech recognition. Speech Commun. 35 (2001), 31–51.
    • (2001) Speech Commun. , vol.35 , pp. 31-51
    • Schultz, T.1    Waibel, A.2
  • 4
    • 84862931515
    • Experiments on cross-language attribute detection and phone recognition with minimal target specific training data
    • [4] Siniscalchi, S.M., Lyu, D.-C., Svendsen, T., Lee, C.-H., Experiments on cross-language attribute detection and phone recognition with minimal target specific training data. IEEE Trans. Audio Speech Lang. Process. 20:3 (2012), 875–887.
    • (2012) IEEE Trans. Audio Speech Lang. Process. , vol.20 , Issue.3 , pp. 875-887
    • Siniscalchi, S.M.1    Lyu, D.-C.2    Svendsen, T.3    Lee, C.-H.4
  • 6
    • 0027683813
    • Shared-distribution hidden Markov models for speech recognition
    • [6] Hwang, M.-Y., Huang, X., Shared-distribution hidden Markov models for speech recognition. IEEE Trans. Speech Audio Process. 1:4 (1993), 414–420.
    • (1993) IEEE Trans. Speech Audio Process. , vol.1 , Issue.4 , pp. 414-420
    • Hwang, M.-Y.1    Huang, X.2
  • 7
    • 84858972572
    • Making deep belief networks effective for large vocabulary continuous speech recognition
    • [7] T. N. Sainath, B. Kingsbury, B. Ramabhadran, P. Fousek, P. Novak, A. Mohamed, Making deep belief networks effective for large vocabulary continuous speech recognition, in: Proc. ASRU, 2011, pp. 30–35.
    • (2011) Proc. ASRU , pp. 30-35
    • Sainath, T.N.1    Kingsbury, B.2    Ramabhadran, B.3    Fousek, P.4    Novak, P.5    Mohamed, A.6
  • 8
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • [8] Dahl, G.E., Yu, D., Deng, L., Acero, A., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio, Speech Lang. Proc. 20:1 (2012), 30–42.
    • (2012) IEEE Trans. Audio, Speech Lang. Proc. , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 9
    • 84906274730
    • Sequence-discriminative training of deep neural networks
    • [9] K. Veselý, A. Ghoshal, L. Burget, D. Povey, Sequence-discriminative training of deep neural networks, in: Proc. Interspeech, 2013, pp. 2345–2349.
    • (2013) Proc. Interspeech , pp. 2345-2349
    • Veselý, K.1    Ghoshal, A.2    Burget, L.3    Povey, D.4
  • 10
    • 0032923221
    • Catastrophic forgetting in connectionist networks: causes, consequences and solutions
    • [10] R.M. French, Catastrophic forgetting in connectionist networks: causes, consequences and solutions, Trends in Cognitive Sciences, vol. 3 (4).
    • French, R.M.1
  • 11
    • 84890542079
    • KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
    • [11] D. Yu, K. Yao, H. Su, G. Li, F. Seide, KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition, in: Proc. ICASSP, 2013, pp. 7893–7897.
    • (2013) Proc. ICASSP , pp. 7893-7897
    • Yu, D.1    Yao, K.2    Su, H.3    Li, G.4    Seide, F.5
  • 12
    • 34548012893
    • Linear hidden transformations for adaptation of hybrid ANN/HMM models
    • [12] Gemello, R., Mana, F., Scanzio, S., Laface, P., De Mori, R., Linear hidden transformations for adaptation of hybrid ANN/HMM models. Speech Commun. 49:10–11 (2007), 827–835.
    • (2007) Speech Commun. , vol.49 , Issue.10-11 , pp. 827-835
    • Gemello, R.1    Mana, F.2    Scanzio, S.3    Laface, P.4    De Mori, R.5
  • 13
    • 84876672166
    • Machine learning paradigms for speech recognition: an overview
    • [13] Deng, L., Li, X., Machine learning paradigms for speech recognition: an overview. IEEE Trans. Audio, Speech Lang. Process. 21 (2013), 1060–1089.
    • (2013) IEEE Trans. Audio, Speech Lang. Process. , vol.21 , pp. 1060-1089
    • Deng, L.1    Li, X.2
  • 15
    • 77956031473
    • A survey on transfer learning
    • [15] Pan, S.J., Yang, Q., A survey on transfer learning. IEEE Trans. Knowl. Data Eng. 22 (2010), 1345–1359.
    • (2010) IEEE Trans. Knowl. Data Eng. , vol.22 , pp. 1345-1359
    • Pan, S.J.1    Yang, Q.2
  • 16
    • 0028419019
    • Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains
    • [16] Gauvain, J., Lee, C., Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Process. 2:2 (1994), 291–298.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.1    Lee, C.2
  • 17
    • 85121045899
    • Multitask learning: A knowledge-based source of inductive bias
    • [17] R. Caruana, Multitask learning: A knowledge-based source of inductive bias, in: Proc. ICML, 1993, pp. 41–48.
    • (1993) Proc. ICML , pp. 41-48
    • Caruana, R.1
  • 18
    • 85009167968
    • Multitask learning in connectionist robust ASR using recurrent neural networks
    • [18] S. Parveen, P. Green, Multitask learning in connectionist robust ASR using recurrent neural networks, in: Proc. INTERSPEECH, 2003.
    • Parveen, S.1    Green, P.2
  • 20
    • 84890545600
    • Multi-task learning in deep neural networks for improved phoneme recognition
    • [20] M. Seltzer, J. Droppo, Multi-task learning in deep neural networks for improved phoneme recognition, in: Proc. ICASSP, 2013, pp. 6965–6969.
    • (2013) Proc. ICASSP , pp. 6965-6969
    • Seltzer, M.1    Droppo, J.2
  • 21
    • 85043800698
    • Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement
    • [21] Y. Xu, J. Du, Z. Huang, L.-R. Dai, C.-H. Lee, Multi-objective learning and mask-based post-processing for deep neural network based speech enhancement, 2015, submitted to INTERSPEECH.
    • Xu, Y.1    Du, J.2    Huang, Z.3    Dai, L.-R.4    Lee, C.-H.5
  • 24
    • 79959849500
    • Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
    • [24] B. Li, K.C. Sim, Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems, in: Proc. INTERSPEECH, 2010, pp. 526–529.
    • (2010) Proc. INTERSPEECH , pp. 526-529
    • Li, B.1    Sim, K.C.2
  • 25
    • 84921817164
    • Learning representations by back-propagating errors
    • [25] Rumelhart, D.E., Hinton, G.E., Williams, R.J., Learning representations by back-propagating errors. Cogn. Model., 5(3), 1988, 1.
    • (1988) Cogn. Model. , vol.5 , Issue.3 , pp. 1
    • Rumelhart, D.E.1    Hinton, G.E.2    Williams, R.J.3
  • 28
    • 0003413187
    • Neural Networks: A Comprehensive Foundation
    • Macmillan
    • [28] Haykin, S., Neural Networks: A Comprehensive Foundation. 1994, Macmillan.
    • (1994)
    • Haykin, S.1
  • 30
    • 33745805403
    • A fast learning algorithm for deep belief nets
    • [30] Hinton, G.E., Osindero, S., Teh, Y., A fast learning algorithm for deep belief nets. Neural Comput. 18:7 (2006), 1527–1554.
    • (2006) Neural Comput. , vol.18 , Issue.7 , pp. 1527-1554
    • Hinton, G.E.1    Osindero, S.2    Teh, Y.3
  • 31
    • 84905239342
    • Improving deep neural network acoustic models using generalized maxout networks
    • [31] X. Zhang, J. Trmal, D. Povey, S. Khudanpur, Improving deep neural network acoustic models using generalized maxout networks, in: Proc. ICASSP, 2014, pp. 215–219.
    • (2014) Proc. ICASSP , pp. 215-219
    • Zhang, X.1    Trmal, J.2    Povey, D.3    Khudanpur, S.4
  • 32
    • 84865801985
    • Conversational speech transcription using context-dependent deep neural networks
    • [32] F. Seide, G. Li, D. Yu, Conversational speech transcription using context-dependent deep neural networks, in: Proc. Interspeech, Florence, Italy, 2011, pp. 437–440.
    • (2011) Proc. Interspeech, Florence, Italy , pp. 437-440
    • Seide, F.1    Li, G.2    Yu, D.3
  • 33
    • 84921731072
    • Fast adaptation of deep neural network based on discriminant codes for speech recognition
    • [33] Xue, S., Abdel-Hamid, O., Jiang, H., Dai, L., Liu, Q., Fast adaptation of deep neural network based on discriminant codes for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Proc. 22:12 (2014), 1713–1725.
    • (2014) IEEE/ACM Trans. Audio Speech Lang. Proc. , vol.22 , Issue.12 , pp. 1713-1725
    • Xue, S.1    Abdel-Hamid, O.2    Jiang, H.3    Dai, L.4    Liu, Q.5
  • 34
    • 84912109599
    • Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition
    • [34] S. Xue, H. Jiang, L. Dai, Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition, in: Proc. ISCSLP, 2014.
    • (2014) Proc. ISCSLP
    • Xue, S.1    Jiang, H.2    Dai, L.3
  • 35
    • 80051654263
    • Deep belief networks using discriminative features for phone recognition
    • [35] A. Mohamed, T. Sainath, G. Dahl, B. Ramabhadran, G. Hinton, M. Picheny, Deep belief networks using discriminative features for phone recognition, in: Proc. ICASSP, 2011, pp. 5060–5063.
    • (2011) Proc. ICASSP , pp. 5060-5063
    • Mohamed, A.1    Sainath, T.2    Dahl, G.3    Ramabhadran, B.4    Hinton, G.5    Picheny, M.6
  • 37
    • 84893691530
    • Speaker adaptation of neural network acoustic models using i-vectors
    • [37] G. Saon, H. Soltau, D. Nahamoo, M. Picheny, Speaker adaptation of neural network acoustic models using i-vectors, in: Proc. ASRU, 2013, pp. 55–59.
    • (2013) Proc. ASRU , pp. 55-59
    • Saon, G.1    Soltau, H.2    Nahamoo, D.3    Picheny, M.4
  • 38
    • 0000159105
    • On adaptive decision rules and decision parameter adaptation for automatic speech recognition
    • [38] Lee, C.-H., Huo, Q., On adaptive decision rules and decision parameter adaptation for automatic speech recognition. Proc. IEEE 88:8 (2000), 1241–1269.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1241-1269
    • Lee, C.-H.1    Huo, Q.2
  • 39
    • 0004119259
    • The Sound Pattern of English
    • Harper & Row
    • [39] Chomsky, N., Halle, M., The Sound Pattern of English. 1968, Harper & Row.
    • (1968)
    • Chomsky, N.1    Halle, M.2
  • 40
    • 84910035297
    • Learning small-size dnn with output-distribution-based criteria
    • [40] J. Li, R. Zhao, J.-T. Huang, Y. Gong, Learning small-size dnn with output-distribution-based criteria, in: Proc. Interspeech, 2014.
    • (2014) Proc. Interspeech
    • Li, J.1    Zhao, R.2    Huang, J.-T.3    Gong, Y.4
  • 42
    • 0035279111
    • A structural Bayes approach to speaker adaptation
    • [42] Shinoda, K., Lee, C.-H., A structural Bayes approach to speaker adaptation. IEEE Trans. Speech Audio Process. 9:3 (2001), 276–287.
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.3 , pp. 276-287
    • Shinoda, K.1    Lee, C.-H.2
  • 43
    • 0025629882
    • Tied mixture continuous parameter modeling for speech recognition
    • [43] Bellegarda, J.R., Nahamoo, D., Tied mixture continuous parameter modeling for speech recognition. IEEE Trans. Acoust., Speech Signal Process. 38:12 (1990), 2033–2045.
    • (1990) IEEE Trans. Acoust., Speech Signal Process. , vol.38 , Issue.12 , pp. 2033-2045
    • Bellegarda, J.R.1    Nahamoo, D.2
  • 44
    • 0000250399
    • Semi-continuous hidden markov models for speech signal
    • [44] Huang, X., Jack, M.A., Semi-continuous hidden markov models for speech signal. Comput. Speech Lang. 3:3 (1989), 239–251.
    • (1989) Comput. Speech Lang. , vol.3 , Issue.3 , pp. 239-251
    • Huang, X.1    Jack, M.A.2
  • 45
    • 84912122097
    • Decision tree based state tying for speech recognition using DNN derived embeddings
    • [45] X. Li, X. Wu, Decision tree based state tying for speech recognition using DNN derived embeddings, in: Proc. ISCSLP, 2014, pp. 123–127.
    • (2014) Proc. ISCSLP , pp. 123-127
    • Li, X.1    Wu, X.2
  • 46
    • 84976220626
    • Discriminative transfer learning with tree-based priors
    • [46] N. Srivastava, R. Salakhutdinov, Discriminative transfer learning with tree-based priors, in: Proc. NIPS, 2013.
    • (2013) Proc. NIPS
    • Srivastava, N.1    Salakhutdinov, R.2
  • 47
    • 64849090489
    • Conditional random fields for integrating local discriminative classifiers
    • [47] Morris, J., Fosler-Lussier, E., Conditional random fields for integrating local discriminative classifiers. IEEE Trans. Audio Speech Lang. Process. 16:3 (2008), 617–628.
    • (2008) IEEE Trans. Audio Speech Lang. Process. , vol.16 , Issue.3 , pp. 617-628
    • Morris, J.1    Fosler-Lussier, E.2
  • 50
    • 0001596920
    • Large vocabulary continuous speech recognition: advances and applications
    • [50] Gauvain, J.-L., Lamel, L., Large vocabulary continuous speech recognition: advances and applications. Proc. IEEE 88:8 (2000), 1181–1200.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1181-1200
    • Gauvain, J.-L.1    Lamel, L.2
  • 52
    • 84906227589
    • Restructuring of deep neural network acoustic models with singular value decomposition
    • [52] J. Xue, J. Li, Y. Gong, Restructuring of deep neural network acoustic models with singular value decomposition, in: Proc. Interspeech, 2013, pp. 2365–2369.
    • (2013) Proc. Interspeech , pp. 2365-2369
    • Xue, J.1    Li, J.2    Gong, Y.3
  • 53
    • 0029375590
    • Speaker adaptation using constrained estimation of Gaussian mixtures
    • [53] Digalakis, V.V., Rtischev, D., Neumeyer, L.G., Speaker adaptation using constrained estimation of Gaussian mixtures. IEEE Trans. Speech Audio Process. 3:4 (1995), 357–366.
    • (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.4 , pp. 357-366
    • Digalakis, V.V.1    Rtischev, D.2    Neumeyer, L.G.3
  • 54
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • [54] Gales, M.J.F., Maximum likelihood linear transformations for HMM-based speech recognition. Comput. Speech Lang. 12 (1998), 75–98.
    • (1998) Comput. Speech Lang. , vol.12 , pp. 75-98
    • Gales, M.J.F.1
  • 56
    • 85043823916
    • Switchboard-1 release 2
    • [56] J. J. Godfrey, E. Holliman, Switchboard-1 release 2, Linguistic Data Consortium, Philadelphia.
    • Godfrey, J.J.1    Holliman, E.2
  • 59
    • 84890483489
    • Initialization schemes for multilayer perceptron training and their impact on ASR performance using multilingual data
    • [59] N.T. Vu, W. Breiter, F. Metze, T. Schultz, Initialization schemes for multilayer perceptron training and their impact on ASR performance using multilingual data, in: Proc. Interspeech, Portland, OR, USA, 2012.
    • (2012) Proc. Interspeech, Portland, OR, USA
    • Vu, N.T.1    Breiter, W.2    Metze, F.3    Schultz, T.4


* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.