Volume 25, Issue 1, 2017, Pages 60-71

Bayesian unsupervised batch and online speaker adaptation of activation function parameters in deep models for automatic speech recognition

Author keywords

Automatic speech recognition; Bayesian learning; deep neural networks; online adaptation; prior evolution; transfer learning; unsupervised speaker adaptation

Indexed keywords

BAYESIAN NETWORKS; BENCHMARKING; CHEMICAL ACTIVATION; COMPUTATIONAL EFFICIENCY; DIGITAL STORAGE; HIDDEN MARKOV MODELS; LEARNING ALGORITHMS; MARKOV PROCESSES; MAXIMUM LIKELIHOOD; MAXIMUM LIKELIHOOD ESTIMATION;

EID: 85002900398     PISSN: 23299290     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASLP.2016.2621669     Document Type: Article
Times cited: 16

References (60)
  • 1
    • 85032751458
    • Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups
    • Nov
    • G. Hinton et al., "Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups," IEEE Signal Process. Mag., vol. 29, no. 6, pp. 82-97, Nov. 2012.
    • (2012) IEEE Signal Process. Mag. , vol.29 , Issue.6 , pp. 82-97
    • Hinton, G.1
  • 3
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • Feb
    • L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-286, Feb. 1989.
    • (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
    • Rabiner, L.1
  • 4
    • 0032140546
    • On stochastic feature and model compensation approaches to robust speech recognition
    • C.-H. Lee, "On stochastic feature and model compensation approaches to robust speech recognition," Speech Commun., vol. 25, no. 1-3, pp. 29-47, 1998.
    • (1998) Speech Commun. , vol.25 , Issue.1-3 , pp. 29-47
    • Lee, C.-H.1
  • 5
    • 84862931515
    • Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data
    • Mar
    • S. M. Siniscalchi, D.-C. Lyu, T. Svendsen, and C.-H. Lee, "Experiments on cross-language attribute detection and phone recognition with minimal target-specific training data," IEEE Trans. Audio, Speech, Lang. Process., vol. 20, no. 3, pp. 875-887, Mar. 2012.
    • (2012) IEEE Trans. Audio, Speech, Lang. Process. , vol.20 , Issue.3 , pp. 875-887
    • Siniscalchi, S.M.1    Lyu, D.-C.2    Svendsen, T.3    Lee, C.-H.4
  • 6
    • 84890542079
    • KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
    • D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 7893-7897.
    • (2013) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 7893-7897
    • Yu, D.1    Yao, K.2    Su, H.3    Li, G.4    Seide, F.5
  • 7
    • 84890452886
    • Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
    • O. Abdel-Hamid and H. Jiang, "Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 7942-7946.
    • (2013) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 7942-7946
    • Abdel-Hamid, O.1    Jiang, H.2
  • 8
    • 84938690750
    • Speaker adaptation of deep neural networks using a hierarchy of output layers
    • R. Price, K.-I. Iso, and K. Shinoda, "Speaker adaptation of deep neural networks using a hierarchy of output layers," in Proc. Spoken Lang. Technol. Workshop, 2014, pp. 153-158.
    • (2014) Proc. Spoken Lang. Technol. Workshop , pp. 153-158
    • Price, R.1    Iso, K.-I.2    Shinoda, K.3
  • 10
    • 84976435936
    • Learning hidden unit contributions for unsupervised acoustic model adaptation
    • Aug
    • P. Swietojanski, J. Li, and S. Renals, "Learning hidden unit contributions for unsupervised acoustic model adaptation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 8, pp. 1450-1463, Aug. 2016.
    • (2016) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.8 , pp. 1450-1463
    • Swietojanski, P.1    Li, J.2    Renals, S.3
  • 11
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • F. Seide, G. Li, X. Chen, and D. Yu, "Feature engineering in context-dependent deep neural networks for conversational speech transcription," in Proc. IEEE Workshop Automat. Speech Recogn. Understanding, 2011, pp. 24-29.
    • (2011) Proc. IEEE Workshop Automat. Speech Recogn. Understanding , pp. 24-29
    • Seide, F.1    Li, G.2    Chen, X.3    Yu, D.4
  • 13
    • 84881054791
    • Hermitian polynomial for speaker adaptation of connectionist speech recognition systems
    • Oct
    • S. M. Siniscalchi, J. Li, and C.-H. Lee, "Hermitian polynomial for speaker adaptation of connectionist speech recognition systems," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 10, pp. 2152-2161, Oct. 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.10 , pp. 2152-2161
    • Siniscalchi, S.M.1    Li, J.2    Lee, C.-H.3
  • 15
    • 84946054484
    • Multi-basis adaptive neural network for rapid adaptation in speech recognition
    • C. Wu and M. Gales, "Multi-basis adaptive neural network for rapid adaptation in speech recognition," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4315-4319.
    • (2015) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 4315-4319
    • Wu, C.1    Gales, M.2
  • 16
    • 84929376602
    • Bounded conditional mean imputation with observation uncertainties and acoustic model adaptation
    • U. Remes, A. R. López, and D. Palomäki, "Bounded conditional mean imputation with observation uncertainties and acoustic model adaptation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 7, pp. 1198-1208, 2015.
    • (2015) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.23 , Issue.7 , pp. 1198-1208
    • Remes, U.1    López, A.R.2    Palomäki, D.3
  • 17
    • 84991404490
    • Factorized hidden layer adaptation for deep neural network based acoustic modeling
    • Dec
    • L. Samarakoon and K. C. Sim, "Factorized hidden layer adaptation for deep neural network based acoustic modeling," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 12, pp. 2241-2250, Dec. 2016.
    • (2016) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.12 , pp. 2241-2250
    • Samarakoon, L.1    Sim, K.C.2
  • 18
    • 84986193646
    • Differentiable pooling for unsupervised acoustic model adaptation
    • Oct
    • P. Swietojanski and S. Renals, "Differentiable pooling for unsupervised acoustic model adaptation," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 24, no. 10, pp. 1773-1784, Oct. 2016.
    • (2016) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.24 , Issue.10 , pp. 1773-1784
    • Swietojanski, P.1    Renals, S.2
  • 19
    • 0027683813
    • Shared-distribution hidden Markov models for speech recognition
    • Oct
    • M.-Y. Hwang and X. Huang, "Shared-distribution hidden Markov models for speech recognition," IEEE Trans. Speech Audio Process., vol. 1, no. 4, pp. 414-420, Oct. 1993.
    • (1993) IEEE Trans. Speech Audio Process. , vol.1 , Issue.4 , pp. 414-420
    • Hwang, M.-Y.1    Huang, X.2
  • 20
    • 0032923221
    • Catastrophic forgetting in connectionist networks: Causes, consequences and solutions
    • R. M. French, "Catastrophic forgetting in connectionist networks: Causes, consequences and solutions," Trends Cogn. Sci., vol. 3, pp. 128-135, 1999.
    • (1999) Trends Cogn. Sci. , vol.3 , pp. 128-135
    • French, R.M.1
  • 21
    • 34548012893
    • Linear hidden transformations for adaptation of hybrid ANN/HMM models
    • R. Gemello, F. Mana, S. Scanzio, P. Laface, and R. D. Mori, "Linear hidden transformations for adaptation of hybrid ANN/HMM models," Speech Commun., vol. 49, no. 10, pp. 827-835, 2007.
    • (2007) Speech Commun. , vol.49 , Issue.10 , pp. 827-835
    • Gemello, R.1    Mana, F.2    Scanzio, S.3    Laface, P.4    Mori, R.D.5
  • 22
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Apr
    • J.-L. Gauvain and C.-H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291-298, Apr. 1994.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.-L.1    Lee, C.-H.2
  • 23
    • 0035279111
    • A structural Bayes approach to speaker adaptation
    • Mar
    • K. Shinoda and C.-H. Lee, "A structural Bayes approach to speaker adaptation," IEEE Trans. Speech Audio Process., vol. 9, no. 3, pp. 276-287, Mar. 2001.
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.3 , pp. 276-287
    • Shinoda, K.1    Lee, C.-H.2
  • 25
    • 0000159105
    • On adaptive decision rules and decision parameter adaptation for automatic speech recognition
    • Aug
    • C.-H. Lee and Q. Huo, "On adaptive decision rules and decision parameter adaptation for automatic speech recognition," Proc. IEEE, vol. 88, no. 8, pp. 1241-1269, Aug. 2000.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1241-1269
    • Lee, C.-H.1    Huo, Q.2
  • 26
    • 84876672166
    • Machine learning paradigms for speech recognition: An overview
    • May
    • L. Deng and X. Li, "Machine learning paradigms for speech recognition: An overview," IEEE Trans. Audio, Speech, Lang. Process., vol. 21, no. 5, pp. 1060-1089, May 2013.
    • (2013) IEEE Trans. Audio, Speech, Lang. Process. , vol.21 , Issue.5 , pp. 1060-1089
    • Deng, L.1    Li, X.2
  • 29
    • 70349213445
    • Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling
    • B. Kingsbury, "Lattice-based optimization of sequence classification criteria for neural-network acoustic modeling," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2009, pp. 3761-3764.
    • (2009) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 3761-3764
    • Kingsbury, B.1
  • 31
    • 84994372786
    • Compact feedforward sequential memory networks for large vocabulary continuous speech recognition
    • Sep
    • S. Zhang, H. Jiang, S. Xiong, S. Wei and L. Dai, "Compact Feedforward Sequential Memory Networks for Large Vocabulary Continuous Speech Recognition," in Proc. Interspeech, Sep. 2016, pp. 3389-3393.
    • (2016) Proc. Interspeech , pp. 3389-3393
    • Zhang, S.1    Jiang, H.2    Xiong, S.3    Wei, S.4    Dai, L.5
  • 34
    • 84910046405
    • Long short-term memory recurrent neural network architectures for large scale acoustic modeling
    • H. Sak, A. Senior, and F. Beaufays, "Long short-term memory recurrent neural network architectures for large scale acoustic modeling," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2014, pp. 338-342.
    • (2014) Proc. Annu. Conf. Int. Speech Commun. Assoc , pp. 338-342
    • Sak, H.1    Senior, A.2    Beaufays, F.3
  • 36
    • 0029375590
    • Speaker adaptation using constrained estimation of Gaussian mixtures
    • Sep
    • V. V. Digalakis, D. Rtischev, and L. G. Neumeyer, "Speaker adaptation using constrained estimation of Gaussian mixtures," IEEE Trans. Speech Audio Process., vol. 3, no. 4, pp. 357-366, Sep. 1995.
    • (1995) IEEE Trans. Speech Audio Process. , vol.3 , Issue.4 , pp. 357-366
    • Digalakis, V.V.1    Rtischev, D.2    Neumeyer, L.G.3
  • 37
    • 0032050110
    • Maximum likelihood linear transformations for HMM-based speech recognition
    • M. J. F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput., Speech, Lang., vol. 12, pp. 75-98, 1998.
    • (1998) Comput., Speech, Lang. , vol.12 , pp. 75-98
    • Gales, M.J.F.1
  • 39
    • 84938688160
    • Speaker adaptive training of deep neural network acoustic models using i-vectors
    • Nov
    • Y. Miao, H. Zhang, and F. Metze, "Speaker adaptive training of deep neural network acoustic models using i-vectors," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 23, no. 11, pp. 1938-1949, Nov. 2015.
    • (2015) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.23 , Issue.11 , pp. 1938-1949
    • Miao, Y.1    Zhang, H.2    Metze, F.3
  • 40
    • 84921731072
    • Fast adaptation of deep neural network based on discriminant codes for speech recognition
    • Dec
    • S. Xue, O. Abdel-Hamid, H. Jiang, L. Dai, and Q. Liu, "Fast adaptation of deep neural network based on discriminant codes for speech recognition," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 22, no. 12, pp. 1713-1725, Dec. 2014.
    • (2014) IEEE/ACM Trans. Audio, Speech, Lang. Process. , vol.22 , Issue.12 , pp. 1713-1725
    • Xue, S.1    Abdel-Hamid, O.2    Jiang, H.3    Dai, L.4    Liu, Q.5
  • 42
    • 84937854847
    • Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system
    • J. Neto et al., "Speaker-adaptation for hybrid HMM-ANN continuous speech recognition system," in Proc. Eur. Conf. Speech Commun. Technol., 1995, pp. 2171-2174.
    • (1995) Proc. Eur. Conf. Speech Commun. Technol , pp. 2171-2174
    • Neto, J.1
  • 43
    • 84874226579
    • Adaptation of context-dependent deep neural networks for automatic speech recognition
    • K. Yao, D. Yu, F. Seide, H. Su, L. Deng, and Y. Gong, "Adaptation of context-dependent deep neural networks for automatic speech recognition," in Proc. Spoken Lang. Technol. Workshop, 2012, pp. 366-369.
    • (2012) Proc. Spoken Lang. Technol. Workshop , pp. 366-369
    • Yao, K.1    Yu, D.2    Seide, F.3    Su, H.4    Deng, L.5    Gong, Y.6
  • 44
    • 79959849500
    • Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems
    • B. Li and K. C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2010, pp. 526-529.
    • (2010) Proc. Annu. Conf. Int. Speech Commun. Assoc , pp. 526-529
    • Li, B.1    Sim, K.C.2
  • 45
    • 84912109599
    • Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition
    • S. Xue, H. Jiang, and L. Dai, "Speaker adaptation of hybrid NN/HMM model for speech recognition based on singular value decomposition," in Proc. 9th Int. Symp. Chin. Spoken Lang. Process., 2014, pp. 1-5.
    • (2014) Proc. 9th Int. Symp. Chin. Spoken Lang. Process , pp. 1-5
    • Xue, S.1    Jiang, H.2    Dai, L.3
  • 46
    • 84905229915
    • Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network
    • J. Xue, J. Li, D. Yu, M. Seltzer, and Y. Gong, "Singular value decomposition based low-footprint speaker adaptation and personalization for deep neural network," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2014, pp. 6359-6363.
    • (2014) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 6359-6363
    • Xue, J.1    Li, J.2    Yu, D.3    Seltzer, M.4    Gong, Y.5
  • 47
    • 84906227589
    • Restructuring of deep neural network acoustic models with singular value decomposition
    • J. Xue, J. Li, and Y. Gong, "Restructuring of deep neural network acoustic models with singular value decomposition," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2013, pp. 2365-2369.
    • (2013) Proc. Annu. Conf. Int. Speech Commun. Assoc , pp. 2365-2369
    • Xue, J.1    Li, J.2    Gong, Y.3
  • 48
    • 84946061232
    • Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data
    • Y. Zhao, J. Li, J. Xue, and Y. Gong, "Investigating online low-footprint speaker adaptation using generalized linear regression and click-through data," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 4310-4314.
    • (2015) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 4310-4314
    • Zhao, Y.1    Li, J.2    Xue, J.3    Gong, Y.4
  • 49
    • 84973321190
    • DNN speaker adaptation using parameterised sigmoid and ReLU hidden activation functions
    • C. Zhang and P. C. Woodland, "DNN speaker adaptation using parameterised sigmoid and ReLU hidden activation functions," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 5300-5304.
    • (2016) Proc. Int. Conf. Acoust., Speech, Signal Process , pp. 5300-5304
    • Zhang, C.1    Woodland, P.C.2
  • 53
    • 84871614543
    • A novel loss function for the overall risk criterion based discriminative training of HMM models
    • J. Kaiser, B. Horvat, and Z. Kacic, "A novel loss function for the overall risk criterion based discriminative training of HMM models," in Proc. 6th Int. Conf. Spoken Lang. Process., 2000, pp. 887-890.
    • (2000) Proc. 6th Int. Conf. Spoken Lang. Process , pp. 887-890
    • Kaiser, J.1    Horvat, B.2    Kacic, Z.3
  • 54
    • 44949182698
    • Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition
    • M. Gibson and T. Hain, "Hypothesis spaces for minimum Bayes risk training in large vocabulary speech recognition," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2006, pp. 2406-2409.
    • (2006) Proc. Annu. Conf. Int. Speech Commun. Assoc , pp. 2406-2409
    • Gibson, M.1    Hain, T.2
  • 55
    • 33746600649
    • Reducing the dimensionality of data with neural networks
    • G. E. Hinton and R. R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, pp. 504-507, 2006.
    • (2006) Science , vol.313 , pp. 504-507
    • Hinton, G.E.1    Salakhutdinov, R.R.2
  • 56
    • 0002533801
    • The empirical Bayes approach to statistical decision problems
    • H. Robbins, "The empirical Bayes approach to statistical decision problems," Ann. Math. Statist., vol. 35, no. 1, 1964.
    • (1964) Ann. Math. Statist. , vol.35 , Issue.1
    • Robbins, H.1
  • 59
    • 84910084579
    • 2000 NIST evaluation of conversational speech recognition over the telephone: English and Mandarin performance results
    • J. Fiscus, W. M. Fisher, A. F. Martin, M. A. Przybocki, and D. S. Pallett, "2000 NIST evaluation of conversational speech recognition over the telephone: English and Mandarin performance results," in Proc. Speech Transcription Workshop, 2000, pp. 1-5.
    • (2000) Proc. Speech Transcription Workshop , pp. 1-5
    • Fiscus, J.1    Fisher, W.M.2    Martin, A.F.3    Przybocki, M.A.4    Pallett, D.S.5
  • 60
    • 84865801985
    • Conversational speech transcription using context-dependent deep neural networks
    • F. Seide, G. Li, and D. Yu, "Conversational speech transcription using context-dependent deep neural networks," in Proc. Annu. Conf. Int. Speech Commun. Assoc., 2011, pp. 437-440.
    • (2011) Proc. Annu. Conf. Int. Speech Commun. Assoc , pp. 437-440
    • Seide, F.1    Li, G.2    Yu, D.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.