Volume 98, 2017, Pages 1-7

Hierarchical Bayesian combination of plug-in maximum a posteriori decoders in deep neural networks-based speech recognition and speaker adaptation

Author keywords

Automatic speech recognition; Bayesian learning; Deep neural networks; Sequential patterns; System combination

Indexed keywords

BAYESIAN NETWORKS; DECODING; DEEP NEURAL NETWORKS; HIDDEN MARKOV MODELS; HIERARCHICAL SYSTEMS; LOUDSPEAKERS; PATTERN RECOGNITION SYSTEMS;

EID: 85026840738     PISSN: 01678655     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.patrec.2017.08.001     Document Type: Article
Times cited: 10

References (63)
  • 1
    • 84890452886
    • Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code
    • Abdel-Hamid, O., Jiang, H., Fast speaker adaptation of hybrid NN/HMM model for speech recognition based on discriminative learning of speaker code. Proceedings of ICASSP, 2013, 7942–7946.
    • (2013) Proceedings of ICASSP , pp. 7942-7946
    • Abdel-Hamid, O.1    Jiang, H.2
  • 3
    • 33846516584
    • Pattern Recognition and Machine Learning
    • Springer
    • Bishop, C.M., Pattern Recognition and Machine Learning. 2006, Springer.
    • (2006)
    • Bishop, C.M.1
  • 5
    • 0030211964
    • Bagging predictors
    • Breiman, L., Bagging predictors. Mach. Learn. 24 (1996), 123–140.
    • (1996) Mach. Learn. , vol.24 , pp. 123-140
    • Breiman, L.1
  • 6
    • 0035478854
    • Random forests
    • Breiman, L., Random forests. Mach. Learn. 45 (2001).
    • (2001) Mach. Learn. , vol.45
    • Breiman, L.1
  • 7
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • Dahl, G.E., Yu, D., Deng, L., Acero, A., Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio Speech Lang. Process. 20:1 (2012), 30–42.
    • (2012) IEEE Trans. Audio Speech Lang. Process. , vol.20 , Issue.1 , pp. 30-42
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 8
    • 0003759417
    • Optimal statistical decisions
    • John Wiley & Sons
    • DeGroot, M.H., Morris, H., Optimal statistical decisions. 82, 2005, John Wiley & Sons.
    • (2005) , vol.82
    • DeGroot, M.H.1    Morris, H.2
  • 9
    • 84876672166
    • Machine learning paradigms for speech recognition: an overview
    • Deng, L., Li, X., Machine learning paradigms for speech recognition: an overview. IEEE Trans. Audio Speech Lang. Process. 21 (2013), 1060–1089.
    • (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , pp. 1060-1089
    • Deng, L.1    Li, X.2
  • 10
    • 84910048046
    • Ensemble deep learning for speech recognition
    • Deng, L., Platt, J.C., Ensemble deep learning for speech recognition. Proceedings of INTERSPEECH, 2014, 1915–1919.
    • (2014) Proceedings of INTERSPEECH , pp. 1915-1919
    • Deng, L.1    Platt, J.C.2
  • 13
    • 0030638031
    • A post-processing system to yield reduced word error rates: recognizer output voting error reduction (ROVER)
    • Fiscus, J.G., A post-processing system to yield reduced word error rates: recognizer output voting error reduction (ROVER). Proceedings of ASRU, 1997, 347–354.
    • (1997) Proceedings of ASRU , pp. 347-354
    • Fiscus, J.G.1
  • 15
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Gauvain, J., Lee, C.-H., Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. Speech Audio Process., 2, 1994.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2
    • Gauvain, J.1    Lee, C.-H.2
  • 16
    • 0001596920
    • Large vocabulary continuous speech recognition: advances and applications
    • Gauvain, J.-L., Lamel, L., Large vocabulary continuous speech recognition: advances and applications. Proc. IEEE 88:8 (2000), 1181–1200.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1181-1200
    • Gauvain, J.-L.1    Lamel, L.2
  • 17
    • 34548012893
    • Linear hidden transformations for adaptation of hybrid ANN/HMM models
    • Gemello, R., Mana, F., Scanzio, S., Laface, P., Mori, R.D., Linear hidden transformations for adaptation of hybrid ANN/HMM models. Speech Commun. 49:10 (2007), 827–835.
    • (2007) Speech Commun. , vol.49 , Issue.10 , pp. 827-835
    • Gemello, R.1    Mana, F.2    Scanzio, S.3    Laface, P.4    Mori, R.D.5
  • 19
    • 84886580175
    • Bayesian model combination
    • Gatsby Computational Neuroscience Unit University College London
    • Ghahramani, Z., Kim, H.-C., Bayesian model combination. Technical Report, 2003, Gatsby Computational Neuroscience Unit, University College London.
    • (2003) Technical Report
    • Ghahramani, Z.1    Kim, H.-C.2
  • 23
    • 33746600649
    • Reducing the dimensionality of data with neural networks
    • Hinton, G.E., Salakhutdinov, R.R., Reducing the dimensionality of data with neural networks. Science 313:5786 (2006), 504–507.
    • (2006) Science , vol.313 , Issue.5786 , pp. 504-507
    • Hinton, G.E.1    Salakhutdinov, R.R.2
  • 25
    • 84959161626
    • Maximum a posteriori adaptation of network parameters in deep models
    • Huang, Z., Siniscalchi, S.M., Chen, I.-F., Li, J., Wu, J., Lee, C.-H., Maximum a posteriori adaptation of network parameters in deep models. INTERSPEECH, 2015, 1076–1080.
    • (2015) INTERSPEECH , pp. 1076-1080
    • Huang, Z.1    Siniscalchi, S.M.2    Chen, I.-F.3    Li, J.4    Wu, J.5    Lee, C.-H.6
  • 26
    • 85002900398
    • Bayesian unsupervised batch and online speaker adaptation of activation function parameters in deep models for automatic speech recognition
    • Huang, Z., Siniscalchi, S.M., Lee, C.-H., Bayesian unsupervised batch and online speaker adaptation of activation function parameters in deep models for automatic speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process., 25, 2017.
    • (2017) IEEE/ACM Trans. Audio Speech Lang. Process. , vol.25
    • Huang, Z.1    Siniscalchi, S.M.2    Lee, C.-H.3
  • 28
    • 84910107057
    • A Dempster-Shafer theory based combination of handwriting recognition systems with multiple rejection strategies
    • Kessentini, Y., Burger, T., Paquet, T., A Dempster-Shafer theory based combination of handwriting recognition systems with multiple rejection strategies. Pattern Recognit. Lett. 48 (2015), 534–544.
    • (2015) Pattern Recognit. Lett. , vol.48 , pp. 534-544
    • Kessentini, Y.1    Burger, T.2    Paquet, T.3
  • 30
    • 0035509488
    • Speech recognition and utterance verification based on a generalized confidence score
    • Koo, M.-W., Lee, C.-H., Juang, B.-H., Speech recognition and utterance verification based on a generalized confidence score. IEEE Trans. Speech Audio Process. 9:8 (2001), 821–831.
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.8 , pp. 821-831
    • Koo, M.-W.1    Lee, C.-H.2    Juang, B.-H.3
  • 31
    • 0030351374
    • On designing pronunciation lexicons for large vocabulary continuous speech recognition
    • Lamel, L., Adda, G., On designing pronunciation lexicons for large vocabulary continuous speech recognition. Proceedings of ICSLP, 1, 1996.
    • (1996) Proceedings of ICSLP , vol.1
    • Lamel, L.1    Adda, G.2
  • 32
    • 0000159105
    • On adaptive decision rules and decision parameter adaptation for automatic speech recognition
    • Lee, C.-H., Huo, Q., On adaptive decision rules and decision parameter adaptation for automatic speech recognition. Proc. IEEE 88 (2000), 1241–1269.
    • (2000) Proc. IEEE , vol.88 , pp. 1241-1269
    • Lee, C.-H.1    Huo, Q.2
  • 33
    • 84876694595
    • An information-extraction approach to speech processing: analysis, detection, verification, and recognition
    • Lee, C.-H., Siniscalchi, S.M., An information-extraction approach to speech processing: analysis, detection, verification, and recognition. Proc. IEEE 101:5 (2013), 1089–1115.
    • (2013) Proc. IEEE , vol.101 , Issue.5 , pp. 1089-1115
    • Lee, C.-H.1    Siniscalchi, S.M.2
  • 35
    • 0027683813
    • Shared-distribution hidden Markov models for speech recognition
    • Hwang, M.-Y., Huang, X., Shared-distribution hidden Markov models for speech recognition. IEEE Trans. Speech Audio Process. 1:4 (1993), 414–420.
    • (1993) IEEE Trans. Speech Audio Process. , vol.1 , Issue.4 , pp. 414-420
    • Hwang, M.-Y.1    Huang, X.2
  • 36
    • 0034296009
    • Finding consensus in speech recognition: word error minimization and other applications of confusion networks
    • Mangu, L., Brill, E., Stolcke, A., Finding consensus in speech recognition: word error minimization and other applications of confusion networks. Comput. Speech Lang. 14:4 (2000), 373–400.
    • (2000) Comput. Speech Lang. , vol.14 , Issue.4 , pp. 373-400
    • Mangu, L.1    Brill, E.2    Stolcke, A.3
  • 37
    • 84994235596
    • Fusion strategies for robust speech recognition and keyword spotting for channel- and noise-degraded speech
    • Mitra, V., et al. Fusion strategies for robust speech recognition and keyword spotting for channel- and noise-degraded speech. Proceedings of Interspeech, San Francisco, CA, USA, 2016, 3683–3687.
    • (2016) Proceedings of Interspeech, San Francisco, CA, USA , pp. 3683-3687
    • Mitra, V.1
  • 38
    • 64849090489
    • Conditional random fields for integrating local discriminative classifiers
    • Morris, J., Fosler-Lussier, E., Conditional random fields for integrating local discriminative classifiers. IEEE Trans. Audio Speech Lang. Process. 16:3 (2008), 617–628.
    • (2008) IEEE Trans. Audio Speech Lang. Process. , vol.16 , Issue.3 , pp. 617-628
    • Morris, J.1    Fosler-Lussier, E.2
  • 39
    • 0022012892
    • Optimal solution of a training problem in speech recognition
    • Nadas, A., Optimal solution of a training problem in speech recognition. IEEE Trans. Acoust. Speech Signal Process. 33:1 (1985), 326–329.
    • (1985) IEEE Trans. Acoust. Speech Signal Process. , vol.33 , Issue.1 , pp. 326-329
    • Nadas, A.1
  • 40
    • 0000635720
    • Progresses in dynamic programming search for LVCSR
    • Ney, H., Ortmanns, S., Progresses in dynamic programming search for LVCSR. Proc. IEEE 88:8 (2000), 1224–1240.
    • (2000) Proc. IEEE , vol.88 , Issue.8 , pp. 1224-1240
    • Ney, H.1    Ortmanns, S.2
  • 41
    • 0003781238
    • Markov Chains 2
    • Cambridge University Press
    • Norris, J.R., Markov Chains 2. 1998, Cambridge University Press.
    • (1998)
    • Norris, J.R.1
  • 43
    • 84991384259
    • Very deep convolutional neural networks for noise robust speech recognition
    • Qian, Y., Bi, M., Tan, T., Yu, K., Very deep convolutional neural networks for noise robust speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 24:12 (2016), 2263–2276.
    • (2016) IEEE/ACM Trans. Audio Speech Lang. Process. , vol.24 , Issue.12 , pp. 2263-2276
    • Qian, Y.1    Bi, M.2    Tan, T.3    Yu, K.4
  • 44
    • 0024610919
    • A tutorial on hidden Markov models and selected applications in speech recognition
    • Rabiner, L., A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77:2 (1989), 257–286.
    • (1989) Proc. IEEE , vol.77 , Issue.2 , pp. 257-286
    • Rabiner, L.1
  • 45
    • 84929376602
    • Bounded conditional mean imputation with observation uncertainties and acoustic model adaptation
    • Remes, U., López, A.R., Palomäki, D., Bounded conditional mean imputation with observation uncertainties and acoustic model adaptation. IEEE/ACM Trans. Audio Speech Lang. Process. 23 (2015), 1198–1208.
    • (2015) IEEE/ACM Trans. Audio Speech Lang. Process. , vol.23 , pp. 1198-1208
    • Remes, U.1    López, A.R.2    Palomäki, D.3
  • 46
    • 75149176174
    • Ensemble-based classifiers
    • Rokach, L., Ensemble-based classifiers. Artif. Intell. Rev. 33:1–2 (2010), 1–39.
    • (2010) Artif. Intell. Rev. , vol.33 , Issue.1-2 , pp. 1-39
    • Rokach, L.1
  • 47
    • 84893691530
    • Speaker adaptation of neural network acoustic models using i-vectors
    • Saon, G., Soltau, H., Nahamoo, D., Picheny, M., Speaker adaptation of neural network acoustic models using i-vectors. Proc. ASRU, 2013, 55–59.
    • (2013) Proc. ASRU , pp. 55-59
    • Saon, G.1    Soltau, H.2    Nahamoo, D.3    Picheny, M.4
  • 48
    • 84858976070
    • Feature engineering in context-dependent deep neural networks for conversational speech transcription
    • Seide, F., Li, G., Chen, X., Yu, D., Feature engineering in context-dependent deep neural networks for conversational speech transcription. Proc. ASRU, 2011, 24–29.
    • (2011) Proc. ASRU , pp. 24-29
    • Seide, F.1    Li, G.2    Chen, X.3    Yu, D.4
  • 49
    • 0035279111
    • A structural Bayes approach to speaker adaptation
    • Shinoda, K., Lee, C.-H., A structural Bayes approach to speaker adaptation. IEEE Trans. Speech Audio Process. 9:3 (2001), 276–287.
    • (2001) IEEE Trans. Speech Audio Process. , vol.9 , Issue.3 , pp. 276-287
    • Shinoda, K.1    Lee, C.-H.2
  • 50
    • 84881054791
    • Hermitian polynomial for speaker adaptation of connectionist speech recognition systems
    • Siniscalchi, S.M., Li, J., Lee, C.-H., Hermitian polynomial for speaker adaptation of connectionist speech recognition systems. IEEE Trans. Audio Speech Lang. Process. 21:10 (2013), 2152–2161.
    • (2013) IEEE Trans. Audio Speech Lang. Process. , vol.21 , Issue.10 , pp. 2152-2161
    • Siniscalchi, S.M.1    Li, J.2    Lee, C.-H.3
  • 51
    • 84890492591
    • Revisiting hybrid and GMM-HMM system combination techniques
    • Swietojanski, P., Ghoshal, A., Renals, S., Revisiting hybrid and GMM-HMM system combination techniques. Proceedings of ICASSP, 2013, 6744–6748.
    • (2013) Proceedings of ICASSP , pp. 6744-6748
    • Swietojanski, P.1    Ghoshal, A.2    Renals, S.3
  • 52
    • 84976435936
    • Learning hidden unit contributions for unsupervised acoustic model adaptation
    • Swietojanski, P., Li, J., Renals, S., Learning hidden unit contributions for unsupervised acoustic model adaptation. IEEE/ACM Trans. Audio Speech Lang. Process. 24 (2016), 1450–1463.
    • (2016) IEEE/ACM Trans. Audio Speech Lang. Process. , vol.24 , pp. 1450-1463
    • Swietojanski, P.1    Li, J.2    Renals, S.3
  • 53
    • 85019835456
    • Using line segments to train multi-stream stacked autoencoders for image classification
    • Tang, X.-S., Has, K., Wei, H., Ding, Y., Using line segments to train multi-stream stacked autoencoders for image classification. Pattern Recognit. Lett. 94 (2017), 55–61.
    • (2017) Pattern Recognit. Lett. , vol.94 , pp. 55-61
    • Tang, X.-S.1    Has, K.2    Wei, H.3    Ding, Y.4
  • 54
    • 84935113569
    • Error bounds for convolutional codes and an asymptotically optimum decoding algorithm
    • Viterbi, A., Error bounds for convolutional codes and an asymptotically optimum decoding algorithm. IEEE Trans. Inf. Theory 13:2 (1967), 260–269.
    • (1967) IEEE Trans. Inf. Theory , vol.13 , Issue.2 , pp. 260-269
    • Viterbi, A.1
  • 55
    • 84906237512
    • Investigations on Hessian-free optimization for cross-entropy training of deep neural networks
    • Wiesler, S., Li, J., Xue, J., Investigations on Hessian-free optimization for cross-entropy training of deep neural networks. Proc. INTERSPEECH, 2013, 3317–3321.
    • (2013) Proc. INTERSPEECH , pp. 3317-3321
    • Wiesler, S.1    Li, J.2    Xue, J.3
  • 56
    • 0026692226
    • Stacked generalization
    • Wolpert, D., Stacked generalization. Neural Networks 5 (1992), 241–259.
    • (1992) Neural Networks , vol.5 , pp. 241-259
    • Wolpert, D.1
  • 57
    • 79953250475
    • Minimum Bayes risk decoding and system combination based on a recursion for edit distance
    • Xu, H., Povey, D., Mangu, L., Zhu, J., Minimum Bayes risk decoding and system combination based on a recursion for edit distance. Comput. Speech Lang. 25:4 (2011), 802–828.
    • (2011) Comput. Speech Lang. , vol.25 , Issue.4 , pp. 802-828
    • Xu, H.1    Povey, D.2    Mangu, L.3    Zhu, J.4
  • 58
    • 84921731072
    • Fast adaptation of deep neural network based on discriminant codes for speech recognition
    • Xue, S., Abdel-Hamid, O., Jiang, H., Dai, L., Liu, Q., Fast adaptation of deep neural network based on discriminant codes for speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 22:12 (2014), 1713–1725.
    • (2014) IEEE/ACM Trans. Audio Speech Lang. Process. , vol.22 , Issue.12 , pp. 1713-1725
    • Xue, S.1    Abdel-Hamid, O.2    Jiang, H.3    Dai, L.4    Liu, Q.5
  • 59
    • 84906225757
    • A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR
    • Yan, Z., Huo, Q., Xu, J., A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR. Proceedings of INTERSPEECH, 2013, 104–108.
    • (2013) Proceedings of INTERSPEECH , pp. 104-108
    • Yan, Z.1    Huo, Q.2    Xu, J.3
  • 62
    • 84865785753
    • Improved bottleneck features using pretrained deep neural networks
    • Yu, D., Seltzer, M., Improved bottleneck features using pretrained deep neural networks. Proceedings of INTERSPEECH, 2011, 237–240.
    • (2011) Proceedings of INTERSPEECH , pp. 237-240
    • Yu, D.1    Seltzer, M.2
  • 63
    • 84890542079
    • KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition
    • Yu, D., Yao, K., Su, H., Li, G., Seide, F., KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition. Proceedings of ICASSP, 2013, 7893–7897.
    • (2013) Proceedings of ICASSP , pp. 7893-7897
    • Yu, D.1    Yao, K.2    Su, H.3    Li, G.4    Seide, F.5


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.