Volume 31, Issue 1, 2015, Pages 65-86

Environmentally robust ASR front-end for deep neural network acoustic models

Author keywords

Deep neural network; Environmental robustness; Front end; Meeting transcription

Indexed keywords

ALIGNMENT; FILTER BANKS; LOUDSPEAKERS; SPEECH ENHANCEMENT; TRANSCRIPTION

EID: 84919935784 | PISSN: 0885-2308 | EISSN: 1095-8363 | Source Type: Journal
DOI: 10.1016/j.csl.2014.11.008 | Document Type: Article
Times cited: 58

References (78)
  • 3
    • S.F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust. Speech Signal Process., vol. 27, pp. 113-120, 1979. Scopus ID: 0018455310
  • 6
    • S.Y. Chang, B.T. Meyer, and N. Morgan, "Spectro-temporal features for noise-robust speech recognition using power-law nonlinearity and power-bias subtraction," Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 7063-7067, 2013. Scopus ID: 84890488704
  • 7
    • I. Cohen, "Optimal speech enhancement under signal presence uncertainty using log-spectral amplitude estimator," IEEE Signal Process. Lett., vol. 9, pp. 113-116, 2002. Scopus ID: 0036543522
  • 8
    • I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Trans. Speech Audio Process., vol. 11, pp. 466-475, 2003. Scopus ID: 0041360463
  • 10
    • G.E. Dahl, D. Yu, L. Deng, and A. Acero, "Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Process., vol. 20, pp. 30-42, 2012. Scopus ID: 84055222005
  • 12
    • L. Deng, J. Droppo, and A. Acero, "Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise," IEEE Trans. Speech Audio Process., vol. 12, pp. 133-143, 2004. Scopus ID: 2142756950
  • 14
    • L. Deng, J. Wu, J. Droppo, and A. Acero, "Analysis and comparison of two speech feature extraction/compensation algorithms," IEEE Signal Process. Lett., vol. 12, pp. 477-480, 2005. Scopus ID: 20444414457
  • 15
    • J. Droppo and A. Acero, "Environmental robustness," in J. Benesty, M.M. Sondhi, and Y. Huang (Eds.), Springer Handbook of Speech Processing, Springer, pp. 653-679, 2008. Scopus ID: 84901773892
  • 16
    • J. Droppo, A. Acero, and L. Deng, "Evaluation of the SPLICE algorithm on the Aurora2 database," Proc. Eurospeech, pp. 217-220, 2001. Scopus ID: 85006734596
  • 17
    • J. Du, L.R. Dai, and Q. Huo, "Synthesized stereo mapping via deep neural networks for noisy speech recognition," Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 1764-1768, 2014. Scopus ID: 84905284245
  • 18
    • Y. Ephraim, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Trans. Acoust. Speech Signal Process., vol. 33, pp. 443-445, 1985. Scopus ID: 0021892216
  • 20
    • M.J.F. Gales, "Maximum likelihood linear transformations for HMM-based speech recognition," Comput. Speech Lang., vol. 12, pp. 75-98, 1998. Scopus ID: 0032050110
  • 21
    • M.J.F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, pp. 272-281, 1999. Scopus ID: 0032638856
  • 22
    • M.J.F. Gales, "Cluster adaptive training of hidden Markov models," IEEE Trans. Speech Audio Process., vol. 8, pp. 417-428, 2000. Scopus ID: 0034227757
  • 23
    • M.J.F. Gales and F. Flego, "Model-based approaches for degraded channel modelling in robust ASR," Proc. Interspeech, 2012. Scopus ID: 84878418279
  • 29
    • F. Hilger and H. Ney, "Quantile based histogram equalization for noise robust large vocabulary speech recognition," IEEE Trans. Audio Speech Lang. Process., vol. 14, pp. 845-854, 2006. Scopus ID: 34047249084
  • 33
    • C. Kim and R.M. Stern, "Power-normalized cepstral coefficients (PNCC) for robust speech recognition," Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 4101-4104, 2012. Scopus ID: 84867608537
  • 35
    • A. Krueger and R. Haeb-Umbach, "Model-based feature enhancement for reverberant speech recognition," IEEE Trans. Audio Speech Lang. Process., vol. 18, pp. 1692-1707, 2010. Scopus ID: 77955673019
  • 36
    • K. Lebart, J.M. Boucher, and P.N. Denbigh, "A new method based on spectral subtraction for speech dereverberation," Acta Acust. Unit. Acust., vol. 87, pp. 359-366, 2001. Scopus ID: 14344274593
  • 37
    • B. Li and K.C. Sim, "Comparison of discriminative input and output transformations for speaker adaptation in the hybrid NN/HMM systems," Proc. Interspeech, pp. 526-529, 2010. Scopus ID: 79959849500
  • 38
    • B. Li and K.C. Sim, "Noise adaptive front-end normalization based on vector Taylor series for deep neural networks in robust speech recognition," Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 7408-7412, 2013. Scopus ID: 84890532503
  • 39
    • B. Li and K.C. Sim, "An ideal hidden-activation mask for deep neural networks based noise-robust speech recognition," Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 200-204, 2014. Scopus ID: 84905216746
  • 46
    • N. Morgan and H. Bourlard, "Continuous speech recognition: An introduction to the hybrid HMM/connectionist approach," IEEE Signal Process. Mag., vol. 12, pp. 24-42, 1995. Scopus ID: 0029306621
  • 49
    • J. Park, F. Diehl, M.J.F. Gales, M. Tomalin, and P.C. Woodland, "The efficient incorporation of MLP features into automatic speech recognition systems," Comput. Speech Lang., vol. 25, pp. 519-534, 2011. Scopus ID: 79251574977
  • 51
    • C. Plahl, R. Schlüter, and H. Ney, "Improved acoustic feature combination for LVCSR by neural networks," Proc. Interspeech, pp. 1237-1240, 2011. Scopus ID: 84858985237
  • 61
    • Y. Shinohara and M. Akamine, "Bayesian feature enhancement using a mixture of unscented transformations for uncertainty decoding of noisy speech," Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 4569-4572, 2009. Scopus ID: 70349206345
  • 66
    • S.S. Thomas, S. Ganapathy, and H. Hermansky, "Recognition of reverberant speech using frequency domain linear prediction," IEEE Signal Process. Lett., pp. 681-684, 2008. Scopus ID: 67650107416
  • 67
    • Y. Wang and M.J.F. Gales, "Speaker and noise factorization for robust speech recognition," IEEE Trans. Audio Speech Lang. Process., vol. 20, pp. 2149-2158, 2012. Scopus ID: 84862293102
  • 69
    • K. Yao, Y. Gong, and C. Liu, "A feature space transformation method for personalization using generalized i-vector clustering," Proc. Interspeech, 2011. Scopus ID: 84878418827
  • 72
    • T. Yoshioka and T. Nakatani, "Generalization of multi-channel linear prediction methods for blind MIMO impulse response shortening," IEEE Trans. Audio Speech Lang. Process., vol. 20, pp. 2707-2720, 2012. Scopus ID: 84867693894
  • 73
    • T. Yoshioka and T. Nakatani, "Noise model transfer: Novel approach to robustness against nonstationary noise," IEEE Trans. Audio Speech Lang. Process., vol. 21, pp. 2182-2192, 2013. Scopus ID: 84881043147
  • 74
    • T. Yoshioka, A. Sehr, M. Delcroix, K. Kinoshita, R. Maas, T. Nakatani, and W. Kellermann, "Making machines understand us in reverberant rooms: Robustness against reverberation for automatic speech recognition," IEEE Signal Process. Mag., vol. 29, pp. 114-126, 2012. Scopus ID: 85032751613
  • 77
    • D. Yu, L. Deng, J. Droppo, J. Wu, Y. Gong, and A. Acero, "Robust speech recognition using a cepstral minimum-mean-square-error-motivated noise suppressor," IEEE Trans. Audio Speech Lang. Process., vol. 16, pp. 1061-1070, 2008. Scopus ID: 66149101303
  • 78
    • D. Yu, K. Yao, H. Su, G. Li, and F. Seide, "KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition," Proc. Int. Conf. Acoust., Speech, Signal Process., pp. 7893-7897, 2013. Scopus ID: 84890542079


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.