Volume 20, Issue 8, 2012, Pages 2191-2206

Nonlinear compensation using the Gauss-Newton method for noise-robust speech recognition

Author keywords

Gauss Newton method; nonlinear compensation; robust speech recognition; vector Taylor series (VTS)

Indexed keywords

CROSS-COVARIANCE; GAUSS-NEWTON; GAUSS-NEWTON METHODS; GAUSSIAN MIXTURE MODEL; ITERATIVE ESTIMATION; JACOBIANS; NOISE ESTIMATION; NOISE ESTIMATION ALGORITHM; NOISE PARAMETERS; NOISE ROBUST SPEECH RECOGNITION; NOISE VARIANCE ESTIMATION; NONLINEAR COMPENSATION; PARALLEL MODEL COMBINATIONS; PERFORMANCE IMPROVEMENTS; ROBUST SPEECH RECOGNITION; SAMPLING-BASED; UNIFIED APPROACH; UNSCENTED TRANSFORM; VECTOR TAYLOR SERIES;

EID: 84865208051     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2012.2199107     Document Type: Article
Times cited : (9)

References (55)
  • 1
    • 0029288202
    • Speech recognition in noisy environments: A survey
    • Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol. 16, pp. 261-291, 1995.
    • (1995) Speech Commun. , vol.16 , pp. 261-291
    • Gong, Y.1
  • 2
    • 0032140546
    • On stochastic feature and model compensation approaches to robust speech recognition
    • C. H. Lee, "On stochastic feature and model compensation approaches to robust speech recognition," Speech Commun., vol. 25, pp. 29-47, 1998.
    • (1998) Speech Commun. , vol.25 , pp. 29-47
    • Lee, C.H.1
  • 3
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models," Comput. Speech Lang., vol. 9, pp. 171-186, 1995.
    • (1995) Comput. Speech Lang. , vol.9 , pp. 171-186
    • Leggetter, C.J.1    Woodland, P.C.2
  • 4
    • 0029375590
    • Speaker adaptation using constrained estimation of Gaussian mixtures
    • Sep.
    • V. V. Digalakis, D. Rtischev, and L. G. Neumeyer, "Speaker adaptation using constrained estimation of Gaussian mixtures," IEEE Trans. Speech Audio Process., vol. 3, no. 5, pp. 357-366, Sep. 1995.
    • (1995) IEEE Trans. Speech Audio Process , vol.3 , Issue.5 , pp. 357-366
    • Digalakis, V.V.1    Rtischev, D.2    Neumeyer, L.G.3
  • 7
    • 85009113852
    • HMM adaptation using vector Taylor series for noisy speech recognition
    • Beijing, China
    • A. Acero, L. Deng, T. Kristjansson, and J. Zhang, "HMM adaptation using vector Taylor series for noisy speech recognition," in Proc. ICSLP, Beijing, China, 2000, pp. 869-872.
    • (2000) Proc. ICSLP , pp. 869-872
    • Acero, A.1    Deng, L.2    Kristjansson, T.3    Zhang, J.4
  • 8
    • 77249100743
    • An HMM compensation approach using unscented transformation for noisy speech recognition
    • Kent Ridge, Singapore
    • Y. Hu and Q. Huo, "An HMM compensation approach using unscented transformation for noisy speech recognition," in Proc. ISCSLP, Kent Ridge, Singapore, 2006, pp. 346-357.
    • (2006) Proc. ISCSLP , pp. 346-357
    • Hu, Y.1    Huo, Q.2
  • 10
    • 0016067897
    • Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification
    • Jun.
    • B. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and verification," J. Acoust. Soc. Amer., vol. 55, no. 6, pp. 1304-1312, Jun. 1974.
    • (1974) J. Acoust. Soc. Amer. , vol.55 , Issue.6 , pp. 1304-1312
    • Atal, B.1
  • 11
    • 0002629270
    • Maximum likelihood from incomplete data via the EM algorithm
    • A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Statist. Soc. B, vol. 39, no. 1, pp. 1-38, 1977.
    • (1977) J. R. Statist. Soc. B , vol.39 , Issue.1 , pp. 1-38
    • Dempster, A.P.1    Laird, N.M.2    Rubin, D.B.3
  • 12
    • 62249130045
    • A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions
    • J. Li, L. Deng, D. Yu, Y. Gong, and A. Acero, "A unified framework of HMM adaptation with joint compensation of additive and convolutive distortions," Comput. Speech Lang., vol. 23, pp. 389-405, 2009.
    • (2009) Comput. Speech Lang. , vol.23 , pp. 389-405
    • Li, J.1    Deng, L.2    Yu, D.3    Gong, Y.4    Acero, A.5
  • 14
    • 79959834126
    • Unscented transform with online distortion estimation for HMM adaptation
    • Makuhari, Japan
    • J. Li, D. Yu, Y. Gong, and L. Deng, "Unscented transform with online distortion estimation for HMM adaptation," in Proc. Interspeech, Makuhari, Japan, 2010, pp. 1660-1663.
    • (2010) Proc. Interspeech , pp. 1660-1663
    • Li, J.1    Yu, D.2    Gong, Y.3    Deng, L.4
  • 15
    • 77956296425
    • Noise adaptive training for robust automatic speech recognition
    • Nov.
    • O. Kalinli, M. Seltzer, J. Droppo, and A. Acero, "Noise adaptive training for robust automatic speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 1889-1901, Nov. 2010.
    • (2010) IEEE Trans. Audio, Speech, Lang. Process , vol.18 , Issue.8 , pp. 1889-1901
    • Kalinli, O.1    Seltzer, M.2    Droppo, J.3    Acero, A.4
  • 16
    • 78049361008
    • On noise estimation for robust speech recognition using vector Taylor series
    • Dallas, TX
    • Y. Zhao and B. H. Juang, "On noise estimation for robust speech recognition using vector Taylor series," in Proc. ICASSP, Dallas, TX, 2010, pp. 4290-4293.
    • (2010) Proc. ICASSP , pp. 4290-4293
    • Zhao, Y.1    Juang, B.H.2
  • 17
    • 80051643241
    • Non-linear noise compensation for robust speech recognition using Gauss-Newton method
    • Prague, Czech Republic
    • Y. Zhao and B. H. Juang, "Non-linear noise compensation for robust speech recognition using Gauss-Newton method," in Proc. ICASSP, Prague, Czech Republic, 2011, pp. 4796-4799.
    • (2011) Proc. ICASSP , pp. 4796-4799
    • Zhao, Y.1    Juang, B.H.2
  • 18
    • 2142756950
    • Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise
    • Mar.
    • L. Deng, J. Droppo, and A. Acero, "Enhancement of log mel power spectra of speech using a phase-sensitive model of the acoustic environment and sequential estimation of the corrupting noise," IEEE Trans. Speech Audio Process., vol. 12, no. 2, pp. 133-143, Mar. 2004.
    • (2004) IEEE Trans. Speech Audio Process , vol.12 , Issue.2 , pp. 133-143
    • Deng, L.1    Droppo, J.2    Acero, A.3
  • 19
    • 44849122740
    • Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions
    • Antwerp, Belgium
    • Y. Hu and Q. Huo, "Irrelevant variability normalization based HMM training using VTS approximation of an explicit model of environmental distortions," in Proc. Interspeech, Antwerp, Belgium, 2007, pp. 1042-1045.
    • (2007) Proc. Interspeech , pp. 1042-1045
    • Hu, Y.1    Huo, Q.2
  • 21
    • 33947694706
    • Model adaptation for long convolutional distortion by maximum likelihood based state filtering approach
    • Toulouse, France
    • C. K. Raut, T. Nishimoto, and S. Sagayama, "Model adaptation for long convolutional distortion by maximum likelihood based state filtering approach," in Proc. ICASSP, Toulouse, France, 2006, pp. 1133-1136.
    • (2006) Proc. ICASSP , pp. 1133-1136
    • Raut, C.K.1    Nishimoto, T.2    Sagayama, S.3
  • 22
    • 0028420014
    • Integrated models of signal and background with application to speaker identification in noise
    • Apr.
    • R. C. Rose, E. M. Hofstetter, and D. A. Reynolds, "Integrated models of signal and background with application to speaker identification in noise," IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 245-257, Apr. 1994.
    • (1994) IEEE Trans. Speech Audio Process , vol.2 , Issue.2 , pp. 245-257
    • Rose, R.C.1    Hofstetter, E.M.2    Reynolds, D.A.3
  • 23
    • 0032048385
    • Speech recognition in noisy environments using first-order vector Taylor series
    • PII S0167639397000617
    • D. Y. Kim, C. K. Un, and N. S. Kim, "Speech recognition in noisy environments using first order vector Taylor series," Speech Commun., vol. 24, pp. 39-49, 1998. (Pubitemid 128435865)
    • (1998) Speech Communication , vol.24 , Issue.1 , pp. 39-49
    • Kim, D.Y.1    Un, C.K.2    Kim, N.S.3
  • 24
    • 85015389627
    • A comparative study of noise estimation algorithms for non-linear compensation in robust speech recognition
    • submitted for publication
    • Y. Zhao and B. H. Juang, "A comparative study of noise estimation algorithms for non-linear compensation in robust speech recognition," Speech Commun., submitted for publication.
    • Speech Commun.
    • Zhao, Y.1    Juang, B.H.2
  • 25
    • 84893675167
    • Model-based approaches to handling uncertainty
    • D. Kolossa and R. Haeb-Umbach, Eds. New York: Springer-Verlag
    • M. J. F. Gales, "Model-based approaches to handling uncertainty," in Robust Speech Recognition of Uncertain or Missing Data, D. Kolossa and R. Haeb-Umbach, Eds. New York: Springer-Verlag, 2011.
    • (2011) Robust Speech Recognition of Uncertain or Missing Data
    • Gales, M.J.F.1
  • 26
    • 85079234583
    • On the limitations of cepstral features in noise
    • Adelaide, Australia
    • J. P. Openshaw and J. S. Mason, "On the limitations of cepstral features in noise," in Proc. ICASSP, Adelaide, Australia, 1994, pp. 49-52.
    • (1994) Proc. ICASSP , pp. 49-52
    • Openshaw, J.P.1    Mason, J.S.2
  • 28
    • 33646776744
    • Effect of phase-sensitive environment model and higher order VTS on noisy speech feature enhancement
    • Philadelphia, PA
    • V. Stouten, H. Van hamme, and P. Wambacq, "Effect of phase-sensitive environment model and higher order VTS on noisy speech feature enhancement," in Proc. ICASSP, Philadelphia, PA, 2005, pp. 433-436.
    • (2005) Proc. ICASSP , pp. 433-436
    • Stouten, V.1    Van Hamme, H.2    Wambacq, P.3
  • 29
    • 70349203876
    • Joint uncertainty decoding with the second order approximation for noise robust speech recognition
    • Taipei, Taiwan
    • H. Xu and K. K. Chin, "Joint uncertainty decoding with the second order approximation for noise robust speech recognition," in Proc. ICASSP, Taipei, Taiwan, 2009, pp. 3841-3844.
    • (2009) Proc. ICASSP , pp. 3841-3844
    • Xu, H.1    Chin, K.K.2
  • 30
    • 80052076093
    • A feature compensation approach using high-order vector Taylor series approximation of an explicit distortion model for noisy speech recognition
    • Nov.
    • J. Du and Q. Huo, "A feature compensation approach using high-order vector Taylor series approximation of an explicit distortion model for noisy speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 8, pp. 2285-2293, Nov. 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.8 , pp. 2285-2293
    • Du, J.1    Huo, Q.2
  • 31
    • 21244437999
    • Unscented filtering and nonlinear estimation
    • DOI 10.1109/JPROC.2003.823141, Sequential State Estimation: From Kalman Filters to Particles Filters
    • S. J. Julier and J. K. Uhlmann, "Unscented filtering and nonlinear estimation," Proc. IEEE, vol. 92, no. 3, pp. 401-422, Mar. 2004. (Pubitemid 40890750)
    • (2004) Proceedings of the IEEE , vol.92 , Issue.3 , pp. 401-422
    • Julier, S.J.1    Uhlmann, J.K.2
  • 32
    • 84865262272
    • HMM compensation for noisy speech recognition based on cepstral parameter generation
    • Rhodes, Greece
    • T. Kobayashi, T. Masuko, and K. Tokuda, "HMM compensation for noisy speech recognition based on cepstral parameter generation," in Proc. Eurospeech, Rhodes, Greece, 1997, pp. 1583-1586.
    • (1997) Proc. Eurospeech , pp. 1583-1586
    • Kobayashi, T.1    Masuko, T.2    Tokuda, K.3
  • 33
    • 79951668781
    • Extended VTS for noise-robust speech recognition
    • May
    • R. C. van Dalen and M. J. F. Gales, "Extended VTS for noise-robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 4, pp. 733-743, May 2011.
    • (2011) IEEE Trans. Audio, Speech, Lang. Process , vol.19 , Issue.4 , pp. 733-743
    • Van Dalen, R.C.1    Gales, M.J.F.2
  • 36
    • 0032638856
    • Semi-tied covariance matrices for hidden Markov models
    • May
    • M. J. F. Gales, "Semi-tied covariance matrices for hidden Markov models," IEEE Trans. Speech Audio Process., vol. 7, no. 3, pp. 272-281, May 1999.
    • (1999) IEEE Trans. Speech Audio Process , vol.7 , Issue.3 , pp. 272-281
    • Gales, M.J.F.1
  • 37
    • 0742272654
    • Modeling inverse covariance matrices by basis expansion
    • Jan.
    • P. A. Olsen and R. A. Gopinath, "Modeling inverse covariance matrices by basis expansion," IEEE Trans. Speech Audio Process., vol. 12, no. 1, pp. 37-46, Jan. 2004.
    • (2004) IEEE Trans. Speech Audio Process , vol.12 , Issue.1 , pp. 37-46
    • Olsen, P.A.1    Gopinath, R.A.2
  • 38
    • 85009289957
    • Modeling with a subspace constraint on inverse covariance matrices
    • Denver, CO
    • S. Axelrod, R. Gopinath, and P. Olsen, "Modeling with a subspace constraint on inverse covariance matrices," in Proc. ICSLP, Denver, CO, 2002, pp. 2177-2180.
    • (2002) Proc. ICSLP , pp. 2177-2180
    • Axelrod, S.1    Gopinath, R.2    Olsen, P.3
  • 39
    • 0033884177
    • Maximum likelihood and minimum classification error factor analysis for automatic speech recognition
    • DOI 10.1109/89.824696
    • L. K. Saul and M. G. Rahim, "Maximum likelihood and minimum classification error factor analysis for automatic speech recognition," IEEE Trans. Speech Audio Process., vol. 8, no. 2, pp. 115-125, Mar. 2000. (Pubitemid 30578364)
    • (2000) IEEE Transactions on Speech and Audio Processing , vol.8 , Issue.2 , pp. 115-125
    • Saul, L.K.1    Rahim, M.G.2
  • 40
    • 2442457791
    • Mixtures of inverse covariances
    • May
    • V. Vanhoucke and A. Sankar, "Mixtures of inverse covariances," IEEE Trans. Speech Audio Process., vol. 12, no. 3, pp. 250-264, May 2004.
    • (2004) IEEE Trans. Speech Audio Process , vol.12 , Issue.3 , pp. 250-264
    • Vanhoucke, V.1    Sankar, A.2
  • 42
    • 0038669544
    • The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions
    • Paris, France
    • H. G. Hirsch and D. Pearce, "The Aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions," in Proc. ISCA ITRW ASR, Paris, France, 2000, pp. 181-188.
    • (2000) Proc. ISCA ITRW ASR , pp. 181-188
    • Hirsch, H.G.1    Pearce, D.2
  • 43
    • 0001927585
    • On information and sufficiency
    • Mar.
    • S. Kullback and R. A. Leibler, "On information and sufficiency," Ann. Math. Stat., vol. 22, no. 1, pp. 79-86, Mar. 1951.
    • (1951) Ann. Math. Stat. , vol.22 , Issue.1 , pp. 79-86
    • Kullback, S.1    Leibler, R.A.2
  • 44
    • 0000792515
    • Multidimensional stochastic approximation methods
    • J. R. Blum, "Multidimensional stochastic approximation methods," Ann. Math. Stat., vol. 25, no. 4, pp. 737-744, 1954.
    • (1954) Ann. Math. Stat. , vol.25 , Issue.4 , pp. 737-744
    • Blum, J.R.1
  • 46
    • 70349206345
    • Bayesian feature enhancement using a mixture of unscented transformation for uncertainty decoding of noisy speech
    • Taipei, Taiwan
    • Y. Shinohara and M. Akamine, "Bayesian feature enhancement using a mixture of unscented transformation for uncertainty decoding of noisy speech," in Proc. ICASSP, Taipei, Taiwan, 2009, pp. 4569-4572.
    • (2009) Proc. ICASSP , pp. 4569-4572
    • Shinohara, Y.1    Akamine, M.2
  • 47
    • 70450167541
    • Comparison of estimation techniques in joint uncertainty decoding for noise robust speech recognition
    • Brighton, U.K.
    • H. Xu and K. K. Chin, "Comparison of estimation techniques in joint uncertainty decoding for noise robust speech recognition," in Proc. Interspeech, Brighton, U.K., 2009, pp. 2403-2406.
    • (2009) Proc. Interspeech , pp. 2403-2406
    • Xu, H.1    Chin, K.K.2
  • 48
    • 60849117157
    • Static and dynamic spectral features: Their noise robustness and optimal weights for ASR
    • Mar.
    • C. Yang, F. K. Soong, and T. Lee, "Static and dynamic spectral features: Their noise robustness and optimal weights for ASR," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 3, pp. 1087-1097, Mar. 2007.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process , vol.15 , Issue.3 , pp. 1087-1097
    • Yang, C.1    Soong, F.K.2    Lee, T.3
  • 49
    • 44849125798
    • High-performance HMM adaptation with joint compensation of additive and convolutive distortions via vector Taylor series
    • Kyoto, Japan
    • J. Li, L. Deng, D. Yu, Y. Gong, and A. Acero, "High-performance HMM adaptation with joint compensation of additive and convolutive distortions via vector Taylor series," in Proc. ASRU, Kyoto, Japan, 2007.
    • (2007) Proc. ASRU
    • Li, J.1    Deng, L.2    Yu, D.3    Gong, Y.4    Acero, A.5
  • 51
    • 4544293653
    • Nonlinear noise compensation in feature domain for speech recognition with numerical methods
    • Montreal, QC, Canada
    • H. Jiang and Q. Wang, "Nonlinear noise compensation in feature domain for speech recognition with numerical methods," in Proc. ICASSP, Montreal, QC, Canada, 2004, pp. 985-988.
    • (2004) Proc. ICASSP , pp. 985-988
    • Jiang, H.1    Wang, Q.2
  • 52
    • 79959834612
    • Signal interaction and the devil function
    • Makuhari, Japan
    • J. R. Hershey, P. A. Olsen, and S. J. Rennie, "Signal interaction and the devil function," in Proc. Interspeech, Makuhari, Japan, 2010.
    • (2010) Proc. Interspeech
    • Hershey, J.R.1    Olsen, P.A.2    Rennie, S.J.3
  • 53
    • 79959842341
    • Asymptotically exact noise-corrupted speech likelihoods
    • Makuhari, Japan
    • R. C. van Dalen and M. J. F. Gales, "Asymptotically exact noise-corrupted speech likelihoods," in Proc. Interspeech, Makuhari, Japan, 2010, pp. 709-712.
    • (2010) Proc. Interspeech , pp. 709-712
    • Van Dalen, R.C.1    Gales, M.J.F.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.