Volume 20, Issue 1, 2011, Pages 74-84

ASR systems in noisy environment: Analysis and solutions for increasing noise robustness

Author keywords

Feature extraction; Front end; Noisy speech; Parameterization; Robust ASR; Robust speech recognition; Spectral subtraction; Voice activity detection

EID: 79955978656     PISSN: 12102512     EISSN: None     Source Type: Journal    
DOI: None     Document Type: Article
Times cited: 15

References (36)
  • 1
    • 85079234583
    • On the limitations of Cepstral features in noise
    • OPENSHAW, J. P., MASON, J. S. On the limitations of Cepstral features in noise. In Proc. ICASSP, 1994, vol. 2, p. 49-52.
    • (1994) Proc. ICASSP , vol.2 , pp. 49-52
    • Openshaw, J.P.1    Mason, J.S.2
  • 2
    • 85009065130
    • A comparison of LPC and FFT-based acoustic features for noise robust ASR
    • WET, F. de, CRANEN, B., VETH, J. de, BOVES, L. A comparison of LPC and FFT-based acoustic features for noise robust ASR. In Eurospeech 2001, p. 865-868.
    • (2001) Eurospeech , pp. 865-868
    • de Wet, F.1    Cranen, B.2    de Veth, J.3    Boves, L.4
  • 3
    • 33745208682
    • Noise robust front-end for ASR using spectral subtraction, spectral flooring and cumulative distribution mapping
    • CHOI, E. Noise robust front-end for ASR using spectral subtraction, spectral flooring and cumulative distribution mapping. In Proc. 10th Australian Int. Conf. on Speech Science and Technology, SST'04. Sydney (Australia), Dec. 2004, p. 451-456.
    • (2004) Proc. 10th Australian Int. Conf. on Speech Science and Technology , pp. 451-456
    • Choi, E.1
  • 6
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • DAVIS, S., MERMELSTEIN, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech and Signal Processing, Aug 1980, vol. 28, p. 357-366.
    • (1980) IEEE Transactions on Acoustics, Speech and Signal Processing , vol.28 , pp. 357-366
    • Davis, S.1    Mermelstein, P.2
  • 7
    • 0025041264
    • Perceptual linear predictive (PLP) analysis of speech
    • HERMANSKY, H. Perceptual linear predictive (PLP) analysis of speech. In Proc. JASA, April 1990, vol. 87, no. 4.
    • (1990) In Proc. JASA , vol.87 , Issue.4
    • Hermansky, H.1
  • 8
    • 0024679985
    • Quality improvement of LPC-processed noisy speech by using spectral subtraction
    • KANG, G. S., FRANSEN, L. J. Quality improvement of LPC-processed noisy speech by using spectral subtraction. IEEE Trans. on ASSP, June 1989, vol. 37, no. 6, p. 939-942.
    • (1989) IEEE Trans. on ASSP , vol.37 , Issue.6 , pp. 939-942
    • Kang, G.S.1    Fransen, L.J.2
  • 9
    • 65549171142
    • Likelihood-maximizing-based multi-band spectral subtraction for robust speech recognition
    • BABA ALI, B., SAMETI, H., SAFAYANI, M. Likelihood-maximizing-based multi-band spectral subtraction for robust speech recognition. EURASIP Journal on Advances in Signal Processing, 2009. Article ID 878105, 15 p.
    • (2009) EURASIP Journal on Advances in Signal Processing , pp. 15
    • Baba Ali, B.1    Sameti, H.2    Safayani, M.3
  • 11
    • 0021645331
    • Speech enhancement using a minimum mean square error short time spectral amplitude estimator
    • EPHRAIM, Y., MALAH, D. Speech enhancement using a minimum mean square error short time spectral amplitude estimator. IEEE Trans. on ASSP, Dec. 1984, vol. 32, no. 6, p. 1109-1121.
    • (1984) IEEE Trans. on ASSP , vol.32 , Issue.6 , pp. 1109-1121
    • Ephraim, Y.1    Malah, D.2
  • 12
    • 85009154262
    • Modeling the mixtures of known noise and unknown unexpected noise for robust speech recognition
    • MING, J., JANCOVIC, P., HANNA, P., STEWART, D. Modeling the mixtures of known noise and unknown unexpected noise for robust speech recognition. In Proc. of Eurospeech'2001. Aalborg (Denmark), 2001, p. 579-582.
    • (2001) Proc. of Eurospeech'2001 , pp. 579-582
    • Ming, J.1    Jancovic, P.2    Hanna, P.3    Stewart, D.4
  • 13
    • 0025681008
    • Hidden Markov model decomposition of speech and noise
    • VARGA, A. P., MOORE, R. E. Hidden Markov model decomposition of speech and noise. In Proc. ICASSP, 1990, p. 845-848.
    • (1990) Proc. ICASSP , pp. 845-848
    • Varga, A.P.1    Moore, R.E.2
  • 14
    • 84867213516
    • Eigen-MLLR environment/speaker compensation for robust speech recognition
    • LIAO, Y. F., FANG, H. H., HSU, C. H. Eigen-MLLR environment/speaker compensation for robust speech recognition. In Proc. Interspeech'08. Brisbane (Australia), September 2008, p. 1249-1252.
    • (2008) Proc. Interspeech'08 , pp. 1249-1252
    • Liao, Y.F.1    Fang, H.H.2    Hsu, C.H.3
  • 15
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models
    • LEGGETTER, C. J., WOODLAND, P. C. Maximum likelihood linear regression for speaker adaptation of continuous density hidden Markov models. Computer Speech & Language, April 1995, vol. 9, no. 2, p. 171-185.
    • (1995) Computer Speech & Language , vol.9 , Issue.2 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 16
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • GAUVAIN, J. L., LEE, C. H. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans. on SAP, 1994, vol. 2, no. 2, p. 291-298.
    • (1994) IEEE Trans. on SAP , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.L.1    Lee, C.H.2
  • 17
    • 0032638028
    • Database and online adaptation for improved speech recognition in car environments
    • FISHER, A., STAHL, V. Database and online adaptation for improved speech recognition in car environments. In Proc. ICASSP'99, p. 445-448.
    • Proc. ICASSP'99 , pp. 445-448
    • Fisher, A.1    Stahl, V.2
  • 18
    • 84946730259
    • TRAP-TANDEM: Data-driven extraction of temporal features from speech
    • HERMANSKY, H. TRAP-TANDEM: Data-driven extraction of temporal features from speech. In Proc. of ASRU'03. Martigny (Switzerland), 2003, p. 255-260.
    • (2003) Proc. of ASRU'03 , pp. 255-260
    • Hermansky, H.1
  • 19
    • 0032638669
    • Fitting the Mel scale
    • UMESH, S., COHEN, L., NELSON, D. Fitting the Mel scale. In Proc. ICASSP, 1999, vol. 1, p. 217-220.
    • (1999) Proc. ICASSP , vol.1 , pp. 217-220
    • Umesh, S.1    Cohen, L.2    Nelson, D.3
  • 20
  • 21
    • 33646767079
    • Acoustic feature combination for robust speech recognition
    • ZOLNAY, A., SCHLÜTER, R., NEY, H. Acoustic feature combination for robust speech recognition. In ICASSP'05. Philadelphia (PA, USA), March 2005, vol. 1, p. 457-460.
    • (2005) ICASSP'05 , vol.1 , pp. 457-460
    • Zolnay, A.1    Schlüter, R.2    Ney, H.3
  • 22
    • 34547539413
    • Gamma tone features and feature combination for large vocabulary speech recognition
    • SCHLÜTER, R., BEZRUKOV, I., WAGNER, H., NEY, H. Gamma tone features and feature combination for large vocabulary speech recognition. In ICASSP 2007. Honolulu (HI, USA), April 2007, p. 649-652.
    • (2007) ICASSP 2007 , pp. 649-652
    • Schlüter, R.1    Bezrukov, I.2    Wagner, H.3    Ney, H.4
  • 23
    • 85009115888
    • An auditory system-based feature for robust speech recognition
    • LI, Q., SOONG, F. K., SIOHAN, O. An auditory system-based feature for robust speech recognition. In Eurospeech 2001, p. 619-622.
    • (2001) Eurospeech , pp. 619-622
    • Li, Q.1    Soong, F.K.2    Siohan, O.3
  • 24
    • 84942591958
    • The influence of a filter shape in the telephone-based recognition module using PLP parameterization
    • PSUTKA, J., MÜLLER, L., PSUTKA, J. V. The influence of a filter shape in the telephone-based recognition module using PLP parameterization. In TSD 2001. Berlin, Springer-Verlag 2001, p. 222-228.
    • (2001) TSD 2001 , pp. 222-228
    • Psutka, J.1    Müller, L.2    Psutka, J.V.3
  • 25
    • 74549179210
    • Comparison of MFCC and PLP parameterizations in the speaker independent continuous speech recognition task
    • PSUTKA, J., MÜLLER, L., PSUTKA, J. V. Comparison of MFCC and PLP parameterizations in the speaker independent continuous speech recognition task. In Eurospeech 2001, p. 1813-1816.
    • (2001) Eurospeech , pp. 1813-1816
    • Psutka, J.1    Müller, L.2    Psutka, J.V.3
  • 26
    • 84871076921
    • Additive noise and channel distortion-robust parameterization tool - performance evaluation on Aurora 2&3
    • FOUSEK, P., POLLÁK, P. Additive noise and channel distortion-robust parameterization tool - performance evaluation on Aurora 2&3. In Eurospeech 2003, p. 1785-1788.
    • (2003) Eurospeech , pp. 1785-1788
    • Fousek, P.1    Pollák, P.2
  • 28
    • 0035396555
    • Noise power spectral density estimation based on optimal smoothing and minimum statistics
    • MARTIN, R. Noise power spectral density estimation based on optimal smoothing and minimum statistics. IEEE Trans. on Speech and Audio Processing, July 2001, vol. 9, no. 5, p. 504-512.
    • (2001) IEEE Trans. on Speech and Audio Processing , vol.9 , Issue.5 , pp. 504-512
    • Martin, R.1
  • 31
    • 79955952939
    • HTK speech recognition toolkit
    • HTK speech recognition toolkit. [Online]. Ver. 3.3. July 2005. Available at: http://htk.eng.cam.ac.uk/
    • (2005)
  • 32
    • 79955976718
    • CtuCopy
    • CtuCopy. [Online]. Ver. 3.0.11. Available at: http://noel.feld.cvut.cz/speechlab/en/download/CtuCopy_3.0.11.tar.bz2
    • (2020)
  • 34
    • 77949361917
    • The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions
    • HIRSCH, H. G., PEARCE, D. The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions. In ISCA ITRW ASR2000 Automatic Speech Recognition: Challenges for the Next Millennium. Paris (France), September 2000.
    • (2000) ISCA ITRW ASR2000 Automatic Speech Recognition: Challenges for the Next Millennium
    • Hirsch, H.G.1    Pearce, D.2
  • 35
    • 79955945159
    • Voice activity detection based on perceptual cepstral analysis
    • RAJNOHA, J., POLLÁK, P. Voice activity detection based on perceptual cepstral analysis. In Technical Computing Prague 2008 [CD-ROM]. Prague: HUMUSOFT, 2008, vol. 1, p. 1-9. (in Czech).
    • (2008) Technical Computing Prague 2008 [CD-ROM] , vol.1 , pp. 1-9
    • Rajnoha, J.1    Pollák, P.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.