Volume 19, Issue 5, 2011, Pages 1434-1443

A novel mask estimation method employing posterior-based representative mean estimate for missing-feature speech recognition

Author keywords

Background noise; mask estimation; missing feature; posterior based representative mean (PRM) estimate; robust speech recognition

Indexed keywords

BACKGROUND NOISE; COMPONENT CLASSIFIERS; ESTIMATION METHODS; FEATURE COMPENSATION; IN-VEHICLE; MISSING-FEATURE; MODEL COMBINATION; NOISE CONDITIONS; NOISE SIGNALS; PERFORMANCE EVALUATION; POSTERIOR PROBABILITY; POSTERIOR-BASED REPRESENTATIVE MEAN (PRM) ESTIMATE; RECOGNITION PERFORMANCE; ROBUST SPEECH RECOGNITION; SPECTRAL COMPONENTS; SPECTRAL SUBTRACTIONS; SPEECH MODELS; SPEECH RECOGNITION PERFORMANCE; SPEECH UTTERANCE; WEIGHTED SUM; WORD ERROR RATE;

EID: 79956289561     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2010.2091633     Document Type: Article
Times cited : (12)

References (43)
  • 1
    • 77949404049
    • Mask estimation employing posterior-based representative mean for missing-feature speech recognition with time-varying background noise
    • Merano, Italy, Dec.
    • W. Kim and J. H. L. Hansen, "Mask estimation employing posterior-based representative mean for missing-feature speech recognition with time-varying background noise," in Proc. IEEE ASRU'09, Merano, Italy, Dec. 2009, pp. 194-198.
    • (2009) Proc. IEEE ASRU'09 , pp. 194-198
    • Kim, W.1    Hansen, J.H.L.2
  • 2
    • 56949089751
    • Feature compensation in the cepstral domain employing model combination
    • W. Kim and J. H. L. Hansen, "Feature compensation in the cepstral domain employing model combination," Speech Commun., vol. 51, no. 2, pp. 83-96, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.2 , pp. 83-96
    • Kim, W.1    Hansen, J.H.L.2
  • 3
    • 0004319968
    • The NOISEX-92 study on the effect of additive noise on automatic speech recognition
    • Malvern, U.K., (Available from NOISEX-92 CD-ROMS)
    • A. P. Varga, H. J. M. Steeneken, M. Tomlinson, and D. Jones, "The NOISEX-92 study on the effect of additive noise on automatic speech recognition," in Tech. Rep., Speech Res. Unit, Defense Res. Agency, Malvern, U.K., 1992, (Available from NOISEX-92 CD-ROMS).
    • (1992) Tech. Rep., Speech Res. Unit, Defense Res. Agency
    • Varga, A.P.1    Steeneken, H.J.M.2    Tomlinson, M.3    Jones, D.4
  • 4
    • 85135275880
    • The Speechdat-Car multilingual speech databases for in-car applications: Some first validation results
    • Sep.
    • H. Heuvel, J. Boudy, R. Comeyne, S. Euler, A. Moreno, and G. Richard, "The Speechdat-Car multilingual speech databases for in-car applications: Some first validation results," in Proc. Eurospeech'99, Sep. 1999.
    • (1999) Proc. Eurospeech'99
    • Heuvel, H.1    Boudy, J.2    Comeyne, R.3    Euler, S.4    Moreno, A.5    Richard, G.6
  • 5
    • 0141477988
    • Speech in noisy environments (SPINE) adds new dimension to speech recognition R&D
    • San Diego, CA, Mar.
    • T. Crystal, A. Schmidt-Nelson, and E. Marsh, "Speech in noisy environments (SPINE) adds new dimension to speech recognition R&D," in Proc. HLT Conf., San Diego, CA, Mar. 2002.
    • (2002) Proc. HLT Conf.
    • Crystal, T.1    Schmidt-Nelson, A.2    Marsh, E.3
  • 9
    • 44849128293
    • Speechfind for CDP: Advances in spoken document retrieval for the U.S. collaborative digitization program
    • W. Kim and J. H. L. Hansen, "Speechfind for CDP: Advances in spoken document retrieval for the U.S. collaborative digitization program," in Proc. IEEE ASRU2007, 2007, pp. 687-692.
    • (2007) Proc. IEEE ASRU2007 , pp. 687-692
    • Kim, W.1    Hansen, J.H.L.2
  • 10
    • 0030283741
    • Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition
    • J. H. L. Hansen, "Analysis and compensation of speech under stress and noise for environmental robustness in speech recognition," Speech Commun., vol. 20, no. 2, pp. 151-170, 1996.
    • (1996) Speech Commun. , vol.20 , Issue.2 , pp. 151-170
    • Hansen, J.H.L.1
  • 11
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • Apr.
    • S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.
    • (1979) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-27 , Issue.2 , pp. 113-120
    • Boll, S.F.1
  • 12
    • 0021645331
    • Speech enhancement using minimum mean square error short time spectral amplitude estimator
    • Dec.
    • Y. Ephraim and D. Malah, "Speech enhancement using minimum mean square error short time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-32, no. 6, pp. 1109-1121, Dec. 1984.
    • (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-32 , Issue.6 , pp. 1109-1121
    • Ephraim, Y.1    Malah, D.2
  • 13
    • 0026135903
    • Constrained iterative speech enhancement with application to speech recognition
    • Apr.
    • J. H. L. Hansen and M. Clements, "Constrained iterative speech enhancement with application to speech recognition," IEEE Trans. Signal Process., vol. 39, no. 4, pp. 795-805, Apr. 1991.
    • (1991) IEEE Trans. Signal Process. , vol.39 , Issue.4 , pp. 795-805
    • Hansen, J.H.L.1    Clements, M.2
  • 14
    • 0028516405
    • Morphological constrained enhancement with adaptive cepstral compensation (MCE-ACC) for speech recognition in noise and Lombard effect
    • Oct.
    • J. H. L. Hansen, "Morphological constrained enhancement with adaptive cepstral compensation (MCE-ACC) for speech recognition in noise and Lombard effect," IEEE Trans. Speech Audio Process., vol. 2, no. 4, pp. 598-614, Oct. 1994.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.4 , pp. 598-614
    • Hansen, J.H.L.1
  • 15
    • 0032116601
    • Data-driven environmental compensation for speech recognition: A unified approach
    • PII S0167639398000259
    • P. J. Moreno, B. Raj, and R. M. Stern, "Data-driven environmental compensation for speech recognition: A unified approach," Speech Commun., vol. 24, no. 4, pp. 267-285, 1998. (Pubitemid 128424259)
    • (1998) Speech Communication , vol.24 , Issue.4 , pp. 267-285
    • Moreno, P.J.1    Raj, B.2    Stern, R.M.3
  • 16
    • 0036642712
    • Feature domain compensation of nonstationary noise for robust speech recognition
    • DOI 10.1016/S0167-6393(01)00013-9, PII S0167639301000139
    • N. S. Kim, "Feature domain compensation of nonstationary noise for robust speech recognition," Speech Commun., vol. 37, pp. 231-248, 2002. (Pubitemid 34524841)
    • (2002) Speech Communication , vol.37 , Issue.3-4 , pp. 231-248
    • Kim, N.S.1
  • 17
    • 4544288024
    • Joint removal of additive and convolutional noise with model-based feature enhancement
    • V. Stouten, H. Van hamme, and P. Wambacq, "Joint removal of additive and convolutional noise with model-based feature enhancement," in Proc. ICASSP'04, 2004, pp. 949-952.
    • (2004) Proc. ICASSP'04 , pp. 949-952
    • Stouten, V.1    Van hamme, H.2    Wambacq, P.3
  • 18
    • 38749150838
    • HMM-based feature compensation methods: An evaluation using the Aurora2
    • A. Sasou, T. Tanaka, S. Nakamura, and F. Asano, "HMM-based feature compensation methods: An evaluation using the Aurora2," in Proc. ICSLP'04, 2004, pp. 121-124.
    • (2004) Proc. ICSLP'04 , pp. 121-124
    • Sasou, A.1    Tanaka, T.2    Nakamura, S.3    Asano, F.4
  • 19
    • 0028419019
    • Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains
    • Apr.
    • J. L. Gauvain and C. H. Lee, "Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains," IEEE Trans. Speech Audio Process., vol. 2, no. 2, pp. 291-298, Apr. 1994.
    • (1994) IEEE Trans. Speech Audio Process. , vol.2 , Issue.2 , pp. 291-298
    • Gauvain, J.L.1    Lee, C.H.2
  • 20
    • 0029288633
    • Maximum likelihood linear regression for speaker adaptation of continuous density HMMs
    • C. J. Leggetter and P. C. Woodland, "Maximum likelihood linear regression for speaker adaptation of continuous density HMMs," Comput. Speech Lang., vol. 9, pp. 171-185, 1995.
    • (1995) Comput. Speech Lang. , vol.9 , pp. 171-185
    • Leggetter, C.J.1    Woodland, P.C.2
  • 21
    • 0030245128
    • Robust continuous speech recognition using parallel model combination
    • PII S1063667696067120
    • M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 352-359, Sep. 1996. (Pubitemid 126753023)
    • (1996) IEEE Transactions on Speech and Audio Processing , vol.4 , Issue.5 , pp. 352-359
    • Gales, M.J.F.1    Young, S.J.2
  • 22
    • 85009106519
    • Robust ASR based on clean speech models: An evaluation of missing data techniques for connected digit recognition in noise
    • J. Barker, M. Cooke, and P. Green, "Robust ASR based on clean speech models: An evaluation of missing data techniques for connected digit recognition in noise," in Proc. Eurospeech'01, 2001, pp. 213-216.
    • (2001) Proc. Eurospeech'01 , pp. 213-216
    • Barker, J.1    Cooke, M.2    Green, P.3
  • 23
    • 0035342414
    • Robust automatic speech recognition with missing and unreliable acoustic data
    • DOI 10.1016/S0167-6393(00)00034-0, PII S0167639300000340
    • M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Commun., vol. 34, no. 3, pp. 267-285, 2001. (Pubitemid 32284867)
    • (2001) Speech Communication , vol.34 , Issue.3 , pp. 267-285
    • Cooke, M.1    Green, P.2    Josifovski, L.3    Vizinho, A.4
  • 24
    • 2942539074
    • Techniques for handling convolutional distortion with missing data automatic speech recognition
    • K. J. Palomaki, G. J. Brown, and J. P. Barker, "Techniques for handling convolutional distortion with missing data automatic speech recognition," Speech Commun., vol. 43, pp. 123-142, 2004.
    • (2004) Speech Commun. , vol.43 , pp. 123-142
    • Palomaki, K.J.1    Brown, G.J.2    Barker, J.P.3
  • 25
    • 4644336054
    • Reconstruction of missing features for robust speech recognition
    • B. Raj, M. L. Seltzer, and R. M. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol. 43, no. 4, pp. 275-296, 2004.
    • (2004) Speech Commun. , vol.43 , Issue.4 , pp. 275-296
    • Raj, B.1    Seltzer, M.L.2    Stern, R.M.3
  • 26
    • 4544315110
    • Robust speech recognition using cepstral domain missing data techniques and noisy masks
    • May
    • H. Van Hamme, "Robust speech recognition using cepstral domain missing data techniques and noisy masks," in Proc. ICASSP'04, May 2004, pp. 213-216.
    • (2004) Proc. ICASSP'04 , pp. 213-216
    • Van Hamme, H.1
  • 27
    • 85032752225
    • Missing-feature approaches in speech recognition
    • DOI 10.1109/MSP.2005.1511828
    • B. Raj and R. M. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Process. Mag., vol. 22, no. 5, pp. 101-116, Sep. 2005. (Pubitemid 41488524)
    • (2005) IEEE Signal Processing Magazine , vol.22 , Issue.5 , pp. 101-116
    • Raj, B.1    Stern, R.M.2
  • 28
    • 44849096627
    • Missing-feature reconstruction for bandlimited speech recognition in spoken document retrieval
    • Sep.
    • W. Kim and J. H. L. Hansen, "Missing-feature reconstruction for bandlimited speech recognition in spoken document retrieval," in Proc. Interspeech'06, Sep. 2006, pp. 2306-2309.
    • (2006) Proc. Interspeech'06 , pp. 2306-2309
    • Kim, W.1    Hansen, J.H.L.2
  • 29
    • 68549126848
    • Time-frequency correlation based missing-feature reconstruction for robust speech recognition in band-restricted conditions
    • Sep.
    • W. Kim and J. H. L. Hansen, "Time-frequency correlation based missing-feature reconstruction for robust speech recognition in band-restricted conditions," IEEE Trans. Audio, Speech, Lang. Process., vol. 17, no. 7, pp. 1292-1304, Sep. 2009.
    • (2009) IEEE Trans. Audio, Speech, Lang. Process. , vol.17 , Issue.7 , pp. 1292-1304
    • Kim, W.1    Hansen, J.H.L.2
  • 30
    • 0002603206
    • Missing data theory, spectral subtraction and signal-to-Noise estimation for robust ASR: An integrated study
    • Sep.
    • A. Vizinho, P. Green, M. M. Cooke, and L. Josifovski, "Missing data theory, spectral subtraction and signal-to-Noise estimation for robust ASR: An integrated study," in Proc. Eurospeech'99, Sep. 1999, pp. 2407-2410.
    • (1999) Proc. Eurospeech'99 , pp. 2407-2410
    • Vizinho, A.1    Green, P.2    Cooke, M.M.3    Josifovski, L.4
  • 31
    • 4644317224
    • A Bayesian classifier for spectrographic mask estimation for missing-feature speech recognition
    • M. L. Seltzer, B. Raj, and R. M. Stern, "A Bayesian classifier for spectrographic mask estimation for missing-feature speech recognition," Speech Commun., vol. 43, no. 4, pp. 379-393, 2004.
    • (2004) Speech Commun. , vol.43 , Issue.4 , pp. 379-393
    • Seltzer, M.L.1    Raj, B.2    Stern, R.M.3
  • 32
    • 33745200501
    • Environment-independent mask estimation for missing-feature reconstruction
    • 9th European Conference on Speech Communication and Technology, Eurospeech Interspeech
    • W. Kim, R. M. Stern, and H. Ko, "Environment-independent mask estimation for missing-feature reconstruction," in Proc. Interspeech'05, Sep. 2005, pp. 2637-2640. (Pubitemid 43908637)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 2637-2640
    • Kim, W.1    Stern, R.M.2    Ko, H.3
  • 33
    • 33947703708
    • Band-independent mask estimation for missing-feature reconstruction in the presence of unknown background noise
    • May
    • W. Kim and R. M. Stern, "Band-independent mask estimation for missing-feature reconstruction in the presence of unknown background noise," in Proc. ICASSP'06, May 2006, pp. 305-308.
    • (2006) Proc. ICASSP'06 , pp. 305-308
    • Kim, W.1    Stern, R.M.2
  • 34
    • 85009165807
    • High-likelihood model based on reliability statistics for robust combination of features: Application to noisy speech recognition
    • P. Jancovic, M. Kokuer, and F. Murtagh, "High-likelihood model based on reliability statistics for robust combination of features: Application to noisy speech recognition," in Proc. Eurospeech'03, 2003, pp. 2161-2164.
    • (2003) Proc. Eurospeech'03 , pp. 2161-2164
    • Jancovic, P.1    Kokuer, M.2    Murtagh, F.3
  • 35
    • 33646780537
    • Mask estimation based on sound localisation for missing data speech recognition
    • S. Harding, J. Barker, and G. J. Brown, "Mask estimation based on sound localisation for missing data speech recognition," in Proc. ICASSP'05, 2005, pp. 537-540.
    • (2005) Proc. ICASSP'05 , pp. 537-540
    • Harding, S.1    Barker, J.2    Brown, G.J.3
  • 36
    • 33750311718
    • Binary and ratio time-frequency masks for robust speech recognition
    • DOI 10.1016/j.specom.2006.09.003, PII S0167639306001129
    • S. Srinivasan, N. Roman, and D. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, no. 11, pp. 1486-1501, 2006. (Pubitemid 44634774)
    • (2006) Speech Communication , vol.48 , Issue.11 , pp. 1486-1501
    • Srinivasan, S.1    Roman, N.2    Wang, D.3
  • 37
    • 55049096120
    • Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero crossings
    • H. Park and R. M. Stern, "Spatial separation of speech signals using amplitude estimation based on interaural comparisons of zero crossings," Speech Commun., vol. 51, no. 1, pp. 15-25, 2009.
    • (2009) Speech Commun. , vol.51 , Issue.1 , pp. 15-25
    • Park, H.1    Stern, R.M.2
  • 38
    • 0038669544
    • The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions
    • H. G. Hirsch and D. Pearce, "The AURORA experimental framework for the performance evaluations of speech recognition systems under noisy conditions," in Proc. ISCA ITRW ASR2000, 2000.
    • (2000) Proc. ISCA ITRW ASR2000
    • Hirsch, H.G.1    Pearce, D.2
  • 39
    • 77953863507
    • ETSI ES 201 108 v1.1.2 (2000-04)
    • ETSI Standard Document, ETSI ES 201 108 v1.1.2 (2000-04), 2000.
    • (2000) ETSI Standard Document
  • 40
    • 79956265286
    • [Online], Available
    • [Online]. Available: http://spib.rice.edu/spib/select-noise.html
  • 41
    • 0003089362
    • Spectral subtraction based on minimum statistics
    • R. Martin, "Spectral subtraction based on minimum statistics," in Proc. EUSIPCO-94, 1994, pp. 1182-1185.
    • (1994) Proc. EUSIPCO-94 , pp. 1182-1185
    • Martin, R.1
  • 42
    • 79956264228
    • ETSI ES 202 050 v1.1.1 (2002-10)
    • ETSI Standard Document, ETSI ES 202 050 v1.1.1 (2002-10), 2002.
    • (2002) ETSI Standard Document
  • 43
    • 79956267259
    • NIST SPeech Quality Assurance (SPQA) Package Version 2.3 [Online], Available
    • NIST SPeech Quality Assurance (SPQA) Package Version 2.3 [Online]. Available: http://www.nist.gov/speech


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.