Volume 21, Issue 10, 2013, Pages 1993-2005

A direct masking approach to robust ASR

Author keywords

Direct masking; ideal binary mask; robust automatic speech recognition

Indexed keywords

BINARY MASKS; CEPSTRAL FEATURES; IDEAL BINARY MASK; LARGE VOCABULARY; MISSING ENERGY; MISSING FEATURES; ROBUST AUTOMATIC SPEECH RECOGNITION; SPEECH SEGREGATION;

EID: 84881088302     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2013.2263802     Document Type: Article
Times cited: 35

References (44)
  • 1
    • Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol. 16, pp. 261-291, 1995.
  • 2
    • M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 352-359, Sep. 1996.
  • 3
    • H. Hermansky, N. Morgan, and H.-G. Hirsch, "Recognition of speech in additive and convolutional noise based on RASTA spectral processing," in Proc. ICASSP, 1993, vol. 10, pp. 509-512.
  • 4
    • S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.
  • 6
    • D. L. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA, USA: Kluwer, 2005, pp. 181-197.
  • 7
    • B. Raj and R. M. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Process. Mag., vol. 22, no. 2, pp. 101-116, Sep. 2005.
  • 8
    • A. Narayanan and D. L. Wang, "The role of binary mask patterns in automatic speech recognition in background noise," J. Acoust. Soc. Amer., vol. 133, no. 5, pp. 8083-8093, 2013.
  • 9
    • M. Weintraub, "The GRASP sound separation system," in Proc. IEEE ICASSP, 1984, pp. 18A.6.1-18A.6.4.
  • 10
    • G. J. Brown and M. Cooke, "Computational auditory scene analysis," Comput. Speech Lang., vol. 8, pp. 297-336, 1994.
  • 11
    • D. L. Wang and G. J. Brown, "Separation of speech from interfering sounds based on oscillatory correlation," IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 684-697, May 1999.
  • 12
    • D. L. Wang, U. Kjems, M. S. Pedersen, J. B. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
  • 13
    • M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Commun., vol. 34, pp. 267-285, 2001.
  • 14
    • B. Raj, M. L. Seltzer, and R. M. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol. 43, pp. 275-296, 2004.
  • 15
    • M. V. Segbroeck and H. V. Hamme, "Advances in missing feature techniques for robust large-vocabulary continuous speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 1, pp. 123-137, Jan. 2011.
  • 16
    • W. Hartmann and E. Fosler-Lussier, "Investigations into the incorporation of the ideal binary mask in ASR," in Proc. IEEE ICASSP, Prague, Czech Republic, May 2011, pp. 4804-4807.
  • 18
    • M. Cooke, P. Green, and M. Crawford, "Handling missing data in speech recognition," in Proc. ICSLP, 1994.
  • 19
    • S. Ahmad and V. Tresp, "Some solutions to the missing feature problem in vision," in Advances in Neural Information Processing Systems 5 (NIPS'92), S. J. Hanson, J. D. Cowen, and C. L. Giles, Eds. San Mateo, CA, USA: Morgan Kaufmann, 1993.
  • 20
    • R. Lippmann and B. A. Carlson, "Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering, and noise," in Proc. Eurospeech'97, 1997, pp. 37-40.
  • 21
    • S. B. Davis and P. Mermelstein, "Comparison of parametric representation for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980.
  • 22
    • S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, pp. 1486-1501, 2006.
  • 23
    • S. Srinivasan and D. L. Wang, "Transforming binary uncertainties for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 2130-2140, Sep. 2007.
  • 24
    • P. Li, Y. Guan, S. Wang, B. Xu, and W. Liu, "Monaural speech separation based on MAXVQ and CASA for robust speech recognition," Comput. Speech Lang., vol. 24, no. 1, pp. 30-44, Jan. 2010.
  • 25
    • J. Barker, L. Josifovski, M. Cooke, and P. Green, "Soft decisions in missing data techniques for robust automatic speech recognition," in Proc. Int. Conf. Spoken Lang., Beijing, China, 2000, pp. 373-376.
  • 26
    • J. V. Hout and A. Alwan, "A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2012, pp. 4105-4108.
  • 27
    • W. Kim and J. H. L. Hansen, "Missing-feature reconstruction by leveraging temporal spectral correlation for robust speech recognition in background noise conditions," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2111-2120, Nov. 2010.
  • 29
    • J. Gemmeke and B. Cranen, "Noise reduction through compressed sensing," in Proc. Interspeech, 2008.
  • 30
    • Y. Wang and H. V. Hamme, "Multi-candidate missing data imputation for robust speech recognition," EURASIP J. Audio, Speech, Music Process., vol. 17, 2012, doi:10.1186/1687-4722-2012-17.
  • 31
    • N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary extensions," in Proc. Eurospeech, Geneva, Switzerland, Sep. 2003, vol. 4, pp. 337-340.
  • 32
    • J. Barker, M. Cooke, and D. P. W. Ellis, "Decoding speech in the presence of other sources," Speech Commun., vol. 45, pp. 5-25, 2005.
  • 33
    • S. Srinivasan and D. L. Wang, "Robust speech recognition by integrating speech separation and hypothesis testing," Speech Commun., vol. 52, pp. 72-81, 2010.
  • 40
    • D. Paul and J. Baker, "The design of Wall Street Journal-based CSR corpus," in Proc. Int. Conf. Spoken Lang., Banff, AB, Canada, Oct. 1992, pp. 899-902.
  • 41
    • R. C. Hendriks, R. Heusdens, and J. Jensen, "MMSE based noise PSD tracking with low complexity," in Proc. IEEE ICASSP, 2010, pp. 4266-4269.
  • 42
    • J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, "Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.