Volume 48, Issue 11, 2006, Pages 1486-1501

Binary and ratio time-frequency masks for robust speech recognition

Author keywords

Binaural processing; Ideal binary mask; Missing data recognizer; Ratio mask; Robust speech recognition; Speech segregation

Indexed keywords

DATA REDUCTION; ROBUSTNESS (CONTROL SYSTEMS); SPEECH COMMUNICATION; TIME DOMAIN ANALYSIS;

EID: 33750311718     PISSN: 01676393     EISSN: None     Source Type: Journal    
DOI: 10.1016/j.specom.2006.09.003     Document Type: Article
Times cited: 222

References (43)
  • 1
    • 85009063707
    • Barker, J., Josifovski, L., Cooke, M., Green, P., 2000. Soft decisions in missing data techniques for robust automatic speech recognition. In: Proc. International Conference on Spoken Language Processing '00, pp. 373-376.
  • 3
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • Boll S.F. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Processing ASSP-27 2 (1979) 113-120
    • (1979) IEEE Trans. Acoust. Speech Signal Processing , vol.ASSP-27 , Issue.2 , pp. 113-120
    • Boll, S.F.
  • 5
    • 33644639591
    • Separation of speech by computational auditory scene analysis
    • Benesty J., Makino S., and Chen J. (Eds), Springer, New York
    • Brown G.J., and Wang D.L. Separation of speech by computational auditory scene analysis. In: Benesty J., Makino S., and Chen J. (Eds). Speech Enhancement (2005), Springer, New York 371-402
    • (2005) Speech Enhancement , pp. 371-402
    • Brown, G.J.; Wang, D.L.
  • 6
    • 0032187518
    • Blind signal separation: statistical principles
    • Cardoso J.F. Blind signal separation: statistical principles. Proc. IEEE 86 10 (1998) 2009-2025
    • (1998) Proc. IEEE , vol.86 , Issue.10 , pp. 2009-2025
    • Cardoso, J.F.
  • 7
    • 33646780873
    • Chen, C.-P., Bilmes, J., Ellis, D.P.W., 2005. Speech feature smoothing for robust ASR. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing '05, vol. 1, pp. 525-528.
  • 8
    • 33750306752
    • Cole, R., Noel, M., Lander, T., Durham, T., 1995. New telephone speech corpora at CSLU. In: Proc. European Conference on Speech Communication and Technology '95, pp. 821-824.
  • 9
    • 0035342414
    • Robust automatic speech recognition with missing and unreliable acoustic data
    • Cooke M., Green P., Josifovski L., and Vizinho A. Robust automatic speech recognition with missing and unreliable acoustic data. Speech Commun. 34 (2001) 267-285
    • (2001) Speech Commun. , vol.34 , pp. 267-285
    • Cooke, M.; Green, P.; Josifovski, L.; Vizinho, A.
  • 10
    • 33750295942
    • Cunningham, S., Cooke, M., 1999. The role of evidence and counter-evidence in speech perception. In: Proc. International Congress on Phonetic Sciences '99, pp. 215-218.
  • 11
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • Davis S.B., and Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Processing ASSP-28 4 (1980) 357-366
    • (1980) IEEE Trans. Acoust. Speech Signal Processing , vol.ASSP-28 , Issue.4 , pp. 357-366
    • Davis, S.B.; Mermelstein, P.
  • 12
    • 33750302993
    • de Veth, J., de Wet, F., Cranen, B., Boves, L., 1999. Missing feature theory in ASR: make sure you miss the right type of features. In: Proc. Workshop on Robust Methods for Speech Recognition in Adverse Conditions '99, pp. 231-234.
  • 13
    • 85009211607
    • Droppo, J., Acero, A., Deng, L., 2002. A nonlinear observation model for removing noise from corrupted speech log mel-spectral energies. In: Proc. International Conference on Spoken Language Processing '02, pp. 1569-1572.
  • 14
    • 0031258231
    • Blind separation of convolutive mixtures and an application in automatic speech recognition in a noisy environment
    • Ehlers F., and Schuster H.G. Blind separation of convolutive mixtures and an application in automatic speech recognition in a noisy environment. IEEE Trans. Signal Processing 45 10 (1997) 2608-2612
    • (1997) IEEE Trans. Signal Processing , vol.45 , Issue.10 , pp. 2608-2612
    • Ehlers, F.; Schuster, H.G.
  • 15
    • 0026843273
    • A Bayesian estimation approach for speech enhancement using hidden Markov models
    • Ephraim Y. A Bayesian estimation approach for speech enhancement using hidden Markov models. IEEE Trans. Signal Processing 40 4 (1992) 725-735
    • (1992) IEEE Trans. Signal Processing , vol.40 , Issue.4 , pp. 725-735
    • Ephraim, Y.
  • 16
    • 0021645331
    • Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator
    • Ephraim Y., and Malah D. Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Processing ASSP-32 6 (1984) 1109-1121
    • (1984) IEEE Trans. Acoust. Speech Signal Processing , vol.ASSP-32 , Issue.6 , pp. 1109-1121
    • Ephraim, Y.; Malah, D.
  • 17
    • 0030245128
    • Robust continuous speech recognition using parallel model combination
    • Gales M.J.F., and Young S.J. Robust continuous speech recognition using parallel model combination. IEEE Trans. Speech Audio Processing 4 (1996) 352-359
    • (1996) IEEE Trans. Speech Audio Processing , vol.4 , pp. 352-359
    • Gales, M.J.F.; Young, S.J.
  • 18
    • 33750283214
    • Gardner, W.G., Martin, K.D., 1994. HRTF measurements of a KEMAR dummy-head microphone. Technical Report #280, MIT Media Lab Perceptual Computing Group.
  • 19
    • 33750341537
    • Garofolo, J., Lamel, L., Fisher, W., Fiscus, J., Pallett, D., Dahlgren, N., 1993. DARPA TIMIT acoustic-phonetic continuous speech corpus. Technical Report NISTIR 4930, National Institute of Standards and Technology, Gaithersburg, MD.
  • 20
    • 33750364293
    • Glotin, H., Berthommier, F., Tessier, E., 1999. A CASA-labelling model using the localisation cue for robust cocktail-party speech recognition. In: Proc. European Conference on Speech Communication and Technology '99, pp. 2351-2354.
  • 21
    • 0029288202
    • Speech recognition in noisy environments: a survey
    • Gong Y. Speech recognition in noisy environments: a survey. Speech Commun. 16 (1995) 261-291
    • (1995) Speech Commun. , vol.16 , pp. 261-291
    • Gong, Y.
  • 22
    • 0032677010
    • Performance of an HMM speech recognizer using a real-time tracking microphone array as input
    • Hughes T.B., Kim H.S., DiBiase J.H., and Silverman H.F. Performance of an HMM speech recognizer using a real-time tracking microphone array as input. IEEE Trans. Speech Audio Processing 7 3 (1999) 346-349
    • (1999) IEEE Trans. Speech Audio Processing , vol.7 , Issue.3 , pp. 346-349
    • Hughes, T.B.; Kim, H.S.; DiBiase, J.H.; Silverman, H.F.
  • 23
    • 0021226391
    • Leonard, R.G., 1984. A database for speaker-independent digit recognition. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing '84, pp. 111-114.
  • 24
    • 0031187171
    • Speech recognition by machines and humans
    • Lippmann R.P. Speech recognition by machines and humans. Speech Commun. 22 (1997) 1-15
    • (1997) Speech Commun. , vol.22 , pp. 1-15
    • Lippmann, R.P.
  • 26
    • 85009242725
    • Macho, D., Mauuary, L., Noe, B., Cheng, Y.M., Ealey, D., Jouvet, D., Kelleher, H., Pearce, D., Saadoun, F., 2002. Evaluation of a noise-robust DSR front-end on aurora databases. In: Proc. International Conference on Spoken Language Processing '02, pp. 17-20.
  • 27
    • 0019009880
    • Speech enhancement using a soft-decision noise suppression filter
    • McAulay R., and Malpass M.L. Speech enhancement using a soft-decision noise suppression filter. IEEE Trans. Acoust. Speech Signal Processing ASSP-28 2 (1980) 137-145
    • (1980) IEEE Trans. Acoust. Speech Signal Processing , vol.ASSP-28 , Issue.2 , pp. 137-145
    • McAulay, R.; Malpass, M.L.
  • 28
    • 0003513556
    • Discrete-time Signal Processing
    • Prentice-Hall, Upper Saddle River, NJ
    • Oppenheim A.V., Schafer R.W., and Buck J.R. Discrete-time Signal Processing. second ed. (1999), Prentice-Hall, Upper Saddle River, NJ
    • (1999) second ed.
    • Oppenheim, A.V.; Schafer, R.W.; Buck, J.R.
  • 29
    • 4644304197
    • A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation
    • Palomaki K.J., Brown G.J., and Wang D.L. A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation. Speech Commun. 43 (2004) 361-378
    • (2004) Speech Commun. , vol.43 , pp. 361-378
    • Palomaki, K.J.; Brown, G.J.; Wang, D.L.
  • 30
    • 0023776398
    • Price, P., Fisher, W.M., Bernstein, J., Pallett, D.S., 1988. The DARPA 1000 word Resource Management database for continuous speech recognition. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing '88, pp. 651-654.
  • 31
    • 0004244302
    • Fundamentals of Speech Recognition
    • Prentice-Hall, Englewood Cliffs, NJ
    • Rabiner L.R., and Juang B.H. Fundamentals of Speech Recognition. second ed. (1993), Prentice-Hall, Englewood Cliffs, NJ
    • (1993) second ed.
    • Rabiner, L.R.; Juang, B.H.
  • 32
    • 4644336054
    • Reconstruction of missing features for robust speech recognition
    • Raj B., Seltzer M.L., and Stern R.M. Reconstruction of missing features for robust speech recognition. Speech Commun. 43 (2004) 275-296
    • (2004) Speech Commun. , vol.43 , pp. 275-296
    • Raj, B.; Seltzer, M.L.; Stern, R.M.
  • 33
    • 0142026377
    • Speech segregation based on sound localization
    • Roman N., Wang D.L., and Brown G.J. Speech segregation based on sound localization. J. Acoust. Soc. Am. 114 (2003) 2236-2252
    • (2003) J. Acoust. Soc. Am. , vol.114 , pp. 2236-2252
    • Roman, N.; Wang, D.L.; Brown, G.J.
  • 34
    • 0003444613
    • Rosenthal D.F., and Okuno H.G. (Eds), Lawrence Erlbaum Associates, Mahwah, NJ
    • In: Rosenthal D.F., and Okuno H.G. (Eds). Computational Auditory Scene Analysis (1998), Lawrence Erlbaum Associates, Mahwah, NJ
    • (1998) Computational Auditory Scene Analysis
  • 35
    • 33750353915
    • Shire, M.L., 2000. Discriminant training of front-end and acoustic modeling stages to heterogeneous acoustic environments for multi-stream automatic speech recognition. Ph.D. thesis, University of California, Berkeley.
  • 36
    • 85009151411
    • Srinivasan, S., Roman, N., Wang, D.L., 2004. On binary and ratio time-frequency masks for robust speech recognition. In: Proc. International Conference on Spoken Language Processing '04, pp. 2541-2544.
  • 37
    • 33750365068
    • STQ-AURORA, 2005-11. Speech processing, transmission and quality aspects (STQ); distributed speech recognition; advanced front-end feature extraction algorithm; compression algorithms. In: ETSI ES 202 050 V1.1.4. European Telecommunications Standards Institute.
  • 38
    • 33750285581
    • Tessier, E., Berthommier, F., Glotin, H., Choi, S., 1999. A CASA front-end using the localisation cue for segregation and then cocktail-party speech recognition. In: Proc. IEEE International Conference on Speech Processing, pp. 97-102.
  • 39
    • 85009212472
    • van Hamme, H., 2003. Robust speech recognition using missing feature theory in the cepstral or LDA domain. In: Proceedings of the European Conference on Speech Communication and Technology '03, pp. 3089-3092.
  • 41
    • 0025681008
    • Varga, A.P., Moore, R.K., 1990. Hidden Markov model decomposition of speech and noise. In: Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing '90, pp. 845-848.
  • 42
    • 33750297707
    • Varga, A.P., Steeneken, H.J.M., Tomlinson, M., Jones, D., 1992. The NOISEX-92 study on the effect of additive noise on automatic speech recognition. Technical Report, Speech Research Unit, Defense Research Agency, Malvern, UK.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.