



Volume 15, Issue 7, 2007, Pages 2130-2140

Transforming binary uncertainties for robust speech recognition

Author keywords

Binary time-frequency mask; Computational auditory scene analysis (CASA); Robust automatic speech recognition; Spectrogram reconstruction; Uncertainty decoding

Indexed keywords

BASE-LINE PERFORMANCE; BINARY MASKS; CEPSTRAL; CEPSTRAL DOMAINS; CEPSTRAL FEATURES; COMPUTATIONAL AUDITORY SCENE ANALYSIS (CASA); LINEAR SPECTRAL; NOISE CONDITIONS; NOISE ENERGIES; NOISY SPEECH; NOISY SPEECH SIGNALS; NON-LINEAR TRANSFORMATIONS; REGRESSION TREES; ROBUST AUTOMATIC SPEECH RECOGNITION; ROBUST SPEECH RECOGNITION; SPECTROGRAM RECONSTRUCTION; SYSTEMATIC EVALUATIONS; TIME FREQUENCIES; UNCERTAINTY DECODING;

EID: 56249136428     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2007.901836     Document Type: Article
Times cited: 52

References (55)
  • 1
    • 35048881485
    • Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA
    • S. Araki, S. Makino, H. Sawada, and R. Mukai, "Underdetermined blind separation of convolutive mixtures of speech with directivity pattern based mask and ICA," in Proc. 5th Int. Conf. Independent Compon. Anal., 2004, pp. 898-905.
    • (2004) Proc. 5th Int. Conf. Independent Compon. Anal , pp. 898-905
    • Araki, S.1    Makino, S.2    Sawada, H.3    Mukai, R.4
  • 3
    • 11144316019
    • Decoding speech in the presence of other sources
    • J. P. Barker, M. P. Cooke, and D. P. W. Ellis, "Decoding speech in the presence of other sources," Speech Commun., vol. 45, pp. 5-25, 2005.
    • (2005) Speech Commun , vol.45 , pp. 5-25
    • Barker, J.P.1    Cooke, M.P.2    Ellis, D.P.W.3
  • 5
    • 64249165037
    • P. Boersma and D. Weenink, "Praat: Doing Phonetics by Computer, Version 4.0.26," 2002. [Online]. Available: http://www.fon.hum.uva.nl/praat
  • 6
    • 0018455310
    • Suppression of acoustic noise in speech using spectral subtraction
    • S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.
    • (1979) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-27 , Issue.2 , pp. 113-120
    • Boll, S.F.1
  • 8
    • 44949173881
    • Statistical analysis and performance of DFT domain noise reduction filters for robust speech recognition
    • C. Breithaupt and R. Martin, "Statistical analysis and performance of DFT domain noise reduction filters for robust speech recognition," in Proc. Interspeech'06, 2006, pp. 365-368.
    • (2006) Proc. Interspeech'06 , pp. 365-368
    • Breithaupt, C.1    Martin, R.2
  • 9
    • 33845354768
    • Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation
    • D. S. Brungart, P. S. Chang, B. D. Simpson, and D. L. Wang, "Isolating the energetic component of speech-on-speech masking with ideal time-frequency segregation," J. Acoust. Soc. Amer., vol. 120, pp. 4007-4018, 2006.
    • (2006) J. Acoust. Soc. Amer , vol.120 , pp. 4007-4018
    • Brungart, D.S.1    Chang, P.S.2    Simpson, B.D.3    Wang, D.L.4
  • 10
    • 64249167258
    • "The CMU Pronouncing Dictionary," Carnegie Mellon University, Pittsburgh, PA [Online]. Available: http://www.speech.cs.cmu.edu/cgi-bin/cmudict
  • 11
    • 33745217651
    • Exploration of behavioral, physiological, and computational approaches to auditory scene analysis
    • P. S. Chang, "Exploration of behavioral, physiological, and computational approaches to auditory scene analysis," Master's thesis, Dept. Comput. Sci. Eng., The Ohio State Univ., Columbus, 2004.
    • (2004)
    • Chang, P.S.1
  • 12
    • 0035342414
    • Robust automatic speech recognition with missing and unreliable acoustic data
    • M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Commun., vol. 34, pp. 267-285, 2001.
    • (2001) Speech Commun , vol.34 , pp. 267-285
    • Cooke, M.1    Green, P.2    Josifovski, L.3    Vizinho, A.4
  • 13
    • 0019053271
    • Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
    • S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980.
    • (1980) IEEE Trans. Acoust., Speech, Signal Process , vol.ASSP-28 , Issue.4 , pp. 357-366
    • Davis, S.B.1    Mermelstein, P.2
  • 15
    • 18744401086
    • Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
    • L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Trans. Speech Audio Process., vol. 13, no. 3, pp. 412-421, May 2005.
    • (2005) IEEE Trans. Speech Audio Process , vol.13 , Issue.3 , pp. 412-421
    • Deng, L.1    Droppo, J.2    Acero, A.3
  • 16
    • 0033099548
    • On second-order statistics and linear estimation of cepstral coefficients
    • Y. Ephraim and M. Rahim, "On second-order statistics and linear estimation of cepstral coefficients," IEEE Trans. Speech Audio Process., vol. 7, no. 2, pp. 162-176, Mar. 1999.
    • (1999) IEEE Trans. Speech Audio Process , vol.7 , Issue.2 , pp. 162-176
    • Ephraim, Y.1    Rahim, M.2
  • 17
    • 0030245128
    • Robust continuous speech recognition using parallel model combination
    • M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 352-359, Sep. 1996.
    • (1996) IEEE Trans. Speech Audio Process , vol.4 , Issue.5 , pp. 352-359
    • Gales, M.J.F.1    Young, S.J.2
  • 18
    • 0001551844
    • Supervised learning from incomplete data via an EM approach
    • Z. Ghahramani and M. I. Jordan, "Supervised learning from incomplete data via an EM approach," in Advances in Neural Information Processing Systems 6, J. D. Cowan, G. Tesauro, and J. Alspector, Eds. San Francisco, CA: Morgan Kaufmann, 1993, pp. 120-127.
    • (1993) Advances in Neural Information Processing Systems 6 , pp. 120-127
    • Ghahramani, Z.1    Jordan, M.I.2
  • 19
    • 4644265990
    • Monaural speech segregation based on pitch tracking and amplitude modulation
    • G. Hu and D. L. Wang, "Monaural speech segregation based on pitch tracking and amplitude modulation," IEEE Trans. Neural Netw., vol. 15, no. 5, pp. 1135-1150, Sep. 2004.
    • (2004) IEEE Trans. Neural Netw , vol.15 , Issue.5 , pp. 1135-1150
    • Hu, G.1    Wang, D.L.2
  • 21
    • 44949190747
    • Improved source modeling and predictive classification for channel robust speech recognition
    • V. Ion and R. Haeb-Umbach, "Improved source modeling and predictive classification for channel robust speech recognition," in Proc. Interspeech, 2006, pp. 633-636.
    • (2006) Proc. Interspeech , pp. 633-636
    • Ion, V.1    Haeb-Umbach, R.2
  • 22
    • 64249084844
    • Int. Telecomm. Union (ITU-T), "Transmission characteristics for wideband (150-7000 Hz) digital hands-free telephony terminals," Recommendation P.341, 2005.
  • 23
    • 33749058582
    • Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing-data techniques
    • D. Kolossa, A. Klimas, and R. Orglmeister, "Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing-data techniques," in Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust., 2005, pp. 82-85.
    • (2005) Proc. IEEE Workshop Applicat. Signal Process. Audio Acoust , pp. 82-85
    • Kolossa, D.1    Klimas, A.2    Orglmeister, R.3
  • 24
    • 0002560960
    • A database for speaker-independent digit recognition
    • R. G. Leonard, "A database for speaker-independent digit recognition," in Proc. ICASSP'84, 1984, pp. 111-114.
    • (1984) Proc. ICASSP'84 , pp. 111-114
    • Leonard, R.G.1
  • 25
    • 33745202806
    • Joint uncertainty decoding for noise robust speech recognition
    • H. Liao and M. J. F. Gales, "Joint uncertainty decoding for noise robust speech recognition," in Proc. Interspeech'05, 2005, pp. 3129-3132.
    • (2005) Proc. Interspeech'05 , pp. 3129-3132
    • Liao, H.1    Gales, M.J.F.2
  • 26
    • 44949140801
    • Issues with uncertainty decoding for noise robust speech recognition
    • H. Liao and M. J. F. Gales, "Issues with uncertainty decoding for noise robust speech recognition," in Proc. Interspeech'06, 2006, pp. 1121-1124.
    • (2006) Proc. Interspeech'06 , pp. 1121-1124
    • Liao, H.1    Gales, M.J.F.2
  • 27
    • 33947619691
    • Statistical methods for the enhancement of noisy speech
    • R. Martin, "Statistical methods for the enhancement of noisy speech," in Speech Enhancement, J. Benesty, S. Makino, and J. Chen, Eds. NY: Springer, 2005, ch. 3, pp. 43-65.
    • (2005) Speech Enhancement , pp. 43-65
    • Martin, R.1
  • 29
    • 0002671953
    • A minimax classification approach with application to robust speech recognition
    • N. Merhav and C. H. Lee, "A minimax classification approach with application to robust speech recognition," IEEE Trans. Speech Audio Process., vol. 1, no. 1, pp. 90-100, Jan. 1993.
    • (1993) IEEE Trans. Speech Audio Process , vol.1 , Issue.1 , pp. 90-100
    • Merhav, N.1    Lee, C.H.2
  • 30
    • 0029725301
    • A vector Taylor series approach for environment-independent speech recognition
    • P. J. Moreno, B. Raj, and R. M. Stern, "A vector Taylor series approach for environment-independent speech recognition," in Proc. ICASSP'96, 1996, vol. 2, pp. 733-736.
    • (1996) Proc. ICASSP'96 , vol.2 , pp. 733-736
    • Moreno, P.J.1    Raj, B.2    Stern, R.M.3
  • 31
    • 4644304197
    • A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation
    • K. J. Palomaki, G. J. Brown, and D. L. Wang, "A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation," Speech Commun., vol. 43, pp. 361-378, 2004.
    • (2004) Speech Commun , vol.43 , pp. 361-378
    • Palomaki, K.J.1    Brown, G.J.2    Wang, D.L.3
  • 32
    • 33646773271
    • N. Parihar and J. Picone, "DSR front end LVCSR evaluation," Aurora Working Group, Eur. Telecomm. Standards Inst., Sophia-Antipolis Cedex, France, 2002.
    • (2002) DSR front end LVCSR evaluation
    • Parihar, N.1    Picone, J.2
  • 33
    • 85009227702
    • Analysis of the Aurora large vocabulary evaluations
    • N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary evaluations," in Proc. Eurospeech'03, 2003, pp. 337-340.
    • (2003) Proc. Eurospeech'03 , pp. 337-340
    • Parihar, N.1    Picone, J.2
  • 34
    • 85079095310
    • The design of Wall Street Journal-based CSR corpus
    • D. Paul and J. Baker, "The design of Wall Street Journal-based CSR corpus," in Proc. Int. Conf. Spoken Lang. Process., 1992, pp. 899-902.
    • (1992) Proc. Int. Conf. Spoken Lang. Process , pp. 899-902
    • Paul, D.1    Baker, J.2
  • 37
    • 4644336054
    • Reconstruction of missing features for robust speech recognition
    • B. Raj, M. L. Seltzer, and R. M. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol. 43, pp. 275-296, 2004.
    • (2004) Speech Commun , vol.43 , pp. 275-296
    • Raj, B.1    Seltzer, M.L.2    Stern, R.M.3
  • 38
    • 11144343436
    • Detection of reliable features for speech recognition in noisy conditions using a statistical criterion
    • P. Renevey and A. Drygajlo, "Detection of reliable features for speech recognition in noisy conditions using a statistical criterion," in Proc. Consist. Rel. Acoust. Cues Sound Anal. Workshop, 2001, pp. 71-74.
    • (2001) Proc. Consist. Rel. Acoust. Cues Sound Anal. Workshop , pp. 71-74
    • Renevey, P.1    Drygajlo, A.2
  • 39
    • 0142026377
    • Speech segregation based on sound localization
    • N. Roman, D. L. Wang, and G. J. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Amer., vol. 114, pp. 2236-2252, 2003.
    • (2003) J. Acoust. Soc. Amer , vol.114 , pp. 2236-2252
    • Roman, N.1    Wang, D.L.2    Brown, G.J.3
  • 40
    • 85009230793
    • Factorial models and refiltering for speech separation and denoising
    • S. T. Roweis, "Factorial models and refiltering for speech separation and denoising," in Proc. Eurospeech'03, 2003, pp. 1009-1012.
    • (2003) Proc. Eurospeech'03 , pp. 1009-1012
    • Roweis, S.T.1
  • 42
    • 85009180557
    • A harmonic-model-based front end for robust speech recognition
    • M. L. Seltzer, J. Droppo, and A. Acero, "A harmonic-model-based front end for robust speech recognition," in Proc. Eurospeech'03, 2003, pp. 1277-1280.
    • (2003) Proc. Eurospeech'03 , pp. 1277-1280
    • Seltzer, M.L.1    Droppo, J.2    Acero, A.3
  • 43
    • 9644309702
    • Discriminant training of front-end and acoustic modeling stages to heterogeneous acoustic environments for multi-stream automatic speech recognition
    • M. L. Shire, "Discriminant training of front-end and acoustic modeling stages to heterogeneous acoustic environments for multi-stream automatic speech recognition," Ph.D. dissertation, Univ. California, Berkeley, 2000.
    • (2000)
    • Shire, M.L.1
  • 44
    • 33750311718
    • Binary and ratio time-frequency masks for robust speech recognition
    • S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, no. 11, pp. 1486-1501, 2006.
    • (2006) Speech Commun , vol.48 , Issue.11 , pp. 1486-1501
    • Srinivasan, S.1    Roman, N.2    Wang, D.L.3
  • 45
    • 33947644911
    • A supervised learning approach to uncertainty decoding for robust speech recognition
    • S. Srinivasan and D. L. Wang, "A supervised learning approach to uncertainty decoding for robust speech recognition," in Proc. ICASSP'06, 2006, vol. I, pp. 297-300.
    • (2006) Proc. ICASSP'06 , vol.1 , pp. 297-300
    • Srinivasan, S.1    Wang, D.L.2
  • 46
    • 33750376174
    • Model-based feature enhancement with uncertainty decoding for noise robust ASR
    • V. Stouten, H. V. Hamme, and P. Wambacq, "Model-based feature enhancement with uncertainty decoding for noise robust ASR," Speech Commun., vol. 48, no. 11, pp. 1502-1514, 2006.
    • (2006) Speech Commun , vol.48 , Issue.11 , pp. 1502-1514
    • Stouten, V.1    Hamme, H.V.2    Wambacq, P.3
  • 47
    • 0025681008
    • Hidden Markov model decomposition of speech and noise
    • A. P. Varga and R. K. Moore, "Hidden Markov model decomposition of speech and noise," in Proc. ICASSP'90, 1990, pp. 845-848.
    • (1990) Proc. ICASSP'90 , pp. 845-848
    • Varga, A.P.1    Moore, R.K.2
  • 48
    • 64249140840
    • A. P. Varga, H. J. M. Steeneken, M. Tomlinson, and D. Jones, "The NOISEX-92 study on the effect of additive noise on automatic speech recognition," Speech Res. Unit, Def. Res. Agency, Malvern, U.K., 1992.
  • 49
    • 84892233308
    • On ideal binary mask as the computational goal of auditory scene analysis
    • D. L. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA: Kluwer, 2005, pp. 181-197.
    • (2005) Speech Separation by Humans and Machines , pp. 181-197
    • Wang, D.L.1
  • 50
    • 14544300108
    • How to pretend that correlated variables are independent by using difference observations
    • C. K. I. Williams, "How to pretend that correlated variables are independent by using difference observations," Neural Comput., vol. 17, pp. 1-6, 2005.
    • (2005) Neural Comput , vol.17 , pp. 1-6
    • Williams, C.K.I.1
  • 51
    • 31844435714
    • Incomplete-data classification using logistic regression
    • D. Williams, X. Liao, Y. Xue, and L. Carin, "Incomplete-data classification using logistic regression," in Proc. 22nd Int. Mach. Learning Conf., L. D. Raedt and S. Wrobel, Eds., 2005, pp. 972-979.
    • (2005) Proc. 22nd Int. Mach. Learning Conf , pp. 972-979
  • 52
    • 44949171870
    • Vector Taylor series based joint uncertainty decoding
    • H. Xu, L. Rigazio, and D. Kryze, "Vector Taylor series based joint uncertainty decoding," in Proc. Interspeech'06, 2006, pp. 1125-1128.
    • (2006) Proc. Interspeech'06 , pp. 1125-1128
    • Xu, H.1    Rigazio, L.2    Kryze, D.3
  • 53
    • 33646768933
    • Static and dynamic spectral features: Their noise robustness and optimal weights
    • C. Yang, F. K. Soong, and T. Lee, "Static and dynamic spectral features: Their noise robustness and optimal weights," in Proc. ICASSP'05, 2005, vol. I, pp. 241-244.
    • (2005) Proc. ICASSP'05 , vol.1 , pp. 241-244
    • Yang, C.1    Soong, F.K.2    Lee, T.3
  • 54
    • 3142694930
    • Blind separation of speech mixtures via time-frequency masking
    • O. Yilmaz and S. Rickard, "Blind separation of speech mixtures via time-frequency masking," IEEE Trans. Signal Process., vol. 52, pp. 1830-1847, 2004.
    • (2004) IEEE Trans. Signal Process , vol.52 , pp. 1830-1847
    • Yilmaz, O.1    Rickard, S.2
  • 55
    • 64249091246
    • S. Young, D. Kershaw, J. Odell, V. Valtchev, and P. Woodland, The HTK Book (for HTK Version 3.0). Redmond, WA: Microsoft Corp., 2000.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.