Volume 20, Issue 3, 2012, Pages 818-827

Combining speech fragment decoding and adaptive noise floor modeling

Author keywords

Adaptive noise floor modeling; fragment decoding; missing data decoding; noise robust speech recognition

Indexed keywords

ACOUSTIC EVENTS; ADAPTIVE NOISE; DECODING SYSTEM; HIGH ENERGY; HIGH ENERGY REGIONS; MISSING DATA; MODEL ESTIMATES; NOISE FLOOR; NOISE MODELING; NOISE MODELS; NOISE ROBUST SPEECH RECOGNITION; NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION; SOURCE SEPARATION; TARGET SPEECH

EID: 84856140165     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TASL.2011.2165945     Document Type: Article
Times cited: 9

References (42)
  • 2
    • 0026882842
    • Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars
    • P. Lockwood and J. Boudy, "Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars," Speech Commun., vol. 11, pp. 215-228, 1992. (Pubitemid 23572493)
    • (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 215-228
    • Lockwood, P.1    Boudy, J.2
  • 3
    • 0035396555
    • Noise power spectral density estimation based on optimal smoothing and minimum statistics
    • DOI 10.1109/89.928915, PII S106366760104980X
    • R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics," IEEE Trans. Speech. Audio Process., vol. 9, no. 5, pp. 504-512, Jul. 2001. (Pubitemid 32631178)
    • (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.5 , pp. 504-512
    • Martin, R.1
  • 4
    • 0035342414
    • Robust automatic speech recognition with missing and unreliable acoustic data
    • DOI 10.1016/S0167-6393(00)00034-0, PII S0167639300000340
    • M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Commun., vol. 34, pp. 267-285, 2001. (Pubitemid 32284867)
    • (2001) Speech Communication , vol.34 , Issue.3 , pp. 267-285
    • Cooke, M.1    Green, P.2    Josifovski, L.3    Vizinho, A.4
  • 5
    • 4644317224
    • A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
    • M. Seltzer, B. Raj, and R. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition," Speech Commun., vol. 43, pp. 379-393, 2004.
    • (2004) Speech Commun. , vol.43 , pp. 379-393
    • Seltzer, M.1    Raj, B.2    Stern, R.3
  • 6
    • 11144316019
    • Decoding speech in the presence of other sources
    • DOI 10.1016/j.specom.2004.05.002, PII S0167639304000615
    • J. Barker, M. Cooke, and D. Ellis, "Decoding speech in the presence of other sources," Speech Commun., vol. 45, pp. 5-25, 2005. (Pubitemid 40034706)
    • (2005) Speech Communication , vol.45 , Issue.1 , pp. 5-25
    • Barker, J.P.1    Cooke, M.P.2    Ellis, D.P.W.3
  • 8
    • 18744401086
    • Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
    • DOI 10.1109/TSA.2005.845814
    • L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion," IEEE Trans. Speech. Audio Process., vol. 13, no. 3, pp. 412-421, May 2005. (Pubitemid 40666175)
    • (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.3 , pp. 412-421
    • Deng, L.1    Droppo, J.2    Acero, A.3
  • 9
    • 40249103761
    • Issues with uncertainty decoding for noise robust automatic speech recognition
    • H. Liao and M. Gales, "Issues with uncertainty decoding for noise robust automatic speech recognition," Speech Commun., vol. 50, pp. 265-277, 2008.
    • (2008) Speech Commun. , vol.50 , pp. 265-277
    • Liao, H.1    Gales, M.2
  • 11
    • 85135375893
    • HMM recognition in noise using parallel model combination
    • Berlin
    • M. Gales and S. Young, "HMM recognition in noise using parallel model combination," in Proc. Eurospeech, Berlin, 1993.
    • (1993) Proc. Eurospeech
    • Gales, M.1    Young, S.2
  • 12
    • 85009074657
    • ALGONQUIN: Iterating Laplace's method to remove multiple types of distortion for robust speech recognition
    • Aalborg, Denmark
    • B. Frey, L. Deng, A. Acero, and T. Kristjansson, "ALGONQUIN: Iterating Laplace's method to remove multiple types of distortion for robust speech recognition," in Proc. Eurospeech, Aalborg, Denmark, 2001, pp. 901-904.
    • (2001) Proc. Eurospeech , pp. 901-904
    • Frey, B.1    Deng, L.2    Acero, A.3    Kristjansson, T.4
  • 13
    • 69249222720
    • Super-human multi-talker speech recognition: A graphical modeling approach
    • J. R. Hershey, S. J. Rennie, and P. A. Olsen, "Super-human multi-talker speech recognition: A graphical modeling approach," Comput. Speech. Lang., vol. 24, pp. 45-66, 2010.
    • (2010) Comput. Speech. Lang. , vol.24 , pp. 45-66
    • Hershey, J.R.1    Rennie, S.J.2    Olsen, P.A.3
  • 15
    • 69249202377
    • Monaural speech separation and recognition challenge
    • M. Cooke, J. Hershey, and S. Rennie, "Monaural speech separation and recognition challenge," Comput. Speech. Lang., vol. 24, pp. 1-15, 2010.
    • (2010) Comput. Speech. Lang. , vol.24 , pp. 1-15
    • Cooke, M.1    Hershey, J.2    Rennie, S.3
  • 16
    • 50249152311
    • Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
    • Mar
    • T. Virtanen, "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria," IEEE Trans. Audio. Speech., vol. 15, no. 3, pp. 1066-1074, Mar. 2007.
    • (2007) IEEE Trans. Audio. Speech. , vol.15 , Issue.3 , pp. 1066-1074
    • Virtanen, T.1
  • 17
    • 44949110218
    • Single-channel speech separation using sparse non-negative matrix factorization
    • Pittsburgh, PA
    • M. N. Schmidt and R. K. Olsson, "Single-channel speech separation using sparse non-negative matrix factorization," in Proc. Interspeech, Pittsburgh, PA, 2006, pp. 2614-2617.
    • (2006) Proc. Interspeech , pp. 2614-2617
    • Schmidt, M.N.1    Olsson, R.K.2
  • 18
    • 4344607755
    • Likelihood-maximizing beamforming for robust hands-free speech recognition
    • Sep
    • M. Seltzer, B. Raj, and R. Stern, "Likelihood-maximizing beamforming for robust hands-free speech recognition," IEEE Trans. Speech. Audio Process., vol. 12, no. 5, pp. 489-498, Sep. 2004.
    • (2004) IEEE Trans. Speech. Audio Process. , vol.12 , Issue.5 , pp. 489-498
    • Seltzer, M.1    Raj, B.2    Stern, R.3
  • 19
    • 34250689497
    • Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears
    • DOI 10.1109/IROS.2006.281741, 4058472, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
    • R. Takeda, S. Yamamoto, K. Komatani, T. Ogata, and H. Okuno, "Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears," in IEEE/RSJ Int. Conf. Intell. Robots Syst., 2006, pp. 878-885. (Pubitemid 46927954)
    • (2006) IEEE International Conference on Intelligent Robots and Systems , pp. 878-885
    • Takeda, R.1    Yamamoto, S.2    Komatani, K.3    Ogata, T.4    Okuno, H.G.5
  • 20
    • 79959845286
    • The CHiME corpus: A resource and a challenge for Computational Hearing in Multisource Environments
    • H. Christensen, J. Barker, N. Ma, and P. Green, "The CHiME corpus: A resource and a challenge for Computational Hearing in Multisource Environments," in Proc. Interspeech, 2010.
    • (2010) Proc. Interspeech
    • Christensen, H.1    Barker, J.2    Ma, N.3    Green, P.4
  • 22
    • 0002296637
    • On the importance of time - A temporal representation of sound
    • M. Cooke, S. Beet, and M. Crawford, Eds. Sussex, U.K.: Wiley
    • M. Slaney and R. Lyon, "On the importance of time - A temporal representation of sound," in Visual Representations of Speech Signals, M. Cooke, S. Beet, and M. Crawford, Eds. Sussex, U.K.: Wiley, 1993, pp. 95-116.
    • (1993) Visual Representations of Speech Signals , pp. 95-116
    • Slaney, M.1    Lyon, R.2
  • 23
    • 0344581050
    • Temporal integration and context effects in hearing
    • DOI 10.1016/S0095-4470(03)00011-1
    • B. C. J. Moore, "Temporal integration and context effects in hearing," J. Phonetics, vol. 31, pp. 563-574, 2003. (Pubitemid 37495928)
    • (2003) Journal of Phonetics , vol.31 , Issue.3-4 , pp. 563-574
    • Moore, B.C.J.1
  • 25
    • 0029249228
    • Spectral redundancy: Intelligibility of sentences heard through narrow spectral slits
    • R. Warren, K. Riener, J. Bashford, and B. Brubaker, "Spectral redundancy: Intelligibility of sentences heard through narrow spectral slits," Percept. Psychophys., vol. 57, pp. 175-182, 1995.
    • (1995) Percept. Psychophys. , vol.57 , pp. 175-182
    • Warren, R.1    Riener, K.2    Bashford, J.3    Brubaker, B.4
  • 26
    • 0036713102
    • The intelligibility of speech with "holes" in the spectrum
    • K. Kasturi, P. C. Loizou, M. Dorman, and T. Spahr, "The intelligibility of speech with "holes" in the spectrum," J. Acoust. Soc. Amer., vol. 112, pp. 1102-1111, 2002.
    • (2002) J. Acoust. Soc. Amer. , vol.112 , pp. 1102-1111
    • Kasturi, K.1    Loizou, P.C.2    Dorman, M.3    Spahr, T.4
  • 27
    • 33644661135
    • A glimpsing model of speech perception in noise
    • DOI 10.1121/1.2166600
    • M. Cooke, "A glimpsing model of speech perception in noise," J. Acoust. Soc. Amer., vol. 119, pp. 1562-1573, 2006. (Pubitemid 43326025)
    • (2006) Journal of the Acoustical Society of America , vol.119 , Issue.3 , pp. 1562-1573
    • Cooke, M.1
  • 28
    • 4644336054
    • Reconstruction of missing features for robust speech recognition
    • B. Raj, M. Seltzer, and R. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol. 43, pp. 275-296, 2004.
    • (2004) Speech Commun. , vol.43 , pp. 275-296
    • Raj, B.1    Seltzer, M.2    Stern, R.3
  • 29
    • 85009063707
    • Soft decisions in missing data techniques for robust automatic speech recognition
    • Beijing, China
    • J. Barker, L. Josifovski, M. Cooke, and P. Green, "Soft decisions in missing data techniques for robust automatic speech recognition," in Proc. ICSLP, Beijing, China, 2000, pp. 373-376.
    • (2000) Proc. ICSLP , pp. 373-376
    • Barker, J.1    Josifovski, L.2    Cooke, M.3    Green, P.4
  • 30
    • 11144343436
    • Detection of reliable features for speech recognition in noisy conditions using a statistical criterion
    • Aalborg, Denmark
    • P. Renevey and A. Drygajlo, "Detection of reliable features for speech recognition in noisy conditions using a statistical criterion," in Proc. CRAC, Aalborg, Denmark, 2001.
    • (2001) Proc. CRAC
    • Renevey, P.1    Drygajlo, A.2
  • 31
    • 33847629729
    • On noise masking for automatic missing data speech recognition: A survey and discussion
    • DOI 10.1016/j.csl.2006.08.001, PII S0885230806000301
    • C. Cerisara, S. Demange, and J. Haton, "On noise masking for automatic missing data speech recognition: A survey and discussion," Comput. Speech. Lang., vol. 21, pp. 443-457, 2007. (Pubitemid 46367508)
    • (2007) Computer Speech and Language , vol.21 , Issue.3 , pp. 443-457
    • Cerisara, C.1    Demange, S.2    Haton, J.-P.3
  • 32
    • 0041360463
    • Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging
    • Sep
    • I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging," IEEE Trans. Speech. Audio Process., vol. 11, no. 5, pp. 466-475, Sep. 2003.
    • (2003) IEEE Trans. Speech. Audio Process. , vol.11 , Issue.5 , pp. 466-475
    • Cohen, I.1
  • 33
    • 29444448046
    • A noise-estimation algorithm for highly non-stationary environments
    • DOI 10.1016/j.specom.2005.08.005, PII S0167639305002001
    • S. Rangachari and P. C. Loizou, "A noise-estimation algorithm for highly non-stationary environments," Speech Commun., vol. 48, pp. 220-231, 2006. (Pubitemid 43012033)
    • (2006) Speech Communication , vol.48 , Issue.2 , pp. 220-231
    • Rangachari, S.1    Loizou, P.C.2
  • 34
    • 0034244889
    • Learning patterns of activity using real-time tracking
    • Aug
    • C. Stauffer and W. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 8, pp. 747-757, Aug. 2000.
    • (2000) IEEE Trans. Pattern Anal. Mach. Intell. , vol.22 , Issue.8 , pp. 747-757
    • Stauffer, C.1    Grimson, W.2
  • 35
    • 0025110885
    • Derivation of auditory filter shapes from notched-noise data
    • DOI 10.1016/0378-5955(90)90170-T
    • B. Glasberg and B. Moore, "Derivation of auditory filter shapes from notched-noise data," Hearing Res., vol. 47, pp. 103-138, 1990. (Pubitemid 20244652)
    • (1990) Hearing Research , vol.47 , Issue.1-2 , pp. 103-138
    • Glasberg, B.R.1    Moore, B.C.J.2
  • 36
    • 34748817500
    • Exploiting correlogram structure for robust speech recognition with multiple speech sources
    • DOI 10.1016/j.specom.2007.05.003, PII S016763930700088X
    • N. Ma, P. Green, J. Barker, and A. Coy, "Exploiting correlogram structure for robust speech recognition with multiple speech sources," Speech Commun., vol. 49, pp. 874-891, 2007. (Pubitemid 47488511)
    • (2007) Speech Communication , vol.49 , Issue.12 , pp. 874-891
    • Ma, N.1    Green, P.2    Barker, J.3    Coy, A.4
  • 37
    • 85009106519
    • Robust ASR based on clean speech models: An evaluation of missing data techniques for connected digit recognition in noise
    • Aalborg, Denmark
    • J. Barker, M. Cooke, and P. Green, "Robust ASR based on clean speech models: An evaluation of missing data techniques for connected digit recognition in noise," in Proc. Eurospeech, Aalborg, Denmark, 2001, pp. 213-216.
    • (2001) Proc. Eurospeech , pp. 213-216
    • Barker, J.1    Cooke, M.2    Green, P.3
  • 38
    • 0001463644
    • A duplex theory of pitch perception
    • J. Licklider, "A duplex theory of pitch perception," Experientia, vol. 7, pp. 128-134, 1951.
    • (1951) Experientia , vol.7 , pp. 128-134
    • Licklider, J.1
  • 40
    • 33750368310
    • An audio-visual corpus for speech perception and automatic speech recognition
    • DOI 10.1121/1.2229005
    • M. Cooke, J. Barker, S. Cunningham, and X. Shao, "An audio-visual corpus for speech perception and automatic speech recognition," J. Acoust. Soc. Amer., vol. 120, pp. 2421-2424, 2006. (Pubitemid 44631681)
    • (2006) Journal of the Acoustical Society of America , vol.120 , Issue.5 , pp. 2421-2424
    • Cooke, M.1    Barker, J.2    Cunningham, S.3    Shao, X.4
  • 42
    • 0031268341
    • Factorial hidden Markov models
    • Z. Ghahramani and M. I. Jordan, "Factorial hidden Markov models," Mach. Learn., vol. 29, pp. 245-273, 1997. (Pubitemid 127510040)
    • (1997) Machine Learning , vol.29 , Issue.2-3 , pp. 245-273
    • Ghahramani, Z.1    Jordan, M.I.2


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.