SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 20, Issue 3, 2012, Pages 818-827

Combining speech fragment decoding and adaptive noise floor modeling

(4) Ma, Ning (a); Barker, Jon (a); Christensen, Heidi (a); Green, Phil (a)

(a) University of Sheffield (United Kingdom)

Author keywords

Adaptive noise floor modeling; fragment decoding; missing data decoding; noise robust speech recognition

Indexed keywords

ACOUSTIC EVENTS; ADAPTIVE NOISE; DECODING SYSTEM; HIGH ENERGY; HIGH ENERGY REGIONS; MISSING DATA; MODEL ESTIMATES; NOISE FLOOR; NOISE MODELING; NOISE MODELS; NOISE ROBUST SPEECH RECOGNITION; NOISE-ROBUST AUTOMATIC SPEECH RECOGNITION; SOURCE SEPARATION; TARGET SPEECH;

ACOUSTIC NOISE; DECODING; FLOORS; HIGH ENERGY PHYSICS; SPEECH COMMUNICATION; SPURIOUS SIGNAL NOISE;

SPEECH RECOGNITION;

EID: 84856140165 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2011.2165945 Document Type: Article

Times cited : (9)

References (42)

1
- 85032751593
- Research developments and directions in speech recognition and understanding, Part 1
- May
- J. M. Baker, L. Deng, J. Glass, S. Khudanpur, C. -H. Lee, N. Morgan, and D. O'Shaughnessy, "Research developments and directions in speech recognition and understanding, Part 1, " IEEE Signal Process. Mag. , vol. 26, no. 3, pp. 75-80, May 2009.
- (2009) IEEE Signal Process. Mag. , vol.26 , Issue.3 , pp. 75-80
- Baker, J.M.¹ Deng, L.² Glass, J.³ Khudanpur, S.⁴ Lee, C.-H.⁵ Morgan, N.⁶ O'Shaughnessy, D.⁷

2
- 0026882842
- Experiments with a nonlinear spectral subtractor (NSS), hidden Markov models and the projection, for robust speech recognition in cars
- P. Lockwood and J. Boudy, "Experiments with nonlinear spectral subtractor (NSS), hidden Markov models and the projection for robust speech recognition in cars, " Speech Commun. , vol. 11, pp. 215-228, 1992. (Pubitemid 23572493)
- (1992) Speech Communication , vol.11 , Issue.2-3 , pp. 215-228
- Lockwood, P.¹ Boudy, J.²

3
- 0035396555
- Noise power spectral density estimation based on optimal smoothing and minimum statistics
- DOI 10.1109/89.928915, PII S106366760104980X
- R. Martin, "Noise power spectral density estimation based on optimal smoothing and minimum statistics, " IEEE Trans. Speech. Audio Process. , vol. 9, no. 5, pp. 504-512, Jul. 2001. (Pubitemid 32631178)
- (2001) IEEE Transactions on Speech and Audio Processing , vol.9 , Issue.5 , pp. 504-512
- Martin, R.¹

4
- 0035342414
- Robust automatic speech recognition with missing and unreliable acoustic data
- DOI 10.1016/S0167-6393(00)00034-0, PII S0167639300000340
- M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data, " Speech Commun. , vol. 34, pp. 267-285, 2001. (Pubitemid 32284867)
- (2001) Speech Communication , vol.34 , Issue.3 , pp. 267-285
- Cooke, M.¹ Green, P.² Josifovski, L.³ Vizinho, A.⁴

5
- 4644317224
- A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition
- M. Seltzer, B. Raj, and R. Stern, "A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition, " Speech Commun. , vol. 43, pp. 379-393, 2004.
- (2004) Speech Commun. , vol.43 , pp. 379-393
- Seltzer, M.¹ Raj, B.² Stern, R.³

6
- 11144316019
- Decoding speech in the presence of other sources
- DOI 10.1016/j.specom.2004.05.002, PII S0167639304000615
- J. Barker, M. Cooke, and D. Ellis, "Decoding speech in the presence of other sources, " Speech Commun. , vol. 45, pp. 5-25, 2005. (Pubitemid 40034706)
- (2005) Speech Communication , vol.45 , Issue.1 , pp. 5-25
- Barker, J.P.¹ Cooke, M.P.² Ellis, D.P.W.³

7
- 0036291376
- Uncertainty decoding with splice for noise robust speech recognition
- J. Droppo, L. Deng, and A. Acero, "Uncertainty decoding with splice for noise robust speech recognition, " in Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , 2002, pp. 57-60.
- (2002) Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , pp. 57-60
- Droppo, J.¹ Deng, L.² Acero, A.³

8
- 18744401086
- Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion
- DOI 10.1109/TSA.2005.845814
- L. Deng, J. Droppo, and A. Acero, "Dynamic compensation of HMM variances using the feature enhancement uncertainty computed from a parametric model of speech distortion, " IEEE Trans. Speech. Audio Process. , vol. 13, no. 3, pp. 412-421, May 2005. (Pubitemid 40666175)
- (2005) IEEE Transactions on Speech and Audio Processing , vol.13 , Issue.3 , pp. 412-421
- Deng, L.¹ Droppo, J.² Acero, A.³

9
- 40249103761
- Issues with uncertainty decoding for noise robust automatic speech recognition
- H. Liao and M. Gales, "Issues with uncertainty decoding for noise robust automatic speech recognition, " Speech Commun. , vol. 50, pp. 265-277, 2008.
- (2008) Speech Commun. , vol.50 , pp. 265-277
- Liao, H.¹ Gales, M.²

10
- 0025681008
- Hidden Markov model decomposition of speech and noise
- A. Varga and R. Moore, "Hidden Markov model decomposition of speech and noise, " in Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , 1990, pp. 845-848.
- (1990) Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , pp. 845-848
- Varga, A.¹ Moore, R.²

11
- 85135375893
- HMM recognition in noise using parallel model combination
- Berlin
- M. Gales and S. Young, "HMM recognition in noise using parallel model combination, " in Proc. Eurospeech, Berlin, 1993.
- (1993) Proc. Eurospeech
- Gales, M.¹ Young, S.²

12
- 85009074657
- ALGONQUIN: Iterating Laplace's method to remove multiple types of distortion for robust speech recognition
- Aalborg, Denmark
- B. Frey, L. Deng, A. Acero, and T. Kristjansson, "ALGONQUIN: Iterating Laplace's method to remove multiple types of distortion for robust speech recognition, " in Proc. Eurospeech, Aalborg, Denmark, 2001, pp. 901-904.
- (2001) Proc. Eurospeech , pp. 901-904
- Frey, B.¹ Deng, L.² Acero, A.³ Kristjansson, T.⁴

13
- 69249222720
- Super-human multi-talker speech recognition: A graphical modeling approach
- J. R. Hershey, S. J. Rennie, and P. A. Olsen, "Super-human multi-talker speech recognition: A graphical modeling approach, " Comput. Speech. Lang. , vol. 24, pp. 45-66, 2010.
- (2010) Comput. Speech. Lang. , vol.24 , pp. 45-66
- Hershey, J.R.¹ Rennie, S.J.² Olsen, P.A.³

14
- 85032751986
- Single-channel multitalker speech recognition
- S. J. Rennie, J. R. Hershey, and P. A. Olsen, "Single-channel multitalker speech recognition, " IEEE Signal Process. Mag. , vol. 27, pp. 66-80, 2010.
- (2010) IEEE Signal Process. Mag. , vol.27 , pp. 66-80
- Rennie, S.J.¹ Hershey, J.R.² Olsen, P.A.³

15
- 69249202377
- Monaural speech separation and recognition challenge
- M. Cooke, J. Hershey, and S. Rennie, "Monaural speech separation and recognition challenge, " Comput. Speech. Lang. , vol. 24, pp. 1-15, 2010.
- (2010) Comput. Speech. Lang. , vol.24 , pp. 1-15
- Cooke, M.¹ Hershey, J.² Rennie, S.³

16
- 50249152311
- Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria
- Mar
- T. Virtanen, "Monaural sound source separation by nonnegative matrix factorization with temporal continuity and sparseness criteria, " IEEE Trans. Audio. Speech. , vol. 15, no. 3, pp. 1066-1074, Mar. 2007.
- (2007) IEEE Trans. Audio. Speech. , vol.15 , Issue.3 , pp. 1066-1074
- Virtanen, T.¹

17
- 44949110218
- Single-channel speech separation using sparse non-negative matrix factorization
- Pittsburgh, PA
- M. N. Schmidt and R. K. Olsson, "Single-channel speech separation using sparse non-negative matrix factorization, " in Proc. Interspeech, Pittsburgh, PA, 2006, pp. 2614-2617.
- (2006) Proc. Interspeech , pp. 2614-2617
- Schmidt, M.N.¹ Olsson, R.K.²

18
- 4344607755
- Likelihood-maximizing beamforming for robust hands-free speech recognition
- Sep
- M. Seltzer, B. Raj, and R. Stern, "Likelihood-maximizing beamforming for robust hands-free speech recognition, " IEEE Trans. Speech. Audio Process. , vol. 12, no. 5, pp. 489-498, Sep. 2004.
- (2004) IEEE Trans. Speech. Audio Process. , vol.12 , Issue.5 , pp. 489-498
- Seltzer, M.¹ Raj, B.² Stern, R.³

19
- 34250689497
- Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears
- DOI 10.1109/IROS.2006.281741, 4058472, 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2006
- R. Takeda, S. Yamamoto, K. Komatani, T. Ogata, and H. Okuno, "Missing-feature based speech recognition for two simultaneous speech signals separated by ICA with a pair of humanoid ears, " in IEEE/RSJ Int. Conf. Intell. Robots Syst. , 2006, pp. 878-885. (Pubitemid 46927954)
- (2006) IEEE International Conference on Intelligent Robots and Systems , pp. 878-885
- Takeda, R.¹ Yamamoto, S.² Komatani, K.³ Ogata, T.⁴ Okuno, H.G.⁵

20
- 79959845286
- The CHiME corpus: A resource and a challenge for Computational Hearing in Multisource Environments
- H. Christensen, J. Barker, N. Ma, and P. Green, "The CHiME corpus: A resource and a challenge for Computational Hearing in Multisource Environments, " in Proc. Interspeech, 2010.
- (2010) Proc. Interspeech
- Christensen, H.¹ Barker, J.² Ma, N.³ Green, P.⁴

21
- 0002059527
- Understanding speech understanding: Towards a unified theory of speech perception
- U. K.
- S. Greenberg, W. Ainsworth and S. Greenberg, Eds. , "Understanding speech understanding: Towards a unified theory of speech perception, " in Proc. ESCA Workshop Auditory Basis Speech Percept. , U. K. , 1996, pp. 1-8.
- (1996) Proc. ESCA Workshop Auditory Basis Speech Percept. , pp. 1-8
- Greenberg, S.¹ Ainsworth, W.² Greenberg, S.³

22
- 0002296637
- On the importance of time - A temporal representation of sound
- M. Cooke, S. Beet, and M. Crawford, Eds. Sussex, U. K. : Wiley
- M. Slaney and R. Lyon, "On the importance of time - A temporal representation of sound, " in Visual Representations of Speech Signals, M. Cooke, S. Beet, and M. Crawford, Eds. Sussex, U. K. : Wiley, 1993, pp. 95-116.
- (1993) Visual Representations of Speech Signals , pp. 95-116
- Slaney, M.¹ Lyon, R.²

23
- 0344581050
- Temporal integration and context effects in hearing
- DOI 10.1016/S0095-4470(03)00011-1
- B. C. J. Moore, "Temporal integration and context effects in hearing, " in J. Phonetics, 2003, vol. 31, pp. 563-574. (Pubitemid 37495928)
- (2003) Journal of Phonetics , vol.31 , Issue.3-4 , pp. 563-574
- Moore, B.C.J.¹

24
- 0003549684
- New York: Van Nostrand
- H. Fletcher, Speech and Hearing in Communication. New York: Van Nostrand, 1953.
- (1953) Speech and Hearing in Communication
- Fletcher, H.¹

25
- 0029249228
- Spectral redundancy: Intelligibility of sentences heard through narrow spectral slits
- R. Warren, K. Riener, J. Bashford, and B. Brubaker, "Spectral redundancy: Intelligibility of sentences heard through narrow spectral slits, " Percept. Psychophys. , vol. 57, pp. 175-182, 1995.
- (1995) Percept. Psychophys. , vol.57 , pp. 175-182
- Warren, R.¹ Riener, K.² Bashford, J.³ Brubaker, B.⁴

26
- 0036713102
- The intelligibility of speech with "holes" in the spectrum
- K. Kasturi, P. C. Loizou, M. Dorman, and T. Spahr, "The intelligibility of speech with "holes" in the spectrum, " J. Acoust. Soc. Amer. , vol. 112, pp. 1102-1111, 2002.
- (2002) J. Acoust. Soc. Amer. , vol.112 , pp. 1102-1111
- Kasturi, K.¹ Loizou, P.C.² Dorman, M.³ Spahr, T.⁴

27
- 33644661135
- A glimpsing model of speech perception in noise
- DOI 10.1121/1.2166600
- M. Cooke, "A glimpsing model of speech perception in noise, " J. Acoust. Soc. Amer. , vol. 119, pp. 1562-1573, 2006. (Pubitemid 43326025)
- (2006) Journal of the Acoustical Society of America , vol.119 , Issue.3 , pp. 1562-1573
- Cooke, M.¹

28
- 4644336054
- Reconstruction of missing features for robust speech recognition
- B. Raj, M. Seltzer, and R. Stern, "Reconstruction of missing features for robust speech recognition, " Speech Commun. , vol. 43, pp. 275-296, 2004.
- (2004) Speech Commun. , vol.43 , pp. 275-296
- Raj, B.¹ Seltzer, M.² Stern, R.³

29
- 85009063707
- Soft decisions in missing data techniques for robust automatic speech recognition
- Beijing, China
- J. Barker, L. Josifovski, M. Cooke, and P. Green, "Soft decisions in missing data techniques for robust automatic speech recognition, " in Proc. ICSLP, Beijing, China, 2000, pp. 373-376.
- (2000) Proc. ICSLP , pp. 373-376
- Barker, J.¹ Josifovski, L.² Cooke, M.³ Green, P.⁴

30
- 11144343436
- Detection of reliable features for speech recognition in noisy conditions using a statistical criterion
- Aalborg, Denmark
- P. Renevey and A. Drygajlo, "Detection of reliable features for speech recognition in noisy conditions using a statistical criterion, " in Proc. CRAC, Aalborg, Denmark, 2001.
- (2001) Proc. CRAC
- Renevey, P.¹ Drygajlo, A.²

31
- 33847629729
- On noise masking for automatic missing data speech recognition: A survey and discussion
- DOI 10.1016/j.csl.2006.08.001, PII S0885230806000301
- C. Cerisara, S. Demange, and J. Haton, "On noise masking for automatic missing data speech recognition: A survey and discussion, " Comput. Speech. Lang. , vol. 21, pp. 443-457, 2007. (Pubitemid 46367508)
- (2007) Computer Speech and Language , vol.21 , Issue.3 , pp. 443-457
- Cerisara, C.¹ Demange, S.² Haton, J.-P.³

32
- 0041360463
- Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging
- Sep
- I. Cohen, "Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging, " IEEE Trans. Speech. Audio Process. , vol. 11, no. 5, pp. 466-475, Sep. 2003.
- (2003) IEEE Trans. Speech. Audio Process. , vol.11 , Issue.5 , pp. 466-475
- Cohen, I.¹

33
- 29444448046
- A noise-estimation algorithm for highly non-stationary environments
- DOI 10.1016/j.specom.2005.08.005, PII S0167639305002001
- S. Rangachari and P. C. Loizou, "A noise-estimation algorithm for highly non-stationary environments, " Speech Commun. , vol. 48, pp. 220-231, 2006. (Pubitemid 43012033)
- (2006) Speech Communication , vol.48 , Issue.2 , pp. 220-231
- Rangachari, S.¹ Loizou, P.C.²

34
- 0034244889
- Learning patterns of activity using realtime tracking
- Aug
- C. Stauffer and W. Grimson, "Learning patterns of activity using realtime tracking, " IEEE Trans. Pattern Anal. Mach. Intell. , vol. 22, no. 8, pp. 747-757, Aug. 2000.
- (2000) IEEE Trans. Pattern Anal. Mach. Intell. , vol.22 , Issue.8 , pp. 747-757
- Stauffer, C.¹ Grimson, W.²

35
- 0025110885
- Derivation of auditory filter shapes from notched-noise data
- DOI 10.1016/0378-5955(90)90170-T
- B. Glasberg and B. Moore, "Derivation of auditory filter shapes from notched-noise data, " Hearing Res. , vol. 47, pp. 103-138, 1990. (Pubitemid 20244652)
- (1990) Hearing Research , vol.47 , Issue.1-2 , pp. 103-138
- Glasberg, B.R.¹ Moore, B.C.J.²

36
- 34748817500
- Exploiting correlogram structure for robust speech recognition with multiple speech sources
- DOI 10.1016/j.specom.2007.05.003, PII S016763930700088X
- N. Ma, P. Green, J. Barker, and A. Coy, "Exploiting correlogram structure for robust speech recognition with multiple speech sources, " Speech Commun. , vol. 49, pp. 874-891, 2007. (Pubitemid 47488511)
- (2007) Speech Communication , vol.49 , Issue.12 , pp. 874-891
- Ma, N.¹ Green, P.² Barker, J.³ Coy, A.⁴

37
- 85009106519
- Robust ASR based on clean speech models: An evaluation of missing data techniques for connected digit recognition in noise
- Aalborg, Denmark
- J. Barker, M. Cooke, and P. Green, "Robust ASR based on clean speech models: An evaluation of missing data techniques for connected digit recognition in noise, " in Proc. Eurospeech, Aalborg, Denmark, 2001, pp. 213-216.
- (2001) Proc. Eurospeech , pp. 213-216
- Barker, J.¹ Cooke, M.² Green, P.³

38
- 0001463644
- A duplex theory of pitch perception
- J. Licklider, "A duplex theory of pitch perception, " Experientia, vol. 7, pp. 128-134, 1951.
- (1951) Experientia , vol.7 , pp. 128-134
- Licklider, J.¹

39
- 0025623060
- A perceptual pitch detector
- Albuquerque, NM
- M. Slaney and R. Lyon, "A perceptual pitch detector, " in Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , Albuquerque, NM, 1990, pp. 357-360.
- (1990) Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , pp. 357-360
- Slaney, M.¹ Lyon, R.²

40
- 33750368310
- An audio-visual corpus for speech perception and automatic speech recognition
- DOI 10.1121/1.2229005
- M. Cooke, J. Barker, S. Cunningham, and X. Shao, "An audio-visual corpus for speech perception and automatic speech recognition, " J. Acoust. Soc. Amer. , vol. 120, pp. 2421-2424, 2006. (Pubitemid 44631681)
- (2006) Journal of the Acoustical Society of America , vol.120 , Issue.5 , pp. 2421-2424
- Cooke, M.¹ Barker, J.² Cunningham, S.³ Shao, X.⁴

41
- 0024909979
- Some statistical issues in the comparison of speech recognition algorithms
- L. Gillick and S. Cox, "Some statistical issues in the comparison of speech recognition algorithms, " in Proc. IEEE Int. Conf. Acoust. , Speech, Signal Process. , 1989, pp. 532-535. (Pubitemid 20604171)
- (1989) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 532-535
- Gillick, L.¹ Cox, S.J.²

42
- 0031268341
- Factorial hidden Markov models
- Z. Ghahramani and M. I. Jordan, "Factorial hidden Markov models, " Mach. Learn. , vol. 29, pp. 245-273, 1997. (Pubitemid 127510040)
- (1997) Machine Learning , vol.29 , Issue.2-3 , pp. 245-273
- Ghahramani, Z.¹ Jordan, M.I.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.