SCOPUS Information Search Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 21, Issue 10, 2013, Pages 1993-2005

A direct masking approach to robust ASR

Authors (4): Hartmann, William (a); Narayanan, Arun (a); Fosler-Lussier, Eric (a); Wang, DeLiang (a, b)

a The Ohio State University (United States)

b Ohio State University (United States)

Author keywords

Direct masking; ideal binary mask; robust automatic speech recognition

Indexed keywords

BINARY MASKS; CEPSTRAL FEATURES; IDEAL BINARY MASK; LARGE VOCABULARY; MISSING ENERGY; MISSING FEATURES; ROBUST AUTOMATIC SPEECH RECOGNITION; SPEECH SEGREGATION;

ACOUSTICS; ELECTRICAL ENGINEERING;

SPEECH RECOGNITION;

EID: 84881088302 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2013.2263802 Document Type: Article

Times cited: 35

References (44)

1
- 0029288202
- Speech recognition in noisy environments: A survey
- Y. Gong, "Speech recognition in noisy environments: A survey," Speech Commun., vol. 16, pp. 261-291, 1995.
- (1995) Speech Commun. , vol.16 , pp. 261-291
- Gong, Y.¹

2
- 0030245128
- Robust continuous speech recognition using parallel model combination
- M. J. F. Gales and S. J. Young, "Robust continuous speech recognition using parallel model combination," IEEE Trans. Speech Audio Process., vol. 4, no. 5, pp. 352-359, Sep. 1996.
- (1996) IEEE Trans. Speech Audio Process. , vol.4 , Issue.5 , pp. 352-359
- Gales, M.J.F.¹ Young, S.J.²

3
- 0027166410
- Recognition of speech in additive and convolutional noise based on RASTA spectral processing
- H. Hermansky, N. Morgan, and H.-G. Hirsch, "Recognition of speech in additive and convolutional noise based on RASTA spectral processing," in Proc. ICASSP, 1993, vol. 10, pp. 509-512.
- (1993) Proc. ICASSP , vol.10 , pp. 509-512
- Hermansky, H.¹ Morgan, N.² Hirsch, H.-G.³

4
- 0018455310
- Suppression of acoustic noise in speech using spectral subtraction
- S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-27, no. 2, pp. 113-120, Apr. 1979.
- (1979) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-27 , Issue.2 , pp. 113-120
- Boll, S.F.¹

5
- 0003684441
- A. S. Bregman, Auditory Scene Analysis. Cambridge, MA, USA: MIT Press, 1994.
- (1994) Auditory Scene Analysis
- Bregman, A.S.¹

6
- 84892233308
- On ideal binary mask as the computational goal of auditory scene analysis
- D. L. Wang, "On ideal binary mask as the computational goal of auditory scene analysis," in Speech Separation by Humans and Machines, P. Divenyi, Ed. Norwell, MA, USA: Kluwer, 2005, pp. 181-197.
- (2005) Speech Separation by Humans and Machines , pp. 181-197
- Wang, D.L.¹

7
- 85032752225
- Missing-feature approaches in speech recognition
- B. Raj and R. M. Stern, "Missing-feature approaches in speech recognition," IEEE Signal Process. Mag., vol. 22, no. 2, pp. 101-116, Sep. 2005.
- (2005) IEEE Signal Process. Mag. , vol.22 , Issue.2 , pp. 101-116
- Raj, B.¹ Stern, R.M.²

8
- 84877621926
- The role of binary mask patterns in automatic speech recognition in background noise
- A. Narayanan and D. L. Wang, "The role of binary mask patterns in automatic speech recognition in background noise," J. Acoust. Soc. Amer., vol. 133, no. 5, pp. 3083-3093, 2013.
- (2013) J. Acoust. Soc. Amer. , vol.133 , Issue.5 , pp. 3083-3093
- Narayanan, A.¹ Wang, D.L.²

9
- 0021176902
- The GRASP sound separation system
- M. Weintraub, "The GRASP sound separation system," in Proc. IEEE ICASSP, 1984, pp. 18A.6.1-18A.6.4.
- (1984) Proc. IEEE ICASSP
- Weintraub, M.¹

10
- 0028531926
- Computational auditory scene analysis
- G. J. Brown and M. Cooke, "Computational auditory scene analysis," Comput. Speech Lang., vol. 8, pp. 297-336, 1994.
- (1994) Comput. Speech Lang. , vol.8 , pp. 297-336
- Brown, G.J.¹ Cooke, M.²

11
- 0032682770
- Separation of speech from interfering sounds based on oscillatory correlation
- D. L. Wang and G. J. Brown, "Separation of speech from interfering sounds based on oscillatory correlation," IEEE Trans. Neural Netw., vol. 10, no. 3, pp. 684-697, May 1999.
- (1999) IEEE Trans. Neural Netw. , vol.10 , Issue.3 , pp. 684-697
- Wang, D.L.¹ Brown, G.J.²

12
- 64649103540
- Speech intelligibility in background noise with ideal binary time-frequency masking
- D. L. Wang, U. Kjems, M. S. Pedersen, J. B. Boldt, and T. Lunner, "Speech intelligibility in background noise with ideal binary time-frequency masking," J. Acoust. Soc. Amer., vol. 125, pp. 2336-2347, 2009.
- (2009) J. Acoust. Soc. Amer. , vol.125 , pp. 2336-2347
- Wang, D.L.¹ Kjems, U.² Pedersen, M.S.³ Boldt, J.B.⁴ Lunner, T.⁵

13
- 0035342414
- Robust automatic speech recognition with missing and unreliable acoustic data
- M. Cooke, P. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and unreliable acoustic data," Speech Commun., vol. 34, pp. 267-285, 2001.
- (2001) Speech Commun. , vol.34 , pp. 267-285
- Cooke, M.¹ Green, P.² Josifovski, L.³ Vizinho, A.⁴

14
- 4644336054
- Reconstruction of missing features for robust speech recognition
- B. Raj, M. L. Seltzer, and R. M. Stern, "Reconstruction of missing features for robust speech recognition," Speech Commun., vol. 43, pp. 275-296, 2004.
- (2004) Speech Commun. , vol.43 , pp. 275-296
- Raj, B.¹ Seltzer, M.L.² Stern, R.M.³

15
- 77957739976
- Advances in missing feature techniques for robust large-vocabulary continuous speech recognition
- M. Van Segbroeck and H. Van hamme, "Advances in missing feature techniques for robust large-vocabulary continuous speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 19, no. 1, pp. 123-137, Jan. 2011.
- (2011) IEEE Trans. Audio, Speech, Lang. Process. , vol.19 , Issue.1 , pp. 123-137
- Van Segbroeck, M.¹ Van hamme, H.²

16
- 80051633766
- Investigations into the incorporation of the ideal binary mask in ASR
- W. Hartmann and E. Fosler-Lussier, "Investigations into the incorporation of the ideal binary mask in ASR," in Proc. IEEE ICASSP, Prague, Czech Republic, May 2011, pp. 4804-4807.
- (2011) Proc. IEEE ICASSP , pp. 4804-4807
- Hartmann, W.¹ Fosler-Lussier, E.²

17
- 0001556285
- Recognising occluded speech
- M. Cooke, A. Morris, and P. Green, "Recognising occluded speech," in Proc. ESCA Workshop Auditory Basis of Speech Percept., 1996, pp. 297-300.
- (1996) Proc. ESCA Workshop Auditory Basis of Speech Percept. , pp. 297-300
- Cooke, M.¹ Morris, A.² Green, P.³

18
- 84869001637
- Handling missing data in speech recognition
- M. Cooke, P. Green, and M. Crawford, "Handling missing data in speech recognition," in Proc. ICSLP, 1994.
- (1994) Proc. ICSLP
- Cooke, M.¹ Green, P.² Crawford, M.³

19
- 0000652102
- Some solutions to the missing feature problem in vision
- S. Ahmad and V. Tresp, "Some solutions to the missing feature problem in vision," in Advances in Neural Information Processing Systems 5 (NIPS'92), S. J. Hanson, J. D. Cowen, and C. L. Giles, Eds. San Mateo, CA, USA: Morgan Kaufmann, 1993.
- (1993) Advances in Neural Information Processing Systems 5 (NIPS'92)
- Ahmad, S.¹ Tresp, V.²

20
- 16344396527
- Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering, and noise
- R. Lippmann and B. A. Carlson, "Using missing feature theory to actively select features for robust speech recognition with interruptions, filtering, and noise," in Proc. Eurospeech'97, 1997, pp. 37-40.
- (1997) Proc. Eurospeech'97 , pp. 37-40
- Lippmann, R.¹ Carlson, B.A.²

21
- 0019053271
- Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences
- S. B. Davis and P. Mermelstein, "Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-28, no. 4, pp. 357-366, Aug. 1980.
- (1980) IEEE Trans. Acoust., Speech, Signal Process. , vol.ASSP-28 , Issue.4 , pp. 357-366
- Davis, S.B.¹ Mermelstein, P.²

22
- 33750311718
- Binary and ratio time-frequency masks for robust speech recognition
- S. Srinivasan, N. Roman, and D. L. Wang, "Binary and ratio time-frequency masks for robust speech recognition," Speech Commun., vol. 48, pp. 1486-1501, 2006.
- (2006) Speech Commun. , vol.48 , pp. 1486-1501
- Srinivasan, S.¹ Roman, N.² Wang, D.L.³

23
- 56249136428
- Transforming binary uncertainties for robust speech recognition
- S. Srinivasan and D. L. Wang, "Transforming binary uncertainties for robust speech recognition," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 7, pp. 2130-2140, Sep. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.7 , pp. 2130-2140
- Srinivasan, S.¹ Wang, D.L.²

24
- 69249203845
- Monaural speech separation based on MAXVQ and CASA for robust speech recognition
- P. Li, Y. Guan, S. Wang, B. Xu, and W. Liu, "Monaural speech separation based on MAXVQ and CASA for robust speech recognition," Comput. Speech Lang., vol. 24, no. 1, pp. 30-44, Jan. 2010.
- (2010) Comput. Speech Lang. , vol.24 , Issue.1 , pp. 30-44
- Li, P.¹ Guan, Y.² Wang, S.³ Xu, B.⁴ Liu, W.⁵

25
- 85009063707
- Soft decisions in missing data techniques for robust automatic speech recognition
- J. Barker, L. Josifovski, M. Cooke, and P. Green, "Soft decisions in missing data techniques for robust automatic speech recognition," in Proc. Int. Conf. Spoken Lang., Beijing, China, 2000, pp. 373-376.
- (2000) Proc. Int. Conf. Spoken Lang. , pp. 373-376
- Barker, J.¹ Josifovski, L.² Cooke, M.³ Green, P.⁴

26
- 84867596016
- A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition
- J. van Hout and A. Alwan, "A novel approach to soft-mask estimation and log-spectral enhancement for robust speech recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2012, pp. 4105-4108.
- (2012) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 4105-4108
- van Hout, J.¹ Alwan, A.²

27
- 77956506956
- Missing-feature reconstruction by leveraging temporal spectral correlation for robust speech recognition in background noise conditions
- W. Kim and J. H. L. Hansen, "Missing-feature reconstruction by leveraging temporal spectral correlation for robust speech recognition in background noise conditions," IEEE Trans. Audio, Speech, Lang. Process., vol. 18, no. 8, pp. 2111-2120, Nov. 2010.
- (2010) IEEE Trans. Audio, Speech, Lang. Process. , vol.18 , Issue.8 , pp. 2111-2120
- Kim, W.¹ Hansen, J.H.L.²

28
- 0021226391
- A database for speaker-independent digit recognition
- R. G. Leonard, "A database for speaker-independent digit recognition," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 1984, pp. 111-114.
- (1984) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 111-114
- Leonard, R.G.¹

29
- 84867227925
- Noise reduction through compressed sensing
- J. Gemmeke and B. Cranen, "Noise reduction through compressed sensing," in Proc. Interspeech, 2008.
- (2008) Proc. Interspeech
- Gemmeke, J.¹ Cranen, B.²

30
- 84873833546
- Multi-candidate missing data imputation for robust speech recognition
- Y. Wang and H. Van hamme, "Multi-candidate missing data imputation for robust speech recognition," EURASIP J. Audio, Speech, Music Process., vol. 17, 2012, doi:10.1186/1687-4722-2012-17.
- (2012) EURASIP J. Audio, Speech, Music Process. , vol.17
- Wang, Y.¹ Van hamme, H.²

31
- 85009227702
- Analysis of the Aurora large vocabulary extensions
- N. Parihar and J. Picone, "Analysis of the Aurora large vocabulary extensions," in Proc. Eurospeech, Geneva, Switzerland, Sep. 2003, vol. 4, pp. 337-340.
- (2003) Proc. Eurospeech , vol.4 , pp. 337-340
- Parihar, N.¹ Picone, J.²

32
- 11144316019
- Decoding speech in the presence of other sources
- J. Barker, M. Cooke, and D. P. W. Ellis, "Decoding speech in the presence of other sources," Speech Commun., vol. 45, pp. 5-25, 2005.
- (2005) Speech Commun. , vol.45 , pp. 5-25
- Barker, J.¹ Cooke, M.² Ellis, D.P.W.³

33
- 70350038037
- Robust speech recognition by integrating speech separation and hypothesis testing
- S. Srinivasan and D. L. Wang, "Robust speech recognition by integrating speech separation and hypothesis testing," Speech Commun., vol. 52, pp. 72-81, 2010.
- (2010) Speech Commun. , vol.52 , pp. 72-81
- Srinivasan, S.¹ Wang, D.L.²

34
- 82255178542
- D. L. Wang and G. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. New York, NY, USA: Wiley-IEEE Press, 2006.
- (2006) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications
- Wang, D.L.¹ Brown, G.²

35
- 84881107969
- J. Barker, M. Cooke, and D. P. Ellis, The RESPITE-CASA-Toolkit Project, 2002. [Online]. Available: http://staffwww.dcs.shef.ac.uk/people/J.Barker/ctk.html
- (2002) The RESPITE-CASA-Toolkit Project
- Barker, J.¹ Cooke, M.² Ellis, D.P.³

36
- 56749102248
- S. Srinivasan, "Integrating computational auditory scene analysis and automatic speech recognition," Ph.D. dissertation, The Ohio State Univ., Columbus, OH, USA, 2006.
- (2006) Integrating Computational Auditory Scene Analysis and Automatic Speech Recognition
- Srinivasan, S.¹

37
- 0003982501
- M. Weintraub, "A theory and computational model of auditory monaural sound separation," Ph.D. dissertation, Stanford Univ., Stanford, CA, USA, 1985.
- (1985) A Theory and Computational Model of Auditory Monaural Sound Separation
- Weintraub, M.¹

38
- 0003822743
- S. Young, G. Evermann, T. Hain, D. Kershaw, G. Moore, J. Odell, D. Ollason, D. Povey, V. Valtchev, and P. Woodland, The HTK Book. Cambridge, U.K.: Cambridge Univ. Publishing Dept., 2002. [Online]. Available: http://htk.eng.cam.ac.uk
- (2002) The HTK Book
- Young, S.¹ Evermann, G.² Hain, T.³ Kershaw, D.⁴ Moore, G.⁵ Odell, J.⁶ Ollason, D.⁷ Povey, D.⁸ Valtchev, V.⁹ Woodland, P.¹⁰

39
- 0004319968
- A. P. Varga, H. J. M. Steeneken, M. Tomlinson, and D. Jones, "The NOISEX-92 study on the effect of additive noise on automatic speech recognition," Speech Research Unit, Defence Research Agency, Malvern, U.K., Tech. Rep., 1992.
- (1992) The NOISEX-92 Study on the Effect of Additive Noise on Automatic Speech Recognition
- Varga, A.P.¹ Steeneken, H.J.M.² Tomlinson, M.³ Jones, D.⁴

40
- 85079095310
- The design for the Wall Street Journal-based CSR corpus
- D. Paul and J. Baker, "The design for the Wall Street Journal-based CSR corpus," in Proc. Int. Conf. Spoken Lang., Banff, AB, Canada, Oct. 1992, pp. 899-902.
- (1992) Proc. Int. Conf. Spoken Lang. , pp. 899-902
- Paul, D.¹ Baker, J.²

41
- 78049364397
- MMSE based noise PSD tracking with low complexity
- R. C. Hendriks, R. Heusdens, and J. Jensen, "MMSE based noise PSD tracking with low complexity," in Proc. IEEE ICASSP, 2010, pp. 4266-4269.
- (2010) Proc. IEEE ICASSP , pp. 4266-4269
- Hendriks, R.C.¹ Heusdens, R.² Jensen, J.³

42
- 51449104842
- Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors
- J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, "Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.6 , pp. 1741-1752
- Erkelens, J.S.¹ Hendriks, R.C.² Heusdens, R.³ Jensen, J.⁴

43
- 84877592730
- D. P. Ellis, J. A. Bilmes, E. Fosler-Lussier, H. Hermansky, D. Johnson, B. Kingsbury, and N. Morgan, "The SPRACHcore Software Package," 2010. [Online]. Available: http://www.icsi.berkeley.edu/~dpwe/projects/sprach/sprachcore.html
- (2010) The SPRACHcore Software Package
- Ellis, D.P.¹ Bilmes, J.A.² Fosler-Lussier, E.³ Hermansky, H.⁴ Johnson, D.⁵ Kingsbury, B.⁶ Morgan, N.⁷

44
- 42549139762
- MVA processing of speech features
- C.-P. Chen and J. A. Bilmes, "MVA processing of speech features," IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 1, pp. 257-270, Jan. 2007.
- (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , Issue.1 , pp. 257-270
- Chen, C.-P.¹ Bilmes, J.A.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.