SCOPUS Information Retrieval Platform

IEEE Transactions on Audio, Speech and Language Processing

Volume 17, Issue 3, 2009, Pages 446-458

Energetic and informational masking effects in an audiovisual speech recognition system

UNIVERSITY OF SHEFFIELD (United Kingdom)

Author keywords

Audiovisual speech recognition; Energetic masking (EM); Informational masking (IM); Source separation; Speech fragment decoding

Indexed keywords

ACOUSTIC ANALYSIS; ACOUSTIC SIGNALS; AUDIO-VISUAL SPEECH; AUDIOVISUAL SPEECH RECOGNITION; ENERGETIC MASKING; ENERGETIC MASKING (EM); INFORMATIONAL MASKING (IM); INFORMATIONAL MASKING; NONSTATIONARY NOISE; SOURCE SEPARATION; TARGET SPEECH; TWO STAGE; VISUAL CUES; VISUAL INFORMATION; VISUAL REPRESENTATIONS;

ACOUSTICS; DECODING; SIGNAL ANALYSIS; SPEECH COMMUNICATION; TARGETS; VISUAL COMMUNICATION;

SPEECH RECOGNITION;

EID: 70350437422 PISSN: 15587916 EISSN: None Source Type: Journal
DOI: 10.1109/TASL.2008.2011534 Document Type: Article

Times cited: 14

References (37)

1
- 0025604287
- How much masking is informational masking?
- R. Lutfi, "How much masking is informational masking?," J. Acoust. Soc. Amer., vol. 88, pp. 2607-2610, 1990.
- (1990) J. Acoust. Soc. Amer. , vol.88 , pp. 2607-2610
- Lutfi, R.¹

2
- 0035169173
- Informational and energetic masking effects in the perception of multiple simultaneous talkers
- D. S. Brungart, B. D. Simpson, M. A. Ericson, and K. R. Scott, "Informational and energetic masking effects in the perception of multiple simultaneous talkers," J. Acoust. Soc. Amer., vol. 110, no. 5, pp. 2527-2538, 2001.
- (2001) J. Acoust. Soc. Amer. , vol.110 , Issue.5 , pp. 2527-2538
- Brungart, D.S.¹ Simpson, B.D.² Ericson, M.A.³ Scott, K.R.⁴

3
- 0012323283
- Note on informational masking
- N. I. Durlach, C. R. Mason, G. Kidd, Jr., T. L. Arbogast, H. S. Colburn, and B. G. Shinn-Cunningham, "Note on informational masking," J. Acoust. Soc. Amer., vol. 113, no. 6, pp. 2984-2987, 2003.
- (2003) J. Acoust. Soc. Amer. , vol.113 , Issue.6 , pp. 2984-2987
- Durlach, N.I.¹ Mason, C.R.² Kidd Jr., G.³ Arbogast, T.L.⁴ Colburn, H.S.⁵ Shinn-Cunningham, B.G.⁶

4
- 4544290191
- Recent advances in the automatic recognition of audiovisual speech
- G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. W. Senior, "Recent advances in the automatic recognition of audiovisual speech," Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, 2003.
- (2003) Proc. IEEE , vol.91 , Issue.9 , pp. 1306-1326
- Potamianos, G.¹ Neti, C.² Gravier, G.³ Garg, A.⁴ Senior, A.W.⁵

5
- 85133593587
- Audio-visual speech recognition using lip movement extracted from side-face images
- K. Iwano, S. Furui, T. Yoshinaga, and S. Tamura, "Audio-visual speech recognition using lip movement extracted from side-face images," in Proc. Int. Workshop on Auditory-Visual Speech Process. (AVSP), 2003, pp. 117-120.
- (2003) Proc. Int. Workshop on Auditory-visual Speech Process (AVSP) , pp. 117-120
- Iwano, K.¹ Furui, S.² Yoshinaga, T.³ Tamura, S.⁴

6
- 34250756493
- Lipreading using profile versus frontal views
- Victoria, BC, Canada
- P. Lucey and G. Potamianos, "Lipreading using profile versus frontal views," in Proc. IEEE Int. Workshop Multimedia Signal Process. (MMSP'06), Victoria, BC, Canada, 2006, pp. 24-28.
- (2006) Proc. IEEE Int. Workshop Multimedia Signal Process. (MMSP'06) , pp. 24-28
- Lucey, P.¹ Potamianos, G.²

7
- 34547505663
- Profile view lip reading
- K. Kumar, T. Chen, and R. M. Stern, "Profile view lip reading," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2007, vol. 4, pp. 429-432.
- (2007) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process , vol.4 , pp. 429-432
- Kumar, K.¹ Chen, T.² Stern, R.M.³

8
- 84871373273
- An extended pose-invariant lipreading system
- Hilvarenbeek, The Netherlands
- P. Lucey, G. Potamianos, and S. Sridharan, "An extended pose-invariant lipreading system," in Proc. Int. Workshop Auditory-Visual Speech Process. (AVSP), Hilvarenbeek, The Netherlands, 2007.
- (2007) Proc. Int. Workshop Auditory-visual Speech Process. (AVSP)
- Lucey, P.¹ Potamianos, G.² Sridharan, S.³

9
- 85009230873
- Audio-visual speech recognition in challenging environments
- Geneva, Switzerland
- G. Potamianos and C. Neti, "Audio-visual speech recognition in challenging environments," in Proc. Eurospeech'03, Geneva, Switzerland, 2003, pp. 1293-1296.
- (2003) Proc. Eurospeech'03 , pp. 1293-1296
- Potamianos, G.¹ Neti, C.²

10
- 43949091431
- Comparison of image transform based features for visual speech recognition in clean and corrupted video
- R. Seymour, D. Stewart, and J. Ming, "Comparison of image transform based features for visual speech recognition in clean and corrupted video," EURASIP J. Image Video Process., vol. 8, no. 2, 2008.
- (2008) EURASIP J. Image Video Process. , vol.8 , Issue.2
- Seymour, R.¹ Stewart, D.² Ming, J.³

11
- 4544333803
- Seeing to hear better: Evidence for early audio-visual interactions in speech identification
- J.-L. Schwartz, F. Berthommier, and C. Savariaux, "Seeing to hear better: Evidence for early audio-visual interactions in speech identification," Cognition, vol. 93, no. 2, pp. B69-B78, 2004.
- (2004) Cognition , vol.93 , Issue.2 , pp. B69-B78
- Schwartz, J.-L.¹ Berthommier, F.² Savariaux, C.³

12
- 0029935458
- Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading
- J. Driver, "Enhancement of selective listening by illusory mislocation of speech sounds due to lip-reading," Nature, vol. 381, no. 6577, pp. 66-68, 1996.
- (1996) Nature , vol.381 , Issue.6577 , pp. 66-68
- Driver, J.¹

13
- 13544256368
- The role of visual speech cues in reducing energetic and informational masking
- K. S. Helfer and R. L. Freyman, "The role of visual speech cues in reducing energetic and informational masking," J. Acoust. Soc. Amer., vol. 117, no. 2, pp. 842-849, 2005.
- (2005) J. Acoust. Soc. Amer. , vol.117 , Issue.2 , pp. 842-849
- Helfer, K.S.¹ Freyman, R.L.²

14
- 33745137835
- Informational masking of speech in children: Auditory-visual integration
- F. Wightman, D. Kistler, and D. Brungart, "Informational masking of speech in children: Auditory-visual integration," J. Acoust. Soc. Amer., vol. 119, no. 6, pp. 3940-3949, 2006.
- (2006) J. Acoust. Soc. Amer. , vol.119 , Issue.6 , pp. 3940-3949
- Wightman, F.¹ Kistler, D.² Brungart, D.³

15
- 0032178592
- Quantitative association of vocal-tract and facial behavior
- H. Yehia, P. Rubin, and E. Vatikiotis-Bateson, "Quantitative association of vocal-tract and facial behavior," Speech Commun., vol. 26, no. 1, pp. 23-43, 1998.
- (1998) Speech Commun. , vol.26 , Issue.1 , pp. 23-43
- Yehia, H.¹ Rubin, P.² Vatikiotis-Bateson, E.³

16
- 0036874551
- On the relationship between face movements, tongue movements, and speech acoustics
- J. Jiang, A. Alwan, P. A. Keating, E. T. Auer Jr., and L. Bernstein, "On the relationship between face movements, tongue movements, and speech acoustics," EURASIP J. Adv. Signal Process., vol. 2002, no. 11, pp. 1174-1188, 2002.
- (2002) EURASIP J. Adv. Signal Process. , vol.2002 , Issue.11 , pp. 1174-1188
- Jiang, J.¹ Alwan, A.² Keating, P.A.³ Auer Jr., E.T.⁴ Bernstein, L.⁵

17
- 0012725678
- Estimation of speech acoustics from visual speech features: A comparison of linear and non-linear models
- San Francisco, CA
- J. Barker and F. Berthommier, "Estimation of speech acoustics from visual speech features: A comparison of linear and non-linear models," in Proc. Int. Workshop Auditory-Visual Speech Process. (AVSP), San Francisco, CA, 1999.
- (1999) Proc. Int. Workshop Auditory-visual Speech Process. (AVSP)
- Barker, J.¹ Berthommier, F.²

18
- 10444247388
- Developing an audio-visual speech source separation algorithm
- D. Sodoyer, L. Girin, C. Jutten, and J.-L. Schwartz, "Developing an audio-visual speech source separation algorithm," Speech Commun., vol. 44, pp. 113-125, 2007.
- (2007) Speech Commun. , vol.44 , pp. 113-125
- Sodoyer, D.¹ Girin, L.² Jutten, C.³ Schwartz, J.-L.⁴

19
- 77951429856
- Noisy audio speech enhancement using Wiener filters derived from visual speech
- Hilvarenbeek, The Netherlands
- B. Milner and I. Almajai, "Noisy audio speech enhancement using Wiener filters derived from visual speech," in Proc. Int. Workshop Auditory-Visual Speech Process. (AVSP), Hilvarenbeek, The Netherlands, 2007.
- (2007) Proc. Int. Workshop Auditory-visual Speech Process. (AVSP)
- Milner, B.¹ Almajai, I.²

20
- 70350498369
- Audiovisual speech source separation: A regularization method based on visual voice activity detection
- Hilvarenbeek, The Netherlands
- B. Rivet, L. Girin, C. Servière, D.-T. Pham, and C. Jutten, "Audiovisual speech source separation: A regularization method based on visual voice activity detection," in Proc. Int. Workshop Auditory-Visual Speech Process. (AVSP), Hilvarenbeek, The Netherlands, 2007.
- (2007) Proc. Int. Workshop Auditory-visual Speech Process. (AVSP)
- Rivet, B.¹ Girin, L.² Servière, C.³ Pham, D.-T.⁴ Jutten, C.⁵

21
- 0036295990
- Noisy audio feature enhancement using audio-visual speech data
- Orlando, FL
- R. Goecke, G. Potamianos, and C. Neti, "Noisy audio feature enhancement using audio-visual speech data," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Orlando, FL, 2002, pp. 2025-2028.
- (2002) Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. , pp. 2025-2028
- Goecke, R.¹ Potamianos, G.² Neti, C.³

22
- 33644661135
- A glimpsing model of speech perception in noise
- M. P. Cooke, "A glimpsing model of speech perception in noise," J. Acoust. Soc. Amer., vol. 119, no. 3, pp. 1562-1573, 2006.
- (2006) J. Acoust. Soc. Amer. , vol.119 , Issue.3 , pp. 1562-1573
- Cooke, M.P.¹

23
- 0003684441
- Cambridge, MA: MIT Press
- A. S. Bregman, Auditory Scene Analysis. Cambridge, MA: MIT Press, 1990.
- (1990) Auditory Scene Analysis
- Bregman, A.S.¹

24
- 2142751288
- Ph.D. dissertation, Mass. Inst. Technol. Cambridge, MA
- P. Smaragdis, "Redundancy reduction for computational audition, a unifying approach," Ph.D. dissertation, Mass. Inst. Technol., Cambridge, MA, 2001.
- (2001) Redundancy Reduction for Computational Audition, a Unifying Approach
- Smaragdis, P.¹

25
- 82255178542
- D.-L. Wang and G. J. Brown, Eds., New York: IEEE Press/Wiley-Interscience
- D.-L. Wang and G. J. Brown, Eds., Computational Auditory Scene Analysis: Principles, Algorithms, and Applications. New York: IEEE Press/Wiley-Interscience, 2007.
- (2007) Computational Auditory Scene Analysis: Principles, Algorithms, and Applications

26
- 44949219122
- Recent advances in speech fragment decoding techniques
- Pittsburgh, PA
- J. Barker, A. Coy, N. Ma, and M. Cooke, "Recent advances in speech fragment decoding techniques," in Proc. Interspeech'06, Pittsburgh, PA, 2006, pp. 85-88.
- (2006) Proc. Interspeech'06 , pp. 85-88
- Barker, J.¹ Coy, A.² Ma, N.³ Cooke, M.⁴

27
- 11144316019
- Decoding speech in the presence of other sources
- J. P. Barker, M. P. Cooke, and D. P. W. Ellis, "Decoding speech in the presence of other sources," Speech Commun., vol. 45, pp. 5-25, 2005.
- (2005) Speech Commun. , vol.45 , pp. 5-25
- Barker, J.P.¹ Cooke, M.P.² Ellis, D.P.W.³

28
- 0035342414
- Robust automatic speech recognition with missing and uncertain acoustic data
- M. P. Cooke, P. D. Green, L. Josifovski, and A. Vizinho, "Robust automatic speech recognition with missing and uncertain acoustic data," Speech Commun., vol. 34, pp. 267-285, 2001.
- (2001) Speech Commun. , vol.34 , pp. 267-285
- Cooke, M.P.¹ Green, P.D.² Josifovski, L.³ Vizinho, A.⁴

29
- 0035478859
- The auditory organization of speech and other sources in listeners and computational models
- M. P. Cooke and D. P. W. Ellis, "The auditory organization of speech and other sources in listeners and computational models," Speech Commun., vol. 35, pp. 141-177, 2001.
- (2001) Speech Commun. , vol.35 , pp. 141-177
- Cooke, M.P.¹ Ellis, D.P.W.²

30
- 33750368310
- An audiovisual corpus for speech perception and automatic speech recognition
- M. P. Cooke, J. Barker, S. P. Cunningham, and X. Shao, "An audiovisual corpus for speech perception and automatic speech recognition," J. Acoust. Soc. Amer., vol. 120, pp. 2421-2424, 2006.
- (2006) J. Acoust. Soc. Amer. , vol.120 , pp. 2421-2424
- Cooke, M.P.¹ Barker, J.² Cunningham, S.P.³ Shao, X.⁴

31
- 37849011878
- The foreign language cocktail party problem: Energetic and informational masking effects in non-native speech perception
- M. P. Cooke, M. L. Garcia Lecumberri, and J. Barker, "The foreign language cocktail party problem: Energetic and informational masking effects in non-native speech perception," J. Acoust. Soc. Amer., vol. 123, pp. 414-427, 2008.
- (2008) J. Acoust. Soc. Amer. , vol.123 , pp. 414-427
- Cooke, M.P.¹ Lecumberri, M.L.G.² Barker, J.³

32
- 0025110885
- Derivation of auditory filter shapes from notched-noise data
- B. R. Glasberg and B. C. J. Moore, "Derivation of auditory filter shapes from notched-noise data," Hearing Res., vol. 47, pp. 103-138, 1990.
- (1990) Hearing Res. , vol.47 , pp. 103-138
- Glasberg, B.R.¹ Moore, B.C.J.²

33
- 40249108645
- Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment
- X. Shao and J. Barker, "Stream weight estimation for multistream audio-visual speech recognition in a multispeaker environment," Speech Commun., vol. 50, no. 4, pp. 337-353, 2008.
- (2008) Speech Commun. , vol.50 , Issue.4 , pp. 337-353
- Shao, X.¹ Barker, J.²

34
- 0002858640
- Computer video face tracking for use in a perceptual user interface
- G. R. Bradski, "Computer video face tracking for use in a perceptual user interface," Intel Technol. J., Q2, 1998.
- (1998) Intel Technol. J., Q2
- Bradski, G.R.¹

35
- 34748817500
- Exploiting correlogram structure for robust automatic speech recognition with multiple speech sources
- N. Ma, P. Green, J. Barker, and A. Coy, "Exploiting correlogram structure for robust automatic speech recognition with multiple speech sources," Speech Commun., vol. 49, no. 12, pp. 874-891, 2007.
- (2007) Speech Commun. , vol.49 , Issue.12 , pp. 874-891
- Ma, N.¹ Green, P.² Barker, J.³ Coy, A.⁴

36
- 5444276464
- Upper Saddle River, NJ: Prentice-Hall
- R. C. Gonzalez, R. E. Woods, and S. L. Eddins, Digital Image Processing Using MATLAB. Upper Saddle River, NJ: Prentice-Hall, 2004.
- (2004) Digital Image Processing Using MATLAB
- Gonzalez, R.C.¹ Woods, R.E.² Eddins, S.L.³

37
- 0012668146
- Asynchrony modeling for audio-visual speech recognition
- San Diego, CA
- G. Gravier, G. Potamianos, and C. Neti, "Asynchrony modeling for audio-visual speech recognition," in Proc. Human Lang. Technol. Conf., San Diego, CA, 2002, pp. 24-27.
- (2002) Proc. Human Lang. Technol. Conf. , pp. 24-27
- Gravier, G.¹ Potamianos, G.² Neti, C.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.