SCOPUS Information Search Platform

IEEE Transactions on Cybernetics

Volume 44, Issue 2, 2014, Pages 175-184

Robust audio-visual speech recognition under noisy audio-video conditions

(4) Stewart, Darryl a; Seymour, Rowan b; Pass, Adrian c; Ming, Ji a

a QUEEN'S UNIVERSITY BELFAST (United Kingdom)

b UNIVERSITY OF WASHINGTON (United States)

c Pace Micro Technology plc (United Kingdom)

Author keywords

Automatic speech recognition; human computer interaction; speech recognition

Indexed keywords

AUDIO VISUAL SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION; FRAME-BY-FRAME BASIS; INTEGRATION APPROACH; MPEG-4 VIDEO COMPRESSION; ROBUST RECOGNITION; STREAM INTEGRATION; WEIGHTING APPROACHES;

ACOUSTIC NOISE; CRIME; EXPERIMENTS; HUMAN COMPUTER INTERACTION; SPEECH RECOGNITION; VIDEO STREAMING;

AUDIO ACOUSTICS;

ALGORITHM; ARTICLE; AUTOMATED PATTERN RECOGNITION; AUTOMATIC SPEECH RECOGNITION; COMPUTER SIMULATION; HUMAN; METHODOLOGY; PATTERN RECOGNITION; PHYSIOLOGY; SIGNAL NOISE RATIO; SOUND DETECTION; STATISTICAL ANALYSIS; STATISTICAL MODEL;

ALGORITHMS; COMPUTER SIMULATION; DATA INTERPRETATION, STATISTICAL; HUMANS; MODELS, STATISTICAL; PATTERN RECOGNITION, AUTOMATED; PATTERN RECOGNITION, PHYSIOLOGICAL; PATTERN RECOGNITION, VISUAL; SIGNAL-TO-NOISE RATIO; SOUND SPECTROGRAPHY; SPEECH RECOGNITION SOFTWARE;

EID: 84893400545 PISSN: 21682267 EISSN: None Source Type: Journal
DOI: 10.1109/TCYB.2013.2250954 Document Type: Article

Times cited: (73)

References (33)

1
- 0017199877
- Hearing lips and seeing voices
- H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, no. 5588, pp. 746-748, 1976.
- (1976) Nature , vol.264 , Issue.5588 , pp. 746-748
- McGurk, H.¹ MacDonald, J.²

2
- 0001432664
- On the integration of auditory and visual parameters in an HMM-based ASR
- A. Adjoudani and C. Benoît, "On the integration of auditory and visual parameters in an HMM-based ASR," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer, 1996, pp. 461-471.
- (1996) Speechreading by Humans and Machines , pp. 461-471
- Adjoudani, A.¹ Benoît, C.²

3
- 85032752352
- Audiovisual speech processing: Lip reading and lip synchronization
- DOI 10.1109/79.911195
- T. Chen, "Audiovisual speech processing: Lip reading and lip synchronization," IEEE Signal Process. Mag., vol. 18, no. 1, pp. 9-21, Jan. 2001. (Pubitemid 32287667)
- (2001) IEEE Signal Processing Magazine , vol.18 , Issue.1 , pp. 9-21
- Chen, T.¹

4
- 0034853041
- Hierarchical discriminant features for audio-visual LVCSR
- G. Potamianos, J. Luettin, and C. Neti, "Hierarchical discriminant features for audio-visual LVCSR," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2001, pp. 165-168. (Pubitemid 32839213)
- (2001) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 165-168
- Potamianos, G.¹ Luettin, J.² Neti, C.³

5
- 0036295990
- Noisy audio feature enhancement using audio-visual speech data
- R. Goecke, G. Potamianos, and C. Neti, "Noisy audio feature enhancement using audio-visual speech data," in Proc. Int. Conf. Acoust. Speech Signal Process., 2002, pp. 2025-2028.
- (2002) Proc. Int. Conf. Acoust. Speech Signal Process , pp. 2025-2028
- Goecke, R.¹ Potamianos, G.² Neti, C.³

6
- 0034842342
- Asynchronous stream modeling for large vocabulary audio-visual speech recognition
- J. Luettin, G. Potamianos, and C. Neti, "Asynchronous stream modeling for large vocabulary audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2001, pp. 169-172. (Pubitemid 32839214)
- (2001) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 169-172
- Luettin, J.¹ Potamianos, G.² Neti, C.³

7
- 80051637579
- A multi-stream ASR framework for BLSTM modeling of conversational speech
- M. Wollmer, F. Eyben, B. Schuller, and G. Rigoll, "A multi-stream ASR framework for BLSTM modeling of conversational speech," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., May 2011, pp. 4860-4863.
- (2011) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , pp. 4860-4863
- Wollmer, M.¹ Eyben, F.² Schuller, B.³ Rigoll, G.⁴

8
- 77949373348
- Improved decision trees for multistream HMM-based audio-visual continuous speech recognition
- J. Huang and K. Visweswariah, "Improved decision trees for multistream HMM-based audio-visual continuous speech recognition," in Proc. Workshop IEEE Autom. Speech Recognit. Understanding, Nov. 2009, pp. 228-231.
- (2009) Proc. Workshop IEEE Autom. Speech Recognit , pp. 228-231
- Huang, J.¹ Visweswariah, K.²

9
- 84890568355
- A novel algorithm for acoustic and visual classifiers decision fusion in audio-visual speech recognition system
- R. Rajavel and P. S. Sathidevi, "A novel algorithm for acoustic and visual classifiers decision fusion in audio-visual speech recognition system," Signal Process. Int. J., vol. 4, no. 1, pp. 23-37, 2010.
- (2010) Signal Process. Int. J. , vol.4 , Issue.1 , pp. 23-37
- Rajavel, R.¹ Sathidevi, P.S.²

10
- 84897584045
- On dynamic stream weighting for audio-visual speech recognition
- V. Estellers, M. Gurban, and J. Thiran, "On dynamic stream weighting for audio-visual speech recognition," IEEE Trans. Audio Speech Language Process., vol. 20, no. 4, pp. 1145-1157, May 2012.
- (2012) IEEE Trans. Audio Speech Language Process. , vol.20 , Issue.4 , pp. 1145-1157
- Estellers, V.¹ Gurban, M.² Thiran, J.³

11
- 56149109954
- Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition
- D. B. Dean, P. J. Lucey, S. Sridharan, and T. J. Wark, "Fused HMM-adaptation of multi-stream HMMs for audio-visual speech recognition," in Proc. 8th Annu. Conf. Int. Speech Commun. Assoc., 2007, pp. 666-669.
- (2007) Proc. 8th Annu. Conf. Int. Speech Commun. Assoc. , pp. 666-669
- Dean, D.B.¹ Lucey, P.J.² Sridharan, S.³ Wark, T.J.⁴

12
- 0034270644
- Audio-visual speech modeling for continuous speech recognition
- S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
- (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
- Dupont, S.¹ Luettin, J.²

13
- 0036874999
- Dynamic Bayesian networks for audio-visual speech recognition
- A. V. Nefian, L. Liang, X. Pi, X. Liu, and K. Murphy, "Dynamic Bayesian networks for audio-visual speech recognition," EURASIP J. Appl. Signal Process., pp. 1274-1288, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Process. , pp. 1274-1288
- Nefian, A.V.¹ Liang, L.² Pi, X.³ Liu, X.⁴ Murphy, K.⁵

14
- 0036874527
- Noise adaptive stream weighting in audio-visual speech recognition
- M. Heckmann, F. Berthommier, and K. Kroschel, "Noise adaptive stream weighting in audio-visual speech recognition," EURASIP J. Appl. Signal Process., vol. 11, pp. 1260-1273, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Process. , vol.11 , pp. 1260-1273
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

15
- 0034848499
- Optimal weighting of posteriors for audio-visual speech recognition
- M. Heckmann, F. Berthommier, and K. Kroschel, "Optimal weighting of posteriors for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. May 2001, pp. 161-164. (Pubitemid 32839212)
- (2001) ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings , vol.1 , pp. 161-164
- Heckmann, M.¹ Berthommier, F.² Kroschel, K.³

16
- 69949118452
- Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition
- L. Terry, D. Shiell, and A. Katsaggelos, "Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition," in Proc. 15th IEEE Int. Conf. Image Process., Oct. 2008, pp. 1316-1319.
- (2008) Proc. 15th IEEE Int. Conf. Image Process , pp. 1316-1319
- Terry, L.¹ Shiell, D.² Katsaggelos, A.³

17
- 0036875048
- Automatic speechreading with applications to human-computer interfaces
- X. Zhang, C. C. Broun, R. M. Mersereau, and M. A. Clements, "Automatic speechreading with applications to human-computer interfaces," EURASIP J. Appl. Signal Process., vol. 11, pp. 1228-1247, Nov. 2002.
- (2002) EURASIP J. Appl. Signal Process. , vol.11 , pp. 1228-1247
- Zhang, X.¹ Broun, C.C.² Mersereau, R.M.³ Clements, M.A.⁴

18
- 85009154155
- Stream weight optimization of speech and lip image sequence for audiovisual speech recognition
- S. Nakamura, H. Ito, and K. Shikano, "Stream weight optimization of speech and lip image sequence for audiovisual speech recognition," in Proc. Int. Conf. Spoken Language Process., vol. 3. 2000, pp. 20-23.
- (2000) Proc. Int. Conf. Spoken Language Process. , vol.3 , pp. 20-23
- Nakamura, S.¹ Ito, H.² Shikano, K.³

19
- 17344376380
- Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR
- G. Gravier, S. Axelrod, G. Potamianos, and C. Neti, "Maximum entropy and MCE based HMM stream weight estimation for audio-visual ASR," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. May 2002, pp. 853-856.
- (2002) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.1 , pp. 853-856
- Gravier, G.¹ Axelrod, S.² Potamianos, G.³ Neti, C.⁴

20
- 0141814785
- Frame-dependent multi-stream reliability indicators for audio-visual speech recognition
- A. Garg, G. Potamianos, C. Neti, and T. S. Huang, "Frame-dependent multi-stream reliability indicators for audio-visual speech recognition," in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., vol. 1. Apr. 2003, pp. 24-27.
- (2003) Proc. IEEE Int. Conf. Acoust. Speech Signal Process , vol.1 , pp. 24-27
- Garg, A.¹ Potamianos, G.² Neti, C.³ Huang, T.S.⁴

21
- 75749106784
- Audio-visual integration for robust speech recognition using maximum weighted stream posteriors
- R. Seymour, D. Stewart, and J. Ming, "Audio-visual integration for robust speech recognition using maximum weighted stream posteriors," in Proc. Interspeech, 2007, pp. 654-657.
- (2007) Proc. Interspeech , pp. 654-657
- Seymour, R.¹ Stewart, D.² Ming, J.³

22
- 84885728886
- 'Your word is my command': Google search by voice: A case study
- J. Schalkwyk, D. Beeferman, F. Beaufays, B. Byrne, C. Chelba, M. Cohen, M. Kamvar, and B. Strope, "'Your word is my command': Google search by voice: A case study," in Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics, 2010, ch. 4, pp. 61-90.
- (2010) Advances in Speech Recognition: Mobile Environments, Call Centers and Clinics , pp. 61-90
- Schalkwyk, J.¹ Beeferman, D.² Beaufays, F.³ Byrne, B.⁴ Chelba, C.⁵ Cohen, M.⁶ Kamvar, M.⁷ Strope, B.⁸

23
- 33745224761
- A new posterior based audio-visual integration method for robust speech recognition
- R. Seymour, J. Ming, and D. Stewart, "A new posterior based audio-visual integration method for robust speech recognition," in Proc. Interspeech-Eurospeech, Sep. 2005, pp. 1229-1232. (Pubitemid 43908290)
- (2005) 9th European Conference on Speech Communication and Technology , pp. 1229-1232
- Seymour, R.¹ Ming, J.² Stewart, D.³

24
- 33646410695
- A posterior union model with applications to robust speech and speaker recognition
- J. Ming, J. Lin, and F. J. Smith, "A posterior union model with applications to robust speech and speaker recognition," EURASIP J. Applied Signal Process., Apr. 2006, pp. 1-12.
- (2006) EURASIP J. Applied Signal Process. , pp. 1-12
- Ming, J.¹ Lin, J.² Smith, F.J.³

25
- 69449094603
- Robust face recognition using posterior union model based neural networks
- J. Lin, J. Ming, and D. Crookes, "Robust face recognition using posterior union model based neural networks," Comput. Vision, IET, vol. 3, no. 3, pp. 130-142, Sep. 2009.
- (2009) Comput. Vision, IET , vol.3 , Issue.3 , pp. 130-142
- Lin, J.¹ Ming, J.² Crookes, D.³

26
- 0001935972
- XM2VTSDB: The extended M2VTS database
- K. Messer, J. Matas, J. Kittler, J. Luettin, and G. Maitre, "XM2VTSDB: The extended M2VTS database," in Proc. Audio video-Based Biometric Person Authentication, Mar. 1999, pp. 72-77.
- (1999) Proc. Audio Video-Based Biometric Person Authentication , pp. 72-77
- Messer, K.¹ Matas, J.² Kittler, J.³ Luettin, J.⁴ Maitre, G.⁵

27
- 0003822743
- The HTK Book (for HTK Version 3.0)
- S. Young. (2000). The HTK Book (for HTK Version 3.0), Microsoft Corporation [Online]. Available: http://htk.eng.cam.ac.uk/docs/docs.shtml
- (2000) The HTK Book
- Young, S.¹

28
- 4243220068
- Aspects of facial biometrics for verification of personal identity
- M. U. R. Sanchez, "Aspects of facial biometrics for verification of personal identity," Ph.D. dissertation, Univ. Surrey, Guildford, U.K., 2000.
- (2000) Aspects of Facial Biometrics for Verification of Personal Identity
- Sanchez, M.U.R.¹

29
- 0032314380
- An image transform approach for HMM based automatic lipreading
- G. Potamianos, H. P. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. Int. Conf. Image Process., vol. 3. 1998, pp. 173-177.
- (1998) Proc. Int. Conf. Image Process , vol.3 , pp. 173-177
- Potamianos, G.¹ Graf, H.P.² Cosatto, E.³

30
- 43949091431
- Comparison of image transform-based features for visual speech recognition in clean and corrupted videos
- R. Seymour, D. Stewart, and J. Ming, "Comparison of image transform-based features for visual speech recognition in clean and corrupted videos," EURASIP J. Image Video Process., vol. 2008, article 14, Apr. 2008.
- (2008) EURASIP J. Image Video Process , vol.2008
- Seymour, R.¹ Stewart, D.² Ming, J.³

31
- 70349494073
- Dynamic visual features for audio-visual speaker verification
- D. Dean and S. Sridharan, "Dynamic visual features for audio-visual speaker verification," Comput. Speech Language, vol. 24, no. 2, pp. 136-149, 2010.
- (2010) Comput. Speech Language , vol.24 , Issue.2 , pp. 136-149
- Dean, D.¹ Sridharan, S.²

32
- 85009284526
- DCT-based video features for audio-visual speech recognition
- M. Heckmann, K. Kroschel, C. Savariaux, and F. Berthommier, "DCT-based video features for audio-visual speech recognition," in Proc. Int. Conf. Spoken Language Process., Denver, CO, USA, Sep. 2002, pp. 1925-1928.
- (2002) Proc. Int. Conf. Spoken Language Process , pp. 1925-1928
- Heckmann, M.¹ Kroschel, K.² Savariaux, C.³ Berthommier, F.⁴

33
- 84893419257
- An examination of audio-visual fused HMMs for speaker recognition
- D. B. Dean, T. J. Wark, and S. Sridharan. (2006). "An examination of audio-visual fused HMMs for speaker recognition," in Proc. 2nd Workshop Multimodal User Authentication, Toulouse, France [Online]. Available: http://eprints.qut.edu.au/5343/
- (2006) Proc. 2nd Workshop Multimodal User Authentication
- Dean, D.B.¹ Wark, T.J.² Sridharan, S.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.