Volume 44, Issue 2, 2014, Pages 175-184

Robust audio-visual speech recognition under noisy audio-video conditions

Author keywords

Automatic speech recognition; human computer interaction; speech recognition

Indexed keywords

AUDIO VISUAL SPEECH RECOGNITION; AUTOMATIC SPEECH RECOGNITION; FRAME-BY-FRAME BASIS; INTEGRATION APPROACH; MPEG-4 VIDEO COMPRESSION; ROBUST RECOGNITION; STREAM INTEGRATION; WEIGHTING APPROACHES;

EID: 84893400545     PISSN: 21682267     EISSN: None     Source Type: Journal    
DOI: 10.1109/TCYB.2013.2250954     Document Type: Article
Times cited: 73

References (33)
  • 1
    • 0017199877
    • Hearing lips and seeing voices
    • H. McGurk and J. MacDonald, "Hearing lips and seeing voices," Nature, vol. 264, no. 5588, pp. 746-748, 1976.
    • (1976) Nature , vol.264 , Issue.5588 , pp. 746-748
    • McGurk, H.1    MacDonald, J.2
  • 2
    • 0001432664
    • On the integration of auditory and visual parameters in an HMM-based ASR
    • A. Adjoudani and C. Benoît, "On the integration of auditory and visual parameters in an HMM-based ASR," in Speechreading by Humans and Machines, D. G. Stork and M. E. Hennecke, Eds. Berlin, Germany: Springer, 1996, pp. 461-471.
    • (1996) Speechreading by Humans and Machines , pp. 461-471
    • Adjoudani, A.1    Benoît, C.2
  • 3
    • 85032752352
    • Audiovisual speech processing: Lip reading and lip synchronization
    • DOI 10.1109/79.911195
    • T. Chen, "Audiovisual speech processing: Lip reading and lip synchronization," IEEE Signal Process. Mag., vol. 18, no. 1, pp. 9-21, Jan. 2001. (Pubitemid 32287667)
    • (2001) IEEE Signal Processing Magazine , vol.18 , Issue.1 , pp. 9-21
    • Chen, T.1
  • 8
    • 77949373348
    • Improved decision trees for multistream HMM-based audio-visual continuous speech recognition
    • J. Huang and K. Visweswariah, "Improved decision trees for multistream HMM-based audio-visual continuous speech recognition," in Proc. Workshop IEEE Autom. Speech Recognit. Understanding, Nov. 2009, pp. 228-231.
    • (2009) Proc. Workshop IEEE Autom. Speech Recognit , pp. 228-231
    • Huang, J.1    Visweswariah, K.2
  • 9
    • 84890568355
    • A novel algorithm for acoustic and visual classifiers decision fusion in audio-visual speech recognition system
    • R. Rajavel and P. S. Sathidevi, "A novel algorithm for acoustic and visual classifiers decision fusion in audio-visual speech recognition system," Signal Process. Int. J., vol. 4, no. 1, pp. 23-37, 2010.
    • (2010) Signal Process. Int. J. , vol.4 , Issue.1 , pp. 23-37
    • Rajavel, R.1    Sathidevi, P.S.2
  • 10
    • 84897584045
    • On dynamic stream weighting for audio-visual speech recognition
    • V. Estellers, M. Gurban, and J. Thiran, "On dynamic stream weighting for audio-visual speech recognition," IEEE Trans. Audio Speech Language Process., vol. 20, no. 4, pp. 1145-1157, May 2012.
    • (2012) IEEE Trans. Audio Speech Language Process. , vol.20 , Issue.4 , pp. 1145-1157
    • Estellers, V.1    Gurban, M.2    Thiran, J.3
  • 12
    • 0034270644
    • Audio-visual speech modeling for continuous speech recognition
    • S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
    • (2000) IEEE Trans. Multimedia , vol.2 , Issue.3 , pp. 141-151
    • Dupont, S.1    Luettin, J.2
  • 14
    • 0036874527
    • Noise adaptive stream weighting in audio-visual speech recognition
    • M. Heckmann, F. Berthommier, and K. Kroschel, "Noise adaptive stream weighting in audio-visual speech recognition," EURASIP J. Appl. Signal Process., vol. 11, pp. 1260-1273, Nov. 2002.
    • (2002) EURASIP J. Appl. Signal Process. , vol.11 , pp. 1260-1273
    • Heckmann, M.1    Berthommier, F.2    Kroschel, K.3
  • 16
    • 69949118452
    • Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition
    • L. Terry, D. Shiell, and A. Katsaggelos, "Feature space video stream consistency estimation for dynamic stream weighting in audio-visual speech recognition," in Proc. 15th IEEE Int. Conf. Image Process., Oct. 2008, pp. 1316-1319.
    • (2008) Proc. 15th IEEE Int. Conf. Image Process , pp. 1316-1319
    • Terry, L.1    Shiell, D.2    Katsaggelos, A.3
  • 17
  • 18
    • 85009154155
    • Stream weight optimization of speech and lip image sequence for audiovisual speech recognition
    • S. Nakamura, H. Ito, and K. Shikano, "Stream weight optimization of speech and lip image sequence for audiovisual speech recognition," in Proc. Int. Conf. Spoken Language Process., vol. 3. 2000, pp. 20-23.
    • (2000) Proc. Int. Conf. Spoken Language Process. , vol.3 , pp. 20-23
    • Nakamura, S.1    Ito, H.2    Shikano, K.3
  • 21
    • 75749106784
    • Audio-visual integration for robust speech recognition using maximum weighted stream posteriors
    • R. Seymour, D. Stewart, and J. Ming, "Audio-visual integration for robust speech recognition using maximum weighted stream posteriors," in Proc. Interspeech, 2007, pp. 654-657.
    • (2007) Proc. Interspeech , pp. 654-657
    • Seymour, R.1    Stewart, D.2    Ming, J.3
  • 23
    • 33745224761
    • A new posterior based audio-visual integration method for robust speech recognition
    • R. Seymour, J. Ming, and D. Stewart, "A new posterior based audiovisual integration method for robust speech recognition," in Proc. Interspeech-Eurospeech, Sep. 2005, pp. 1229-1232. (Pubitemid 43908290)
    • (2005) 9th European Conference on Speech Communication and Technology , pp. 1229-1232
    • Seymour, R.1    Ming, J.2    Stewart, D.3
  • 24
    • 33646410695
    • A posterior union model with applications to robust speech and speaker recognition
    • J. Ming, J. Lin, and F. J. Smith, "A posterior union model with applications to robust speech and speaker recognition," EURASIP J. Applied Signal Process., Apr. 2006, pp. 1-12.
    • (2006) EURASIP J. Applied Signal Process. , pp. 1-12
    • Ming, J.1    Lin, J.2    Smith, F.J.3
  • 25
    • 69449094603
    • Robust face recognition using posterior union model based neural networks
    • J. Lin, J. Ming, and D. Crookes, "Robust face recognition using posterior union model based neural networks," Comput. Vision, IET, vol. 3, no. 3, pp. 130-142, Sep. 2009.
    • (2009) Comput. Vision, IET , vol.3 , Issue.3 , pp. 130-142
    • Lin, J.1    Ming, J.2    Crookes, D.3
  • 27
    • 0003822743
    • The HTK Book (for HTK Version 3.0)
    • S. Young. (2000). The HTK Book (for HTK Version 3.0), Microsoft Corporation [Online]. Available: http://htk.eng.cam.ac.uk/docs/docs.shtml
    • (2000) The HTK Book
    • Young, S.1
  • 29
    • 0032314380
    • An image transform approach for HMM based automatic lipreading
    • G. Potamianos, H. P. Graf, and E. Cosatto, "An image transform approach for HMM based automatic lipreading," in Proc. Int. Conf. Image Process., vol. 3. 1998, pp. 173-177.
    • (1998) Proc. Int. Conf. Image Process , vol.3 , pp. 173-177
    • Potamianos, G.1    Graf, H.P.2    Cosatto, E.3
  • 30
    • 43949091431
    • Comparison of image transform-based features for visual speech recognition in clean and corrupted videos
    • R. Seymour, D. Stewart, and J. Ming, "Comparison of image transform-based features for visual speech recognition in clean and corrupted videos," EURASIP J. Image Video Process., vol. 2008, article 14, Apr. 2008.
    • (2008) EURASIP J. Image Video Process , vol.2008
    • Seymour, R.1    Stewart, D.2    Ming, J.3
  • 31
    • 70349494073
    • Dynamic visual features for audio-visual speaker verification
    • D. Dean and S. Sridharan, "Dynamic visual features for audio-visual speaker verification," Comput. Speech Language, vol. 24, no. 2, pp. 136-149, 2010.
    • (2010) Comput. Speech Language , vol.24 , Issue.2 , pp. 136-149
    • Dean, D.1    Sridharan, S.2
  • 33
    • 84893419257
    • An examination of audio-visual fused HMMs for speaker recognition
    • D. B. Dean, T. J. Wark, and S. Sridharan. (2006). "An examination of audio-visual fused HMMs for speaker recognition," in Proc. 2nd Workshop Multimodal User Authentication, Toulouse, France [Online]. Available: http://eprints.qut.edu.au/5343/
    • (2006) Proc. 2nd Workshop Multimodal User Authentication
    • Dean, D.B.1    Wark, T.J.2    Sridharan, S.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.