Volume 14, Issue 3, 2006, Pages 1082-1089

Visual model structures and synchrony constraints for audio-visual speech recognition

Author keywords

Audio visual speech recognition; Lip reading; Multimodal speech processing

Indexed keywords

AUDIO VISUAL SPEECH RECOGNITION (AVSR); LIP READING; MULTIMODAL SPEECH PROCESSING; WORD ERROR RATE

EID: 34047263009     PISSN: 15587916     EISSN: None     Source Type: Journal    
DOI: 10.1109/TSA.2005.857572     Document Type: Article
Times cited: 61

References (36)
  • 1
    • C. Benoit, "The intrinsic bimodality of speech communication and the synthesis of talking faces," in The Structure of Multimodal Dialogue II, M. M. Taylor, F. Nel, and D. Bouwhuis, Eds. Amsterdam, The Netherlands: John Benjamins, 2000, pp. 485-502.
  • 2
    • C. Chibelushi, F. Deravi, and J. Mason, "A review of speech-based bimodal recognition," IEEE Trans. Multimedia, vol. 4, no. 1, pp. 23-37, Mar. 2002.
  • 3
    • G. Potamianos, C. Neti, G. Iyengar, and E. Helmuth, "Large-vocabulary audio-visual speech recognition by machines and humans," in Proc. Eurospeech, Aalborg, Denmark, Sep. 2001, pp. 1293-1296.
  • 4
    • G. Potamianos, C. Neti, G. Gravier, A. Garg, and A. Senior, "Recent advances in the automatic recognition of audiovisual speech," Proc. IEEE, vol. 91, no. 9, pp. 1306-1326, Sep. 2003.
  • 5
    • W. Sumby and I. Pollack, "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Amer., vol. 26, no. 2, pp. 212-215, Mar. 1954.
  • 6
    • G. Potamianos and C. Neti, "Audio-visual speech recognition in challenging environments," in Proc. Eurospeech, Geneva, Switzerland, Sep. 2003, pp. 1293-1296.
  • 7
    • J. Glass, "A probabilistic framework for segment-based speech recognition," Comput. Speech Lang., vol. 17, no. 2-3, pp. 137-152, Apr./Jul. 2003.
  • 8
    • T. Hazen, K. Saenko, C. La, and J. Glass, "A segment-based audio-visual speech recognizer: Data collection, development, and initial experiments," in Proc. Int. Conf. Multimodal Interfaces, PA, Oct. 2004.
  • 9
    • M. Chan, Y. Zhang, and T. Huang, "Real-time lip tracking and bimodal continuous speech recognition," in Proc. Workshop on Multimedia Signal Processing, Redondo Beach, CA, 1998, pp. 65-70.
  • 10
    • I. Matthews, J. Bangham, and S. Cox, "Audio-visual speech recognition using multiscale nonlinear image decomposition," in Proc. Int. Conf. Spoken Language Processing, Philadelphia, PA, 1996, pp. 38-41.
  • 12
    • S. Chu and T. Huang, "Bimodal speech recognition using coupled hidden Markov models," in Proc. Int. Conf. Spoken Language Processing, vol. II, Beijing, China, Oct. 2000.
  • 15
    • S. Chu and T. Huang, "Audio-visual speech modeling using coupled hidden Markov models," in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, vol. 2, Orlando, FL, May 2002, pp. 2009-2012.
  • 16
    • C. Sanderson, "The VidTIMIT database," in IDIAP Communication 02-06. Martigny, Switzerland: IDIAP, Aug. 2002.
  • 17
    • V. Zue, S. Seneff, and J. Glass, "Speech database development: TIMIT and beyond," Speech Commun., vol. 9, no. 4, pp. 351-356, 1990.
  • 18
    • A. Varga and H. Steeneken, "Assessment for automatic speech recognition II: NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems," Speech Commun., vol. 12, no. 3, pp. 247-251, Jul. 1993.
  • 19
    • S. Dupont and J. Luettin, "Audio-visual speech modeling for continuous speech recognition," IEEE Trans. Multimedia, vol. 2, no. 3, pp. 141-151, Sep. 2000.
  • 21
    • G. Gravier, G. Potamianos, and C. Neti, "Asynchrony modeling for audio-visual speech recognition," in Proc. Human Language Technology Conf., San Diego, CA, Mar. 2002, pp. 1-6.
  • 25
    • S. Tamura, K. Iwano, and S. Furui, "A stream-weight optimization method for audio-visual speech recognition using multi-stream HMMs," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 1, Montreal, QC, Canada, May 2004, pp. 857-860.
  • 26
    • C. Fisher, "Confusions among visually perceived consonants," J. Speech Hearing Res., vol. 11, pp. 796-804, 1968.
  • 27
    • K. Berger, Speechreading: Principles and Methods. Baltimore, MD: National Educational, 1972.
  • 28
    • T. Chen and R. Rao, "Audio-visual integration in multimodal communication," Proc. IEEE, vol. 86, no. 5, pp. 837-852, May 1998.
  • 29
    • P. Silsbee, "Sensory integration in audiovisual automatic speech recognition," in Proc. 28th Annual Asilomar Conf. Signals, Systems, and Computers, vol. I, Pacific Grove, CA, Oct./Nov. 1994, pp. 561-565.
  • 30
    • A. Rogozan, "Discriminative learning of visual data for audiovisual speech recognition," Int. J. Artif. Intell. Tools, vol. 8, no. 1, pp. 43-52, Mar. 1999.
  • 31
    • C. Neti et al., "Audio-visual speech recognition," Center Lang. Speech Process., The Johns Hopkins Univ., Baltimore, MD, Tech. Rep., 2000.
  • 32
    • K. Finn and A. Montgomery, "Automatic optically-based recognition of speech," Pattern Recognit. Lett., vol. 8, no. 3, pp. 159-164, Oct. 1988.
  • 34
    • G. Sannier, S. Balcisoy, N. Magnenat-Thalmann, and D. Thalmann, "VHD: A system for directing real-time virtual actors," Vis. Comput., vol. 15, no. 7/8, pp. 320-329, Nov. 1999.
  • 36
    • S. Bengio, "An asynchronous hidden Markov model for audio-visual speech recognition," in Advances in Neural Information Processing Systems 15, S. Becker, S. Thrun, and K. Obermayer, Eds. Cambridge, MA: MIT Press, 2003, pp. 1237-1244.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.