Volume 125, Issue 2, 2009, Pages 1184-1196

A study of lip movements during spontaneous dialog and its application to voice activity detection

Author keywords

[No Author keywords available]

Indexed keywords

AUDIO-VISUAL CORPORA; COMPREHENSIVE ANALYSIS; COMPREHENSIVE STUDIES; LIP MOVEMENTS; MOVING NOISE SOURCES; NON-STATIONARY NOISE; SPEECH ACTIVITIES; SPEECH SIGNALS; VISUAL SPEECH; VOICE ACTIVITY DETECTIONS; VOICE ACTIVITY DETECTORS;

EID: 59849111743     PISSN: 00014966     EISSN: None     Source Type: Journal    
DOI: 10.1121/1.3050257     Document Type: Article
Times cited : (34)

References (61)
  • 2
    • 0022690063
    • Laws for lips
    • Abry, C., and Boë, L. J. (1986). "Laws for lips," Speech Commun. 5, 97-104.
    • (1986) Speech Commun. , vol.5 , pp. 97-104
    • Abry, C.1    Boë, L.J.2
  • 4
    • 85009255585
    • in Proceedings of the International Conference on Spoken Language Processing (ICSLP), Denver, CO
    • Bailly, G., and Badin, P. (2002). "Seeing tongue movements from outside," in Proceedings of the International Conference on Spoken Language Processing (ICSLP), Denver, CO, pp. 1913-1916.
    • (2002) Seeing Tongue Movements from Outside , pp. 1913-1916
    • Bailly, G.1    Badin, P.2
  • 7
    • 0001055701
    • Which components of the face humans and machines best speechread?
    • in Speechreading by Man and Machine: Models, Systems and Applications, NATO Advanced Studies Institute, Series F: Computer and System Sciences, edited by D. G. Stork and M. E. Hennecke (Springer, New York)
    • Benoît, C., Guiard-Marigny, T., Le Goff, B., and Adjoudani, A. (1996). "Which components of the face humans and machines best speechread?" in Speechreading by Man and Machine: Models, Systems and Applications, NATO Advanced Studies Institute, Series F: Computer and System Sciences, edited by D. G. Stork and M. E. Hennecke (Springer, New York), pp. 315-328.
    • (1996) Speechreading by Man and Machine: Models, Systems and Applications , pp. 315-328
    • Benoît, C.1    Guiard-Marigny, T.2    Le Goff, B.3    Adjoudani, A.4
  • 8
    • 0002186602
    • A set of French visemes for visual speech synthesis
    • in Talking Machines: Theories, Models, and Designs, edited by G. Bailly, C. Benoît, and T. R. Sawallis (North-Holland, Amsterdam)
    • Benoît, C., Lallouache, T., Mohamadi, T., and Abry, C. (1992). "A set of French visemes for visual speech synthesis," in Talking Machines: Theories, Models, and Designs, edited by G. Bailly, C. Benoît, and T. R. Sawallis (North-Holland, Amsterdam), pp. 485-504.
    • (1992) Talking Machines: Theories, Models, and Designs , pp. 485-504
    • Benoît, C.1    Lallouache, T.2    Mohamadi, T.3    Abry, C.4
  • 9
    • 0028023732
    • Effects of phonetic context on audio-visual intelligibility of French
    • Benoît, C., Mohamadi, T., and Kandel, S. (1994). "Effects of phonetic context on audio-visual intelligibility of French," J. Speech Hear. Res. 37, 1195-1203.
    • (1994) J. Speech Hear. Res. , vol.37 , pp. 1195-1203
    • Benoît, C.1    Mohamadi, T.2    Kandel, S.3
  • 10
    • 10444276578
    • Auditory speech detection in noise enhanced by lipreading
    • Bernstein, L. E., Takayanagi, S., and Auer, E. T., Jr. (2004). "Auditory speech detection in noise enhanced by lipreading," Speech Commun. 44, 5-18.
    • (2004) Speech Commun. , vol.44 , pp. 5-18
    • Bernstein, L.E.1    Takayanagi, S.2    Auer Jr., E.T.3
  • 11
    • 77956789336
    • Ventriloquism: A case of crossmodal perceptual grouping
    • in Cognitive Contributions to the Perception of Spatial and Temporal Events, edited by G. Aschersleben, T. Bachmann, and J. Müsseler (Elsevier, Amsterdam)
    • Bertelson, P. (1999). "Ventriloquism: A case of crossmodal perceptual grouping," in Cognitive Contributions to the Perception of Spatial and Temporal Events, edited by G. Aschersleben, T. Bachmann, and J. Müsseler (Elsevier, Amsterdam), pp. 347-362.
    • (1999) Cognitive Contributions to the Perception of Spatial and Temporal Events , pp. 347-362
    • Bertelson, P.1
  • 12
    • 0037240789
    • Reading speech from still and moving faces: The neural substrates of visible speech
    • Calvert, G. A., and Campbell, R. (2003). "Reading speech from still and moving faces: The neural substrates of visible speech," J. Cogn Neurosci. 15, 57-70.
    • (2003) J. Cogn Neurosci. , vol.15 , pp. 57-70
    • Calvert, G.A.1    Campbell, R.2
  • 15
    • 0033708494
    • in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey
    • De Cueto, P., Neti, C., and Senior, A. W. (2000). "Audio-visual intent-to-speak detection in human-computer interaction," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Istanbul, Turkey, pp. 2373-2376.
    • (2000) Audio-visual Intent-to-speak Detection in Human-computer Interaction , pp. 2373-2376
    • De Cueto, P.1    Neti, C.2    Senior, A.W.3
  • 17
    • 0021645331
    • Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator
    • Ephraim, Y., and Malah, D. (1984). "Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator," IEEE Trans. Acoust., Speech, Signal Process. 32, 1109-1121.
    • (1984) IEEE Trans. Acoust., Speech, Signal Process. , vol.32 , pp. 1109-1121
    • Ephraim, Y.1    Malah, D.2
  • 18
    • 0016817064
    • Auditory-visual perception of speech
    • Erber, N. P. (1975). "Auditory-visual perception of speech," J. Speech Hear Disord. 40, 481-492.
    • (1975) J. Speech Hear Disord. , vol.40 , pp. 481-492
    • Erber, N.P.1
  • 19
    • 23744449511
    • Analysis and synthesis of three-dimensional movements of the head, face, and hand of a speaker using cued speech
    • Gibert, G., Bailly, G., Beautemps, D., Elisei, F., and Brun, R. (2005). "Analysis and synthesis of three-dimensional movements of the head, face, and hand of a speaker using cued speech," J. Acoust. Soc. Am. 118, 1144-1153.
    • (2005) J. Acoust. Soc. Am. , vol.118 , pp. 1144-1153
    • Gibert, G.1    Bailly, G.2    Beautemps, D.3    Elisei, F.4    Brun, R.5
  • 20
    • 2442628301
    • Joint matrix quantization of face parameters and LPC coefficients for low bit rate audiovisual speech coding
    • Girin, L. (2004). "Joint matrix quantization of face parameters and LPC coefficients for low bit rate audiovisual speech coding," IEEE Trans. Speech Audio Process. 12, 265-276.
    • (2004) IEEE Trans. Speech Audio Process. , vol.12 , pp. 265-276
    • Girin, L.1
  • 21
    • 0034974093
    • Audio-visual enhancement of speech in noise
    • Girin, L., Schwartz, J.-L., and Feng, G. (2001). "Audio-visual enhancement of speech in noise," J. Acoust. Soc. Am. 109, 3007-3020.
    • (2001) J. Acoust. Soc. Am. , vol.109 , pp. 3007-3020
    • Girin, L.1    Schwartz, J.-L.2    Feng, G.3
  • 23
    • 0033822769
    • The use of visible speech cues for improving auditory detection of spoken sentences
    • Grant, K. W., and Seitz, P. (2000). "The use of visible speech cues for improving auditory detection of spoken sentences," J. Acoust. Soc. Am. 108, 1197-1208.
    • (2000) J. Acoust. Soc. Am. , vol.108 , pp. 1197-1208
    • Grant, K.W.1    Seitz, P.2
  • 25
    • 14944343138
    • in Proceedings of the Workshop at the International Conference on Computer Vision (ICCV) on Recognition, Analysis and Tracking of Face and Gestures in Real Time Systems (RATFG-RTS), Vancouver, Canada
    • Iyengar, G., and Neti, C. (2001). "A vision-based microphone switch for speech intent detection," in Proceedings of the Workshop at the International Conference on Computer Vision (ICCV) on Recognition, Analysis and Tracking of Face and Gestures in Real Time Systems (RATFG-RTS), Vancouver, Canada, pp. 101-105.
    • (2001) A Vision-based Microphone Switch for Speech Intent Detection , pp. 101-105
    • Iyengar, G.1    Neti, C.2
  • 27
    • 10444258058
    • Investigating the audio-visual speech detection advantage
    • Kim, J., and Davis, C. (2004). "Investigating the audio-visual speech detection advantage," Speech Commun. 44, 19-30.
    • (2004) Speech Commun. , vol.44 , pp. 19-30
    • Kim, J.1    Davis, C.2
  • 29
    • 0001653589
    • The Lombard sign and the role of hearing in speech
    • Lane, H., and Tranel, B. (1971). "The Lombard sign and the role of hearing in speech," J. Speech Hear. Res. 14, 677-709.
    • (1971) J. Speech Hear. Res. , vol.14 , pp. 677-709
    • Lane, H.1    Tranel, B.2
  • 30
    • 0029290274
    • Study of a voice activity detector and its influence on a noise reduction system
    • Le Bouquin-Jeannès, R., and Faucon, G. (1995). "Study of a voice activity detector and its influence on a noise reduction system," Speech Commun. 16, 245-254.
    • (1995) Speech Commun. , vol.16 , pp. 245-254
    • Le Bouquin-Jeannès, R.1    Faucon, G.2
  • 31
    • 4544351504
    • in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, Canada
    • Liu, P., and Wang, Z. (2004). "Voice activity detection using visual information," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Montreal, Canada, pp. 609-612.
    • (2004) Voice Activity Detection Using Visual Information , pp. 609-612
    • Liu, P.1    Wang, Z.2
  • 34
    • 0017199877
    • Hearing lips and seeing voices
    • McGurk, H., and MacDonald, J. (1976). "Hearing lips and seeing voices," Nature (London) 264, 746-748.
    • (1976) Nature (London) , vol.264 , pp. 746-748
    • McGurk, H.1    MacDonald, J.2
  • 36
    • 0037038098
    • Dynamic visual speech perception in a patient with visual form agnosia
    • Munhall, K. G., Servos, P., Santi, A., and Goodale, M. (2002). "Dynamic visual speech perception in a patient with visual form agnosia," NeuroReport 13, 1793-1796.
    • (2002) NeuroReport , vol.13 , pp. 1793-1796
    • Munhall, K.G.1    Servos, P.2    Santi, A.3    Goodale, M.4
  • 38
    • 0021541159
    • in Proceedings of the Global Telecommunications Conference (GLOBECOM), Atlanta, GA
    • Petajan, E. D. (1984). "Automatic lipreading to enhance speech recognition," in Proceedings of the Global Telecommunications Conference (GLOBECOM), Atlanta, GA, pp. 265-272.
    • (1984) Automatic Lipreading to Enhance Speech Recognition , pp. 265-272
    • Petajan, E.D.1
  • 40
    • 4544290191
    • Recent advances in the automatic recognition of visual speech
    • Potamianos, G., Neti, C., and Gravier, G. (2003b). "Recent advances in the automatic recognition of visual speech," Proc. IEEE 91, 1306-1326.
    • (2003) Proc. IEEE , vol.91 , pp. 1306-1326
    • Potamianos, G.1    Neti, C.2    Gravier, G.3
  • 41
    • 1842476689
    • Efficient voice activity detection algorithms using long-term speech information
    • Ramírez, J., Segura, J. C., Benítez, C., de la Torre, A., and Rubio, A. (2004). "Efficient voice activity detection algorithms using long-term speech information," Speech Commun. 42, 271-287.
    • (2004) Speech Commun. , vol.42 , pp. 271-287
    • Ramírez, J.1    Segura, J.C.2    Benítez, C.3    De La Torre, A.4    Rubio, A.5
  • 42
    • 23344452899
    • Statistical voice activity detection using a multiple observation likelihood ratio test
    • Ramírez, J., Segura, J. C., Benítez, C., García, L., and Rubio, A. (2005). "Statistical voice activity detection using a multiple observation likelihood ratio test," IEEE Signal Process. Lett. 12, 689-692.
    • (2005) IEEE Signal Process. Lett. , vol.12 , pp. 689-692
    • Ramírez, J.1    Segura, J.C.2    Benítez, C.3    García, L.4    Rubio, A.5
  • 43
    • 59849095099
    • in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Atlanta, GA
    • Rao, R., and Chen, T. (1996). "Cross-modal predictive coding for talking head sequences," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Atlanta, GA, pp. 2058-2061.
    • (1996) Cross-modal Predictive Coding for Talking Head Sequences , pp. 2058-2061
    • Rao, R.1    Chen, T.2
  • 44
    • 34447100075
    • Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures
    • Rivet, B., Girin, L., and Jutten, C. (2007a). "Mixing audiovisual speech processing and blind source separation for the extraction of speech signals from convolutive mixtures," IEEE Trans. Audio, Speech, Lang. Process. 15, 96-108.
    • (2007) IEEE Trans. Audio, Speech, Lang. Process. , vol.15 , pp. 96-108
    • Rivet, B.1    Girin, L.2    Jutten, C.3
  • 45
    • 34447095008
    • Visual voice activity detection as a help for speech source separation from convolutive mixtures
    • Rivet, B., Girin, L., and Jutten, C. (2007b). "Visual voice activity detection as a help for speech source separation from convolutive mixtures," Speech Commun. 49, 667-677.
    • (2007) Speech Commun. , vol.49 , pp. 667-677
    • Rivet, B.1    Girin, L.2    Jutten, C.3
  • 46
    • 0031747741
    • Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise
    • Robert-Ribes, J., Schwartz, J. L., Lallouache, T., and Escudier, P. (1998). "Complementarity and synergy in bimodal speech: Auditory, visual, and audio-visual identification of French oral vowels in noise," J. Acoust. Soc. Am. 103, 3677-3689.
    • (1998) J. Acoust. Soc. Am. , vol.103 , pp. 3677-3689
    • Robert-Ribes, J.1    Schwartz, J.L.2    Lallouache, T.3    Escudier, P.4
  • 47
    • 0029853869
    • Visual kinematic information for embellishing speech in noise
    • Rosenblum, L. D., Johnson, J. A., and Saldana, H. M. (1996). "Visual kinematic information for embellishing speech in noise," J. Speech Hear. Res. 39, 1159-1170.
    • (1996) J. Speech Hear. Res. , vol.39 , pp. 1159-1170
    • Rosenblum, L.D.1    Johnson, J.A.2    Saldana, H.M.3
  • 48
    • 0030114603
    • An audiovisual test of kinematic primitives for visual speech perception
    • Rosenblum, L. D., and Saldana, H. M. (1996). "An audiovisual test of kinematic primitives for visual speech perception," J. Exp. Psychol. Hum. Percept. Perform. 22, 318-331.
    • (1996) J. Exp. Psychol. Hum. Percept. Perform. , vol.22 , pp. 318-331
    • Rosenblum, L.D.1    Saldana, H.M.2
  • 49
    • 4544333803
    • Seeing to hear better: Evidence for early audio-visual interactions in speech identification
    • Schwartz, J. L., Berthommier, F., and Savariaux, C. (2004). "Seeing to hear better: Evidence for early audio-visual interactions in speech identification," Cognition 93, 69-78.
    • (2004) Cognition , vol.93 , pp. 69-78
    • Schwartz, J.L.1    Berthommier, F.2    Savariaux, C.3
  • 50
    • 0036874541
    • Separation of audio-visual speech sources: A new approach exploiting the audiovisual coherence of speech stimuli
    • Sodoyer, D., Girin, L., Jutten, C., and Schwartz, J. L. (2002). "Separation of audio-visual speech sources: A new approach exploiting the audiovisual coherence of speech stimuli," EURASIP J. Appl. Signal Process. 11, 1165-1173.
    • (2002) EURASIP J. Appl. Signal Process. , vol.11 , pp. 1165-1173
    • Sodoyer, D.1    Girin, L.2    Jutten, C.3    Schwartz, J.L.4
  • 51
    • 10444247388
    • Further experiments on audio-visual speech source separation
    • Sodoyer, D., Girin, L., Jutten, C., and Schwartz, J. L. (2004). "Further experiments on audio-visual speech source separation," Speech Commun. 44, 113-125.
    • (2004) Speech Commun. , vol.44 , pp. 113-125
    • Sodoyer, D.1    Girin, L.2    Jutten, C.3    Schwartz, J.L.4
  • 53
    • 0032762471
    • A statistical model based voice activity detection
    • Sohn, J., Kim, N. S., and Sung, W. (1999). "A statistical model based voice activity detection," IEEE Signal Process. Lett. 6, 1-3.
    • (1999) IEEE Signal Process. Lett. , vol.6 , pp. 1-3
    • Sohn, J.1    Kim, N.S.2    Sung, W.3
  • 54
    • 0001048664
    • Visual contribution to speech intelligibility in noise
    • Sumby, W. H., and Pollack, I. (1954). "Visual contribution to speech intelligibility in noise," J. Acoust. Soc. Am. 26, 212-215.
    • (1954) J. Acoust. Soc. Am. , vol.26 , pp. 212-215
    • Sumby, W.H.1    Pollack, I.2
  • 55
    • 0018701386
    • Use of visual information for phonetic perception
    • Summerfield, Q. (1979). "Use of visual information for phonetic perception," Phonetica 36, 314-331.
    • (1979) Phonetica , vol.36 , pp. 314-331
    • Summerfield, Q.1
  • 56
    • 0002028032
    • Some preliminaries to a comprehensive account of audio-visual speech perception
    • in Hearing by Eye: The Psychology of Lip-Reading, edited by B. Dodd and R. Campbell (Erlbaum, London)
    • Summerfield, Q. (1987). "Some preliminaries to a comprehensive account of audio-visual speech perception," in Hearing by Eye: The Psychology of Lip-Reading, edited by B. Dodd and R. Campbell (Erlbaum, London), pp. 3-51.
    • (1987) Hearing by Eye: The Psychology of Lip-Reading , pp. 3-51
    • Summerfield, Q.1
  • 57
    • 0034228994
    • Voice activity detection in nonstationary noise
    • Tanyer, S. G., and Ozer, H. (2000). "Voice activity detection in nonstationary noise," IEEE Trans. Speech Audio Process. 8, 478-482.
    • (2000) IEEE Trans. Speech Audio Process. , vol.8 , pp. 478-482
    • Tanyer, S.G.1    Ozer, H.2
  • 58
    • 4744359141
    • Contributions of oral and extraoral facial movement to visual and audiovisual speech perception
    • Thomas, S. M., and Jordan, T. R. (2004). "Contributions of oral and extraoral facial movement to visual and audiovisual speech perception," J. Exp. Psychol. 30, 873-888.
    • (2004) J. Exp. Psychol. , vol.30 , pp. 873-888
    • Thomas, S.M.1    Jordan, T.R.2
  • 59
    • 33646231347
    • in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia
    • Wang, W., Cosker, D., Hicks, Y., Sanei, S., and Chambers, J. A. (2005). "Video assisted speech source separation," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Philadelphia, pp. 425-428.
    • (2005) Video Assisted Speech Source Separation , pp. 425-428
    • Wang, W.1    Cosker, D.2    Hicks, Y.3    Sanei, S.4    Chambers, J.A.5
  • 60
    • 0000986839
    • in Proceedings of the Seminar on Speech Production: Models and Data and CREST Workshop on Models of Speech Production: Motor Planning and Articulatory Modelling, Kloster Seeon, Germany
    • Yehia, H., Kuratate, T., and Vatikiotis-Bateson, E. (2000). "Facial animation and head motion driven by speech acoustics," in Proceedings of the Seminar on Speech Production: Models and Data and CREST Workshop on Models of Speech Production: Motor Planning and Articulatory Modelling, Kloster Seeon, Germany, pp. 265-268.
    • (2000) Facial Animation and Head Motion Driven by Speech Acoustics , pp. 265-268
    • Yehia, H.1    Kuratate, T.2    Vatikiotis-Bateson, E.3
  • 61
    • 0032178592
    • Quantitative association of vocal-tract and facial behavior
    • Yehia, H., Rubin, P., and Vatikiotis-Bateson, E. (1998). "Quantitative association of vocal-tract and facial behavior," Speech Commun. 26, 23-43.
    • (1998) Speech Commun. , vol.26 , pp. 23-43
    • Yehia, H.1    Rubin, P.2    Vatikiotis-Bateson, E.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.