2009, Pages 195-202

Visual speaker localization aided by acoustic models

Author keywords

Multimodal integration; Speaker diarization; Speech; Visual localization

Indexed keywords

ACOUSTIC MODEL; AUDIO-VISUAL; AUDIO-VISUAL INTEGRATION; COMPUTATION COSTS; FAR-FIELD; MULTIMODAL INTEGRATION; OPTIMIZATION PROBLEMS; REAL-WORLD; SPEAKER DIARIZATION; SPEAKER LOCALIZATION; STATE OF THE ART; VISUAL LOCALIZATION; VISUAL MODEL

EID: 72449147255     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/1631272.1631301     Document Type: Conference Paper
Times cited: 29

References (28)
  • 1
    • 72449135653
    • Working with Very Sparse Data to Detect Speaker and Listener Participation in a Meetings Corpus
    • May
    • N. Campbell and N. Suzuki. Working with Very Sparse Data to Detect Speaker and Listener Participation in a Meetings Corpus. In Workshop Programme, volume 10, May 2006.
    • (2006) Workshop Programme , vol.10
    • Campbell, N.1    Suzuki, N.2
  • 4
    • 2642562769
    • Speaker association with signal-level audiovisual fusion
    • J. W. Fisher and T. Darrell. Speaker association with signal-level audiovisual fusion. IEEE Transactions on Multimedia, 6(3):406-413, 2004.
    • (2004) IEEE Transactions on Multimedia , vol.6 , Issue.3 , pp. 406-413
    • Fisher, J.W.1    Darrell, T.2
  • 6
    • 70349214881
    • G. Friedland, H. Hung, and C. Yeo. Multi-modal speaker diarization of real-world meetings using compressed-domain video features. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), page (to appear), 2009.
  • 7
    • 0000159549
    • Reaction time as a measure of intersensory facilitation
    • M. Hershenson. Reaction time as a measure of intersensory facilitation. J Exp Psychol, 63:289-93, 1962.
    • (1962) J Exp Psychol , vol.63 , pp. 289-293
    • Hershenson, M.1
  • 13
    • 0017199877
    • Hearing lips and seeing voices
    • H. McGurk and J. MacDonald. Hearing lips and seeing voices. Nature, 264(5588):746-48, 1976.
    • (1976) Nature , vol.264 , Issue.5588 , pp. 746-748
    • McGurk, H.1    MacDonald, J.2
  • 14
    • 0032312273
    • Modelling facial colour and identity with Gaussian mixtures
    • S. J. McKenna, S. Gong, and Y. Raja. Modelling facial colour and identity with Gaussian mixtures. Pattern Recognition, 31(12):1883-1892, 1998.
    • (1998) Pattern Recognition , vol.31 , Issue.12 , pp. 1883-1892
    • McKenna, S.J.1    Gong, S.2    Raja, Y.3
  • 15
    • 33846320482
    • Cambridge University Press, New York
    • D. McNeill. Language and Gesture. Cambridge University Press, New York, 2000.
    • (2000) Language and Gesture
    • McNeill, D.1
  • 18
    • 34548310397
    • Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information
    • J. Pardo, X. Anguera, and C. Wooters. Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information. IEEE Transactions on Computers, 56(9):1189, 2007.
    • (2007) IEEE Transactions on Computers , vol.56 , Issue.9 , pp. 1189
    • Pardo, J.1    Anguera, X.2    Wooters, C.3
  • 20
    • 41549121431
    • Exploiting audio-visual correlation in coding of talking head sequences
    • March
    • R. Rao and T. Chen. Exploiting audio-visual correlation in coding of talking head sequences. International Picture Coding Symposium, March 1996.
    • (1996) International Picture Coding Symposium
    • Rao, R.1    Chen, T.2
  • 23
    • 1542572925
    • Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images
    • S. Tamura, K. Iwano, and S. Furui. Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images. Real World Speech Processing, 2004.
    • (2004) Real World Speech Processing
    • Tamura, S.1    Iwano, K.2    Furui, S.3
  • 27
    • 72449191672
    • C. Yeo and K. Ramchandran. Compressed domain video processing of meetings for activity estimation in dominance classification and slide transition detection. Technical Report UCB/EECS-2008-79, EECS Department, University of California, Berkeley, Jun 2008.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.