SCOPUS Information Retrieval Platform

MM'09 - Proceedings of the 2009 ACM Multimedia Conference, with Co-located Workshops and Symposiums

Volume , Issue , 2009, Pages 195-202

Visual speaker localization aided by acoustic models

Authors (3): Friedland, Gerald (a); Yeo, Chuohao (b); Hung, Hayley (c)

a INTERNATIONAL COMPUTER SCIENCE INSTITUTE (United States)

b UNIVERSITY OF CALIFORNIA (United States)

c IDIAP RESEARCH INSTITUTE (Switzerland)

Author keywords

Multimodal integration; Speaker diarization; Speech; Visual localization

Indexed keywords

ACOUSTIC MODEL; AUDIO-VISUAL; AUDIO-VISUAL INTEGRATION; COMPUTATION COSTS; FAR-FIELD; MULTIMODAL INTEGRATION; OPTIMIZATION PROBLEMS; REAL-WORLD; SPEAKER DIARIZATION; SPEAKER LOCALIZATION; STATE OF THE ART; VISUAL LOCALIZATION; VISUAL MODEL;

METHOD OF MOMENTS; MULTIMEDIA SYSTEMS; TECHNICAL PRESENTATIONS;

AUDIO ACOUSTICS;

EID: 72449147255 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/1631272.1631301 Document Type: Conference Paper

Times cited : (29)

References (28)

1
- 72449135653
- Working with Very Sparse Data to Detect Speaker and Listener Participation in a Meetings Corpus
- May
- N. Campbell and N. Suzuki. Working with Very Sparse Data to Detect Speaker and Listener Participation in a Meetings Corpus. In Workshop Programme, volume 10, May 2006.
- (2006) Workshop Programme , vol.10
- Campbell, N.¹ Suzuki, N.²

2
- 33846265193
- The AMI meeting corpus: A pre-announcement
- J. Carletta, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec, V. Karaiskos, W. Kraaij, M. Kronenthal, G. Lathoud, M. Lincoln, A. Lisowska, M. McCowan, W. Post, D. Reidsma, and P. Wellner. The AMI meeting corpus: A pre-announcement. In Joint Workshop on Machine Learning and Multimodal Interaction (MLMI), 2005.
- (2005) Joint Workshop on Machine Learning and Multimodal Interaction (MLMI)
- Carletta, J.¹ Ashby, S.² Bourban, S.³ Flynn, M.⁴ Guillemot, M.⁵ Hain, T.⁶ Kadlec, J.⁷ Karaiskos, V.⁸ Kraaij, W.⁹ Kronenthal, M.¹⁰ Lathoud, G.¹¹ Lincoln, M.¹² Lisowska, A.¹³ McCowan, M.¹⁴ Post, W.¹⁵ Reidsma, D.¹⁶ Wellner, P.¹⁷

3
- 0029746565
- Cross-modal Prediction in Audio-visual Communication
- T. Chen and R. Rao. Cross-modal Prediction in Audio-visual Communication. In International Conference on Acoustics, Speech and Signal Processing (ICASSP), volume 4, pages 2056-2059, 1996.
- (1996) International Conference on Acoustics, Speech and Signal Processing (ICASSP) , vol.4 , pp. 2056-2059
- Chen, T.¹ Rao, R.²

4
- 2642562769
- Speaker association with signal-level audiovisual fusion
- J. W. Fisher and T. Darrell. Speaker association with signal-level audiovisual fusion. IEEE Transactions on Multimedia, 6(3):406-413, 2004.
- (2004) IEEE Transactions on Multimedia , vol.6 , Issue.3 , pp. 406-413
- Fisher, J.W.¹ Darrell, T.²

5
- 0009622481
- Learning joint statistical models for audio-visual fusion and segregation
- J. W. Fisher, T. Darrell, W. T. Freeman, and P. A. Viola. Learning joint statistical models for audio-visual fusion and segregation. In Conference on Neural Information Processing Systems (NIPS), pages 772-778, 2000.
- (2000) Conference on Neural Information Processing Systems (NIPS) , pp. 772-778
- Fisher, J.W.¹ Darrell, T.² Freeman, W.T.³ Viola, P.A.⁴

6
- 70349214881
- G. Friedland, H. Hung, and C. Yeo. Multi-modal speaker diarization of real-world meetings using compressed-domain video features. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), page (to appear), 2009.

7
- 0000159549
- Reaction time as a measure of intersensory facilitation
- M. Hershenson. Reaction time as a measure of intersensory facilitation. J Exp Psychol, 63:289-93, 1962.
- (1962) J Exp Psychol , vol.63 , pp. 289-293
- Hershenson, M.¹

8
- 72449169479
- PrintPartners Ipskamp, Enschede, The Netherlands
- M. Huijbregts. Segmentation, Diarization, and Speech Transcription: Surprise Data Unraveled. PrintPartners Ipskamp, Enschede, The Netherlands, 2008.
- (2008) Segmentation, Diarization, and Speech Transcription: Surprise Data Unraveled
- Huijbregts, M.¹

9
- 84897697234
- Towards audio-visual on-line diarization of participants in group meetings
- Marseille, France, October
- H. Hung and G. Friedland. Towards audio-visual on-line diarization of participants in group meetings. In Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications in conjunction with ECCV, Marseille, France, October 2008.
- (2008) Workshop on Multi-camera and Multi-modal Sensor Fusion Algorithms and Applications in conjunction with ECCV
- Hung, H.¹ Friedland, G.²

10
- 51449123805
- Estimating the dominant person in multi-party conversations using speaker diarization strategies
- H. Hung, Y. Huang, G. Friedland, and D. Gatica-Perez. Estimating the dominant person in multi-party conversations using speaker diarization strategies. In International Conference on Acoustics, Speech, and Signal Processing, 2008.
- (2008) International Conference on Acoustics, Speech, and Signal Processing
- Hung, H.¹ Huang, Y.² Friedland, G.³ Gatica-Perez, D.⁴

11
- 51849100406
- Associating audio-visual activity cues in a dominance estimation framework
- Anchorage, Alaska
- H. Hung, Y. Huang, C. Yeo, and D. Gatica-Perez. Associating audio-visual activity cues in a dominance estimation framework. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Human Communicative Behavior, Anchorage, Alaska, 2008.
- (2008) IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Human Communicative Behavior
- Hung, H.¹ Huang, Y.² Yeo, C.³ Gatica-Perez, D.⁴

12
- 51849100406
- Correlating audio-visual cues in a dominance estimation framework
- H. Hung, Y. Huang, C. Yeo, and D. Gatica-Perez. Correlating audio-visual cues in a dominance estimation framework. In CVPR Workshop on Human Communicative Behavior Analysis, 2008.
- (2008) CVPR Workshop on Human Communicative Behavior Analysis
- Hung, H.¹ Huang, Y.² Yeo, C.³ Gatica-Perez, D.⁴

13
- 0017199877
- Hearing lips and seeing voices
- H. McGurk and J. MacDonald. Hearing lips and seeing voices. Nature, 264(5588):746-48, 1976.
- (1976) Nature , vol.264 , Issue.5588 , pp. 746-748
- McGurk, H.¹ MacDonald, J.²

14
- 0032312273
- Modelling facial colour and identity with gaussian mixtures
- S. J. McKenna, S. Gong, and Y. Raja. Modelling facial colour and identity with gaussian mixtures. Pattern Recognition, 31(12):1883-1892, 1998.
- (1998) Pattern Recognition , vol.31 , Issue.12 , pp. 1883-1892
- McKenna, S.J.¹ Gong, S.² Raja, Y.³

15
- 33846320482
- Cambridge University Press New York
- D. McNeill. Language and Gesture. Cambridge University Press New York, 2000.
- (2000) Language and Gesture
- McNeill, D.¹

16
- 35248827017
- Speaker localisation using audio-visual synchrony: An empirical study
- H. J. Nock, G. Iyengar, and C. Neti. Speaker localisation using audio-visual synchrony: An empirical study. In ACM International Conference on Image and Video Retrieval (CIVR), pages 488-499, 2003.
- (2003) ACM International Conference on Image and Video Retrieval (CIVR) , pp. 488-499
- Nock, H.J.¹ Iyengar, G.² Neti, C.³

17
- 57649176425
- On-line multi-modal speaker diarization
- New York, USA, ACM
- A. Noulas and B. J. A. Krose. On-line multi-modal speaker diarization. In Proc. International Conference on Multimodal Interfaces (ICMI), pages 350-357, New York, USA, 2007. ACM.
- (2007) Proc. International Conference on Multimodal Interfaces (ICMI) , pp. 350-357
- Noulas, A.¹ Krose, B.J.A.²

18
- 34548310397
- Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information
- J. Pardo, X. Anguera, and C. Wooters. Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information. IEEE Transactions on Computers, 56(9):1189, 2007.
- (2007) IEEE Transactions on Computers , vol.56 , Issue.9 , pp. 1189
- Pardo, J.¹ Anguera, X.² Wooters, C.³

19
- 0036299249
- CUAVE: A new audio-visual database for multimodal human-computer interface research
- E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy. CUAVE: A new audio-visual database for multimodal human-computer interface research. In International Conference on Acoustics, Speech, and Signal Processing, pages 2017-2020, 2002.
- (2002) International Conference on Acoustics, Speech, and Signal Processing , pp. 2017-2020
- Patterson, E.K.¹ Gurbuz, S.² Tufekci, Z.³ Gowdy, J.N.⁴

20
- 41549121431
- Exploiting audio-visual correlation in coding of talking head sequences
- March
- R. Rao and T. Chen. Exploiting audio-visual correlation in coding of talking head sequences. International Picture Coding Symposium, March 1996.
- (1996) International Picture Coding Symposium
- Rao, R.¹ Chen, T.²

21
- 33646380923
- Approaches and applications of audio diarization
- D. A. Reynolds and P. Torres-Carrasquillo. Approaches and applications of audio diarization. In Proc. of International Conference on Audio and Speech Signal Processing, 2005.
- (2005) Proc. of International Conference on Audio and Speech Signal Processing
- Reynolds, D.A.¹ Torres-Carrasquillo, P.²

22
- 34547527871
- Dynamic dependency tests for audio-visual speaker association
- April
- M. Siracusa and J. Fisher. Dynamic dependency tests for audio-visual speaker association. In Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), April 2007.
- (2007) Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP)
- Siracusa, M.¹ Fisher, J.²

23
- 1542572925
- Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images
- S. Tamura, K. Iwano, and S. Furui. Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images. Real World Speech Processing, 2004.
- (2004) Real World Speech Processing
- Tamura, S.¹ Iwano, K.² Furui, S.³

24
- 34047223614
- Audio segmentation and speaker localization in meeting videos
- H. Vajaria, T. Islam, S. Sarkar, R. Sankar, and R. Kasturi. Audio segmentation and speaker localization in meeting videos. International Conference on Pattern Recognition, 2006. ICPR 2006. 18th, 2:1150-1153, 2006.
- (2006) International Conference on Pattern Recognition, 2006. ICPR 2006. 18th , vol.2 , pp. 1150-1153
- Vajaria, H.¹ Islam, T.² Sarkar, S.³ Sankar, R.⁴ Kasturi, R.⁵

25
- 55149110077
- Exploring co-occurence between speech and body movement for audio-guided video localization
- Nov
- H. Vajaria, S. Sarkar, and R. Kasturi. Exploring co-occurence between speech and body movement for audio-guided video localization. IEEE Transactions on Circuits and Systems for Video Technology, 18:1608-1617, Nov 2008.
- (2008) IEEE Transactions on Circuits and Systems for Video Technology , vol.18 , pp. 1608-1617
- Vajaria, H.¹ Sarkar, S.² Kasturi, R.³

26
- 51449087867
- The ICSI RT07s speaker diarization system
- C. Wooters and M. Huijbregts. The ICSI RT07s speaker diarization system. In Proceedings of the Rich Transcription 2007 Meeting Recognition Evaluation Workshop, 2007.
- (2007) Proceedings of the Rich Transcription 2007 Meeting Recognition Evaluation Workshop
- Wooters, C.¹ Huijbregts, M.²

27
- 72449191672
- C. Yeo and K. Ramchandran. Compressed domain video processing of meetings for activity estimation in dominance classification and slide transition detection. Technical Report UCB/EECS-2008-79, EECS Department, University of California, Berkeley, Jun 2008.

28
- 34250758546
- Boosting-Based Multimodal Speaker Detection for Distributed Meetings
- C. Zhang, P. Yin, Y. Rui, R. Cutler, and P. Viola. Boosting-Based Multimodal Speaker Detection for Distributed Meetings. IEEE International Workshop on Multimedia Signal Processing (MMSP) 2006, 2006.
- (2006) IEEE International Workshop on Multimedia Signal Processing (MMSP) 2006
- Zhang, C.¹ Yin, P.² Rui, Y.³ Cutler, R.⁴ Viola, P.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.