-
1
-
-
72449135653
-
Working with Very Sparse Data to Detect Speaker and Listener Participation in a Meetings Corpus
-
May
-
N. Campbell and N. Suzuki. Working with Very Sparse Data to Detect Speaker and Listener Participation in a Meetings Corpus. In Workshop Programme, volume 10, May 2006.
-
(2006)
Workshop Programme
, vol.10
-
-
Campbell, N.1
Suzuki, N.2
-
2
-
-
33846265193
-
The AMI meeting corpus: A pre-announcement
-
J. Carletta, S. Ashby, S. Bourban, M. Flynn, M. Guillemot, T. Hain, J. Kadlec, V. Karaiskos, W. Kraaij, M. Kronenthal, G. Lathoud, M. Lincoln, A. Lisowska, M. McCowan, W. Post, D. Reidsma, and P. Wellner. The AMI meeting corpus: A pre-announcement. In Joint Workshop on Machine Learning and Multimodal Interaction (MLMI), 2005.
-
(2005)
Joint Workshop on Machine Learning and Multimodal Interaction (MLMI)
-
-
Carletta, J.1
Ashby, S.2
Bourban, S.3
Flynn, M.4
Guillemot, M.5
Hain, T.6
Kadlec, J.7
Karaiskos, V.8
Kraaij, W.9
Kronenthal, M.10
Lathoud, G.11
Lincoln, M.12
Lisowska, A.13
McCowan, M.14
Post, W.15
Reidsma, D.16
Wellner, P.17
-
4
-
-
2642562769
-
Speaker association with signal-level audiovisual fusion
-
J. W. Fisher and T. Darrell. Speaker association with signal-level audiovisual fusion. IEEE Transactions on Multimedia, 6(3):406-413, 2004.
-
(2004)
IEEE Transactions on Multimedia
, vol.6
, Issue.3
, pp. 406-413
-
-
Fisher, J.W.1
Darrell, T.2
-
5
-
-
0009622481
-
Learning joint statistical models for audio-visual fusion and segregation
-
J. W. Fisher, T. Darrell, W. T. Freeman, and P. A. Viola. Learning joint statistical models for audio-visual fusion and segregation. In Conference on Neural Information Processing Systems (NIPS), pages 772-778, 2000.
-
(2000)
Conference on Neural Information Processing Systems (NIPS)
, pp. 772-778
-
-
Fisher, J.W.1
Darrell, T.2
Freeman, W.T.3
Viola, P.A.4
-
6
-
-
70349214881
-
-
G. Friedland, H. Hung, and C. Yeo. Multi-modal speaker diarization of real-world meetings using compressed-domain video features. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), page (to appear), 2009.
-
-
-
-
7
-
-
0000159549
-
Reaction time as a measure of intersensory facilitation
-
M. Hershenson. Reaction time as a measure of intersensory facilitation. J Exp Psychol, 63:289-293, 1962.
-
(1962)
J Exp Psychol
, vol.63
, pp. 289-293
-
-
Hershenson, M.1
-
8
-
-
72449169479
-
-
PrintPartners Ipskamp, Enschede, The Netherlands
-
M. Huijbregts. Segmentation, Diarization, and Speech Transcription: Surprise Data Unraveled. PrintPartners Ipskamp, Enschede, The Netherlands, 2008.
-
(2008)
Segmentation, Diarization, and Speech Transcription: Surprise Data Unraveled
-
-
Huijbregts, M.1
-
10
-
-
51449123805
-
Estimating the dominant person in multi-party conversations using speaker diarization strategies
-
H. Hung, Y. Huang, G. Friedland, and D. Gatica-Perez. Estimating the dominant person in multi-party conversations using speaker diarization strategies. In International Conference on Acoustics, Speech, and Signal Processing, 2008.
-
(2008)
International Conference on Acoustics, Speech, and Signal Processing
-
-
Hung, H.1
Huang, Y.2
Friedland, G.3
Gatica-Perez, D.4
-
11
-
-
51849100406
-
Associating audio-visual activity cues in a dominance estimation framework
-
Anchorage, Alaska
-
H. Hung, Y. Huang, C. Yeo, and D. Gatica-Perez. Associating audio-visual activity cues in a dominance estimation framework. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Human Communicative Behavior, Anchorage, Alaska, 2008.
-
(2008)
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR) Workshop on Human Communicative Behavior
-
-
Hung, H.1
Huang, Y.2
Yeo, C.3
Gatica-Perez, D.4
-
13
-
-
0017199877
-
Hearing lips and seeing voices
-
H. McGurk and J. MacDonald. Hearing lips and seeing voices. Nature, 264(5588):746-748, 1976.
-
(1976)
Nature
, vol.264
, Issue.5588
, pp. 746-748
-
-
McGurk, H.1
MacDonald, J.2
-
14
-
-
0032312273
-
Modelling facial colour and identity with Gaussian mixtures
-
S. J. McKenna, S. Gong, and Y. Raja. Modelling facial colour and identity with Gaussian mixtures. Pattern Recognition, 31(12):1883-1892, 1998.
-
(1998)
Pattern Recognition
, vol.31
, Issue.12
, pp. 1883-1892
-
-
McKenna, S.J.1
Gong, S.2
Raja, Y.3
-
15
-
-
33846320482
-
-
Cambridge University Press, New York
-
D. McNeill. Language and Gesture. Cambridge University Press, New York, 2000.
-
(2000)
Language and Gesture
-
-
McNeill, D.1
-
18
-
-
34548310397
-
Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information
-
J. Pardo, X. Anguera, and C. Wooters. Speaker Diarization For Multiple-Distant-Microphone Meetings Using Several Sources of Information. IEEE Transactions on Computers, 56(9):1189, 2007.
-
(2007)
IEEE Transactions on Computers
, vol.56
, Issue.9
, pp. 1189
-
-
Pardo, J.1
Anguera, X.2
Wooters, C.3
-
19
-
-
0036299249
-
CUAVE: A new audio-visual database for multimodal human-computer interface research
-
E. K. Patterson, S. Gurbuz, Z. Tufekci, and J. N. Gowdy. CUAVE: A new audio-visual database for multimodal human-computer interface research. In International Conference on Acoustics, Speech, and Signal Processing, pages 2017-2020, 2002.
-
(2002)
International Conference on Acoustics, Speech, and Signal Processing
, pp. 2017-2020
-
-
Patterson, E.K.1
Gurbuz, S.2
Tufekci, Z.3
Gowdy, J.N.4
-
20
-
-
41549121431
-
Exploiting audio-visual correlation in coding of talking head sequences
-
March
-
R. Rao and T. Chen. Exploiting audio-visual correlation in coding of talking head sequences. International Picture Coding Symposium, March 1996.
-
(1996)
International Picture Coding Symposium
-
-
Rao, R.1
Chen, T.2
-
23
-
-
1542572925
-
Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images
-
S. Tamura, K. Iwano, and S. Furui. Multi-Modal Speech Recognition Using Optical-Flow Analysis for Lip Images. Real World Speech Processing, 2004.
-
(2004)
Real World Speech Processing
-
-
Tamura, S.1
Iwano, K.2
Furui, S.3
-
24
-
-
34047223614
-
Audio segmentation and speaker localization in meeting videos
-
H. Vajaria, T. Islam, S. Sarkar, R. Sankar, and R. Kasturi. Audio segmentation and speaker localization in meeting videos. In 18th International Conference on Pattern Recognition (ICPR 2006), volume 2, pages 1150-1153, 2006.
-
(2006)
18th International Conference on Pattern Recognition (ICPR 2006)
, vol.2
, pp. 1150-1153
-
-
Vajaria, H.1
Islam, T.2
Sarkar, S.3
Sankar, R.4
Kasturi, R.5
-
27
-
-
72449191672
-
-
C. Yeo and K. Ramchandran. Compressed domain video processing of meetings for activity estimation in dominance classification and slide transition detection. Technical Report UCB/EECS-2008-79, EECS Department, University of California, Berkeley, Jun 2008.
-
-
-
-
28
-
-
34250758546
-
Boosting-Based Multimodal Speaker Detection for Distributed Meetings
-
C. Zhang, P. Yin, Y. Rui, R. Cutler, and P. Viola. Boosting-Based Multimodal Speaker Detection for Distributed Meetings. In IEEE International Workshop on Multimedia Signal Processing (MMSP), 2006.
-
(2006)
IEEE International Workshop on Multimedia Signal Processing (MMSP)
-
-
Zhang, C.1
Yin, P.2
Rui, Y.3
Cutler, R.4
Viola, P.5