-
2
-
-
77951283289
-
Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs
-
M. Ben, M. Betser, F. Bimbot, and G. Gravier, "Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs," in Proc. Int. Conf. Spoken Language Processing, 2004, pp. 2329-2332.
-
(2004)
Proc. Int. Conf. Spoken Language Processing
, pp. 2329-2332
-
-
Ben, M.1
Betser, M.2
Bimbot, F.3
Gravier, G.4
-
3
-
-
33646779170
-
-
C. Busso, S. Hernanz, C.-W. Chu, S.-I. Kwon, S. Lee, P. Georgiou, I. Cohen, and S. Narayanan, "Smart room: Participant and speaker localization and identification," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 2005, vol. 2, pp. 1117-1120.
-
-
-
-
4
-
-
85009212151
-
A sequential metric-based audio segmentation using Bayesian information criterion
-
S. S. Cheng and H. M. Wang, "A sequential metric-based audio segmentation using Bayesian information criterion," presented at the Eurospeech, 2003.
-
(2003)
Eurospeech
-
-
Cheng, S.S.1
Wang, H.M.2
-
5
-
-
0034507915
-
Look who's talking: Speaker detection using video and audio correlation
-
R. Cutler and L. Davis, "Look who's talking: Speaker detection using video and audio correlation," in Proc. IEEE Int. Conf. Multimedia, 2000, pp. 1589-1592.
-
(2000)
Proc. IEEE Int. Conf. Multimedia
, pp. 1589-1592
-
-
Cutler, R.1
Davis, L.2
-
6
-
-
0038715064
-
Distributed meetings: A meeting capture and broadcast system
-
R. Cutler, Y. Rui, A. Gupta, J. Cadiz, I. Tashev, L. He, A. Colburn, Z. Zhang, Z. Liu, and S. Silverberg, "Distributed meetings: A meeting capture and broadcast system," ACM Multimedia, pp. 503-512, 2002.
-
(2002)
ACM Multimedia
, pp. 503-512
-
-
Cutler, R.1
Rui, Y.2
Gupta, A.3
Cadiz, J.4
Tashev, I.5
He, L.6
Colburn, A.7
Zhang, Z.8
Liu, Z.9
Silverberg, S.10
-
7
-
-
42549103165
-
-
J. Fiscus, N. Radde, J. Garofolo, A. Le, J. Ajot, and C. Laprun, "Rich Transcription 2005 Spring Meeting Recognition Evaluation," 2007. [Online]. Available: www.nist.gov/speech/publications
-
(2007)
Rich Transcription 2005 Spring Meeting Recognition Evaluation
-
-
Fiscus, J.1
Radde, N.2
Garofolo, J.3
Le, A.4
Ajot, J.5
Laprun, C.6
-
8
-
-
84949961905
-
Probabilistic models and informative subspaces for audiovisual correspondence
-
J. Fisher and T. Darrell, "Probabilistic models and informative subspaces for audiovisual correspondence," in Proc. European Conf. Computer Vision, 2002, vol. 3, pp. 592-603.
-
(2002)
Proc. European Conf. Computer Vision
, vol.3
, pp. 592-603
-
-
Fisher, J.1
Darrell, T.2
-
9
-
-
84924134587
-
The NIST meeting room pilot corpus
-
J. S. Garofolo, C. D. Laprun, M. Michel, V. M. Stanford, and E. Tabassi, "The NIST meeting room pilot corpus," presented at the Int. Conf. Language Resources and Evaluation, 2004.
-
(2004)
Int. Conf. Language Resources and Evaluation
-
-
Garofolo, J.S.1
Laprun, C.D.2
Michel, M.3
Stanford, V.M.4
Tabassi, E.5
-
10
-
-
84899028297
-
Audio vision: Using audiovisual synchrony to locate sounds
-
J. Hershey and J. Movellan, "Audio vision: Using audiovisual synchrony to locate sounds," Adv. Neural Inf. Process. Syst., pp. 813-819, 1999.
-
(1999)
Adv. Neural Inf. Process. Syst
, pp. 813-819
-
-
Hershey, J.1
Movellan, J.2
-
11
-
-
0037774471
-
Audio-visual localization of multiple speakers in a video teleconferencing setting
-
B. Kapralos, M. Jenkin, and E. Milios, "Audio-visual localization of multiple speakers in a video teleconferencing setting," Int. J. Imag. Syst. Technol., vol. 13, pp. 95-105, 2003.
-
(2003)
Int. J. Imag. Syst. Technol
, vol.13
, pp. 95-105
-
-
Kapralos, B.1
Jenkin, M.2
Milios, E.3
-
13
-
-
33749427593
-
Analysis of multimodal sequences using geometric video representations
-
G. Monaci, O. D. Escoda, and P. Vandergheynst, "Analysis of multimodal sequences using geometric video representations," Signal Process., vol. 86, pp. 3534-3548, 2006.
-
(2006)
Signal Process
, vol.86
, pp. 3534-3548
-
-
Monaci, G.1
Escoda, O.D.2
Vandergheynst, P.3
-
14
-
-
35248827017
-
Speaker localization using audio-visual synchrony: An empirical study
-
H. J. Nock, G. Iyengar, and C. Neti, "Speaker localization using audio-visual synchrony: An empirical study," in Proc. ACM Int. Conf. Multimedia, 2003, pp. 488-499.
-
(2003)
Proc. ACM Int. Conf. Multimedia
, pp. 488-499
-
-
Nock, H.J.1
Iyengar, G.2
Neti, C.3
-
15
-
-
34548310397
-
Speaker diarization for multiple-distant-microphone meetings using several sources of information
-
J. Pardo, X. Anguera, and C. Wooters, "Speaker diarization for multiple-distant-microphone meetings using several sources of information," IEEE Trans. Comput., vol. 56, no. 9, pp. 1212-1224, Sep. 2007.
-
(2007)
IEEE Trans. Comput
, vol.56
, Issue.9
, pp. 1212-1224
-
-
Pardo, J.1
Anguera, X.2
Wooters, C.3
-
16
-
-
0036874756
-
Moving talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus
-
E. Patterson, S. Gurbuz, Z. Tufekci, and J. Gowdy, "Moving talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus," J. Appl. Signal Process., vol. 11, pp. 1189-1201, 2002.
-
(2002)
J. Appl. Signal Process
, vol.11
, pp. 1189-1201
-
-
Patterson, E.1
Gurbuz, S.2
Tufekci, Z.3
Gowdy, J.4
-
17
-
-
0032639979
-
Vision-based speaker detection using Bayesian networks
-
J. Rehg, K. Murphy, and P. Fieguth, "Vision-based speaker detection using Bayesian networks," Comput. Vis. Pattern Recognit., pp. 110-116, 1999.
-
(1999)
Comput. Vis. Pattern Recognit
, pp. 110-116
-
-
Rehg, J.1
Murphy, K.2
Fieguth, P.3
-
18
-
-
0033884858
-
Speaker verification using adapted Gaussian mixture models
-
D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Process., vol. 10, pp. 19-41, 2000.
-
(2000)
Digital Signal Process
, vol.10
, pp. 19-41
-
-
Reynolds, D.A.1
Quatieri, T.F.2
Dunn, R.B.3
-
19
-
-
34247646607
-
Combined gesture-speech correlation analysis and speech driven gesture synthesis
-
M. Sargin, O. Aran, A. Karpov, F. Ofli, Y. Yasinnik, S. Wilson, E. Erzin, Y. Yemez, and A. Tekalp, "Combined gesture-speech correlation analysis and speech driven gesture synthesis," in Proc. IEEE Int. Conf. Multimedia, 2006, pp. 893-896.
-
(2006)
Proc. IEEE Int. Conf. Multimedia
, pp. 893-896
-
-
Sargin, M.1
Aran, O.2
Karpov, A.3
Ofli, F.4
Yasinnik, Y.5
Wilson, S.6
Erzin, E.7
Yemez, Y.8
Tekalp, A.9
-
20
-
-
0034187513
-
Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata
-
S. Sarkar and P. Soundararajan, "Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 5, pp. 504-525, May 2000.
-
(2000)
IEEE Trans. Pattern Anal. Mach. Intell
, vol.22
, Issue.5
, pp. 504-525
-
-
Sarkar, S.1
Soundararajan, P.2
-
21
-
-
2642557514
-
Facesync: A linear operator for measuring synchronization of video facial images and audio tracks
-
M. Slaney and M. Covell, "Facesync: A linear operator for measuring synchronization of video facial images and audio tracks," Adv. Neural Inf. Process. Syst., vol. 14, pp. 814-820, 2000.
-
(2000)
Adv. Neural Inf. Process. Syst
, vol.14
, pp. 814-820
-
-
Slaney, M.1
Covell, M.2
-
22
-
-
35348847678
-
The CLEAR 2006 evaluation
-
R. Stiefelhagen, K. Bernardin, R. Bowers, J. Garofolo, D. Mostefa, and P. Soundararajan, "The CLEAR 2006 evaluation," Springer Lecture Notes Comput. Sci., no. 4122, pp. 1-44, 2006.
-
(2006)
Springer Lecture Notes Comput. Sci
, Issue.4122
, pp. 1-44
-
-
Stiefelhagen, R.1
Bernardin, K.2
Bowers, R.3
Garofolo, J.4
Mostefa, D.5
Soundararajan, P.6
-
23
-
-
34047261805
-
An overview of automatic speaker diarization systems
-
S. Tranter and D. Reynolds, "An overview of automatic speaker diarization systems," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1557-1565, May 2006.
-
(2006)
IEEE Trans. Audio, Speech, Lang. Process
, vol.14
, Issue.5
, pp. 1557-1565
-
-
Tranter, S.1
Reynolds, D.2
-
24
-
-
34047223614
-
Audio segmentation and speaker localization in meeting videos
-
H. Vajaria, T. Islam, S. Sarkar, R. Sankar, and R. Kasturi, "Audio segmentation and speaker localization in meeting videos," in Proc. Int. Conf. Pattern Recognit., 2006, vol. 2, pp. 1150-1153.
-
(2006)
Proc. Int. Conf. Pattern Recognit
, vol.2
, pp. 1150-1153
-
-
Vajaria, H.1
Islam, T.2
Sarkar, S.3
Sankar, R.4
Kasturi, R.5
-
25
-
-
84960898014
-
Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures
-
L. Valbonesi, R. Ansari, D. McNeill, F. Quek, S. Duncan, K. E. McCullough, and R. Bryll, "Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures," in Proc. Eur. Signal Processing Conf., 2002, pp. 1330-1345.
-
(2002)
Proc. Eur. Signal Processing Conf
, pp. 1330-1345
-
-
Valbonesi, L.1
Ansari, R.2
McNeill, D.3
Quek, F.4
Duncan, S.5
McCullough, K.E.6
Bryll, R.7
-
26
-
-
34047268275
-
Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system
-
C. Wooters, J. Fung, B. Peskin, and X. Anguera, "Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system," presented at the Rich Transcription Workshop, 2004.
-
(2004)
Rich Transcription Workshop
-
-
Wooters, C.1
Fung, J.2
Peskin, B.3
Anguera, X.4