Volume 18, Issue 11, 2008, Pages 1608-1617

Exploring co-occurence between speech and body movement for audio-guided video localization

Author keywords

Audio visual association; Meeting analysis; Speaker diarization; Speaker localization

Indexed keywords

CLUSTER ANALYSIS; DIFFRACTIVE OPTICAL ELEMENTS; FEATURE EXTRACTION; FLOW OF SOLIDS; VISUAL COMMUNICATION

EID: 55149110077     PISSN: 10518215     EISSN: None     Source Type: Journal    
DOI: 10.1109/TCSVT.2008.2005602     Document Type: Article
Times cited: 21

References (26)
  • 2
    • 77951283289
    • Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs
    • M. Ben, M. Betser, F. Bimbot, and G. Gravier, "Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs," in Proc. Int. Conf. Spoken Language Processing, 2004, pp. 2329-2332.
    • (2004) Proc. Int. Conf. Spoken Language Processing , pp. 2329-2332
    • Ben, M.1    Betser, M.2    Bimbot, F.3    Gravier, G.4
  • 3
    • 33646779170
    • Smart room: Participant and speaker localization and identification
    • C. Busso, S. Hernanz, C.-W. Chu, S.-I. Kwon, S. Lee, P. Georgiou, I. Cohen, and S. Narayanan, "Smart room: Participant and speaker localization and identification," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 2005, vol. 2, pp. 1117-1120.
  • 4
    • 85009212151
    • A sequential metric-based audio segmentation using Bayesian information criterion
    • S. S. Cheng and H. M. Wang, "A sequential metric-based audio segmentation using Bayesian information criterion," presented at the Eurospeech, 2003.
    • (2003) Eurospeech
    • Cheng, S.S.1    Wang, H.M.2
  • 5
    • 0034507915
    • Look who's talking: Speaker detection using video and audio correlation
    • R. Cutler and L. Davis, "Look who's talking: Speaker detection using video and audio correlation," in Proc. IEEE Int. Conf. Multimedia, 2000, pp. 1589-1592.
    • (2000) Proc. IEEE Int. Conf. Multimedia , pp. 1589-1592
    • Cutler, R.1    Davis, L.2
  • 8
    • 84949961905
    • Probabalistic models and informative subspaces for audiovisual correspondence
    • J. Fisher and T. Darrell, "Probabalistic models and informative subspaces for audiovisual correspondence," in Proc. European Conf. Computer Vision, 2002, vol. 3, pp. 592-603.
    • (2002) Proc. European Conf. Computer Vision , vol.3 , pp. 592-603
    • Fisher, J.1    Darrell, T.2
  • 9
    • 84924134587
    • The NIST meeting room pilot corpus
    • J. S. Garofolo, C. D. Laprun, M. Michel, V. M. Stanford, and E. Tabassi, "The NIST meeting room pilot corpus," presented at the Int. Conf. Language Resources and Evaluation, 2004.
    • (2004) Int. Conf. Language Resources and Evaluation
    • Garofolo, J.S.1    Laprun, C.D.2    Michel, M.3    Stanford, V.M.4    Tabassi, E.5
  • 10
    • 84899028297
    • Audio vision: Using audiovisual synchrony to locate sounds
    • J. Hershey and J. Movellan, "Audio vision: Using audiovisual synchrony to locate sounds," Adv. Neural Inf. Process. Syst., pp. 813-819, 1999.
    • (1999) Adv. Neural Inf. Process. Syst , pp. 813-819
    • Hershey, J.1    Movellan, J.2
  • 11
    • 0037774471
    • Audio-visual localization of multiple speakers in a video teleconferencing setting
    • B. Kapralos, M. Jenkin, and E. Milios, "Audio-visual localization of multiple speakers in a video teleconferencing setting," Int. J. Imag. Syst. Technol., vol. 13, pp. 95-105, 2003.
    • (2003) Int. J. Imag. Syst. Technol , vol.13 , pp. 95-105
    • Kapralos, B.1    Jenkin, M.2    Milios, E.3
  • 13
    • 33749427593
    • Analysis of multimodal sequences using geometric video representations
    • G. Monaci, O. D. Escoda, and P. Vandergheynst, "Analysis of multimodal sequences using geometric video representations," Signal Process., vol. 86, pp. 3534-3548, 2006.
    • (2006) Signal Process , vol.86 , pp. 3534-3548
    • Monaci, G.1    Escoda, O.D.2    Vandergheynst, P.3
  • 14
    • 35248827017
    • Speaker localization using audio-visual synchrony: An empirical study
    • H. J. Nock, G. Iyengar, and C. Neti, "Speaker localization using audio-visual synchrony: An empirical study," in Proc. ACM Int. Conf. Multimedia, 2003, pp. 488-499.
    • (2003) Proc. ACM Int. Conf. Multimedia , pp. 488-499
    • Nock, H.J.1    Iyengar, G.2    Neti, C.3
  • 15
    • 34548310397
    • Speaker diarization for multiple-distant-microphone meetings using several sources of information
    • J. Pardo, X. Anguera, and C. Wooters, "Speaker diarization for multiple-distant-microphone meetings using several sources of information," IEEE Trans. Comput., vol. 56, no. 9, pp. 1212-1224, Sep. 2007.
    • (2007) IEEE Trans. Comput , vol.56 , Issue.9 , pp. 1212-1224
    • Pardo, J.1    Anguera, X.2    Wooters, C.3
  • 16
    • 0036874756
    • Moving talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus
    • E. Patterson, S. Gurbuz, Z. Tufekci, and J. Gowdy, "Moving talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus," J. Appl. Signal Process., vol. 11, pp. 1189-1201, 2002.
    • (2002) J. Appl. Signal Process , vol.11 , pp. 1189-1201
    • Patterson, E.1    Gurbuz, S.2    Tufekci, Z.3    Gowdy, J.4
  • 17
  • 18
    • 0033884858
    • Speaker verification using adapted Gaussian mixture models
    • D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Process., vol. 10, pp. 19-41, 2000.
    • (2000) Digital Signal Process , vol.10 , pp. 19-41
    • Reynolds, D.A.1    Quatieri, T.F.2    Dunn, R.B.3
  • 20
    • 0034187513
    • Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata
    • S. Sarkar and P. Soundararajan, "Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 5, pp. 504-525, May 2000.
    • (2000) IEEE Trans. Pattern Anal. Mach. Intell , vol.22 , Issue.5 , pp. 504-525
    • Sarkar, S.1    Soundararajan, P.2
  • 21
    • 2642557514
    • Facesync: A linear operator for measuring synchronization of video facial images and audio tracks
    • M. Slaney and M. Covell, "Facesync: A linear operator for measuring synchronization of video facial images and audio tracks," Adv. Neural Inf. Process. Syst., vol. 14, pp. 814-820, 2000.
    • (2000) Adv. Neural Inf. Process. Syst , vol.14 , pp. 814-820
    • Slaney, M.1    Covell, M.2
  • 23
    • 34047261805
    • An overview of automatic speaker diarization systems
    • S. Tranter and D. Reynolds, "An overview of automatic speaker diarization systems," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1557-1565, May 2006.
    • (2006) IEEE Trans. Audio, Speech, Lang. Process , vol.14 , Issue.5 , pp. 1557-1565
    • Tranter, S.1    Reynolds, D.2
  • 26
    • 34047268275
    • Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system
    • C. Wooters, J. Fung, B. Peskin, and X. Anguera, "Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system," presented at the Rich Transcription Workshop, 2004.
    • (2004) Rich Transcription Workshop
    • Wooters, C.1    Fung, J.2    Peskin, B.3    Anguera, X.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.