SCOPUS Information Retrieval Platform

IEEE Transactions on Circuits and Systems for Video Technology

Volume 18, Issue 11, 2008, Pages 1608-1617

Exploring co-occurrence between speech and body movement for audio-guided video localization

(3) Vajaria, Himanshu (a); Sarkar, Sudeep (a); Kasturi, Rangachar (a)

a UNIVERSITY OF SOUTH FLORIDA (United States)

Author keywords

Audio visual association; Meeting analysis; Speaker diarization; Speaker localization

Indexed keywords

CLUSTER ANALYSIS; DIFFRACTIVE OPTICAL ELEMENTS; FEATURE EXTRACTION; FLOW OF SOLIDS; VISUAL COMMUNICATION;

ANALYSIS METHODS; AUDIO-VISUAL ASSOCIATION; BLOB MODELS; BODY MOVEMENTS; BOTTOM-UP; ITERATIVE CLUSTERING; LONG TERMS; MEETING ANALYSIS; MIXTURE OF GAUSSIANS; MOTION INFORMATIONS; MUTUAL INFORMATIONS; NONFRONTAL VIEWS; REAL DATUMS; SOURCE LOCALIZATIONS; SPATIAL CLUSTERING; SPEAKER DIARIZATION; SPEAKER LOCALIZATION; STATIONARY CAMERAS; TEMPORAL CLUSTERING; VIDEO FRAMES; VISUAL INFORMATIONS;

SPEECH;

EID: 55149110077
PISSN: 10518215
EISSN: None
Source Type: Journal
DOI: 10.1109/TCSVT.2008.2005602
Document Type: Article

Times cited: (21)

References (26)

1
- 34948829598
- Harmony in motion
- Z. Barzelay and Y. Schechner, "Harmony in motion," Comput. Vis. Pattern Recognit., pp. 1-8, 2007.
- (2007) Comput. Vis. Pattern Recognit , pp. 1-8
- Barzelay, Z.¹ Schechner, Y.²

2
- 77951283289
- Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs
- M. Ben, M. Betser, F. Bimbot, and G. Gravier, "Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs," in Proc. Int. Conf. Spoken Language Processing, 2004, pp. 2329-2332.
- (2004) Proc. Int. Conf. Spoken Language Processing , pp. 2329-2332
- Ben, M.¹ Betser, M.² Bimbot, F.³ Gravier, G.⁴

3
- 33646779170
- Smart room: Participant and speaker localization and identification
- C. Busso, S. Hernanz, C.-W. Chu, S.-I. Kwon, S. Lee, P. Georgiou, I. Cohen, and S. Narayanan, "Smart room: Participant and speaker localization and identification," in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, 2005, vol. 2, pp. 1117-1120.

4
- 85009212151
- A sequential metric-based audio segmentation using Bayesian information criterion
- S. S. Cheng and H. M. Wang, "A sequential metric-based audio segmentation using Bayesian information criterion," presented at the Eurospeech, 2003.
- (2003) Eurospeech
- Cheng, S.S.¹ Wang, H.M.²

5
- 0034507915
- Look who's talking: Speaker detection using video and audio correlation
- R. Cutler and L. Davis, "Look who's talking: Speaker detection using video and audio correlation," in Proc. IEEE Int. Conf. Multimedia, 2000, pp. 1589-1592.
- (2000) Proc. IEEE Int. Conf. Multimedia , pp. 1589-1592
- Cutler, R.¹ Davis, L.²

6
- 0038715064
- Distributed meetings: A meeting capture and broadcast system
- R. Cutler, Y. Rui, A. Gupta, J. Cadiz, I. Tashev, L. He, A. Colburn, Z. Zhang, Z. Liu, and S. Silverberg, "Distributed meetings: A meeting capture and broadcast system," ACM Multimedia, pp. 503-512, 2002.
- (2002) ACM Multimedia , pp. 503-512
- Cutler, R.¹ Rui, Y.² Gupta, A.³ Cadiz, J.⁴ Tashev, I.⁵ He, L.⁶ Colburn, A.⁷ Zhang, Z.⁸ Liu, Z.⁹ Silverberg, S.¹⁰

7
- 42549103165
- J. Fiscus, N. Radde, J. Garofolo, A. Le, J. Ajot, and C. Laprun, Rich Transcription 2005 Spring Meeting Recognition Evaluation 2007 [Online]. Available: www.nist.gov/speech/publications
- (2007) Rich Transcription 2005 Spring Meeting Recognition Evaluation
- Fiscus, J.¹ Radde, N.² Garofolo, J.³ Le, A.⁴ Ajot, J.⁵ Laprun, C.⁶

8
- 84949961905
- Probabilistic models and informative subspaces for audiovisual correspondence
- J. Fisher and T. Darrell, "Probabilistic models and informative subspaces for audiovisual correspondence," in Proc. European Conf. Computer Vision, 2002, vol. 3, pp. 592-603.
- (2002) Proc. European Conf. Computer Vision , vol.3 , pp. 592-603
- Fisher, J.¹ Darrell, T.²

9
- 84924134587
- The NIST meeting room pilot corpus
- J. S. Garofolo, C. D. Laprun, M. Michel, V. M. Stanford, and E. Tabassi, "The NIST meeting room pilot corpus," presented at the Int. Conf. Language Resources and Evaluation, 2004.
- (2004) Int. Conf. Language Resources and Evaluation
- Garofolo, J.S.¹ Laprun, C.D.² Michel, M.³ Stanford, V.M.⁴ Tabassi, E.⁵

10
- 84899028297
- Audio vision: Using audiovisual synchrony to locate sounds
- J. Hershey and J. Movellan, "Audio vision: Using audiovisual synchrony to locate sounds," Adv. Neural Inf. Process. Syst., pp. 813-819, 1999.
- (1999) Adv. Neural Inf. Process. Syst , pp. 813-819
- Hershey, J.¹ Movellan, J.²

11
- 0037774471
- Audio-visual localization of multiple speakers in a video teleconferencing setting
- B. Kapralos, M. Jenkin, and E. Milios, "Audio-visual localization of multiple speakers in a video teleconferencing setting," Int. J. Imag. Syst. Technol., vol. 13, pp. 95-105, 2003.
- (2003) Int. J. Imag. Syst. Technol , vol.13 , pp. 95-105
- Kapralos, B.¹ Jenkin, M.² Milios, E.³

12
- 24644451644
- Pixels that sound
- E. Kidron and Y. Schechner, "Pixels that sound," Comput. Vis. Pattern Recognit., pp. 88-95, 2005.
- (2005) Comput. Vis. Pattern Recognit , pp. 88-95
- Kidron, E.¹ Schechner, Y.²

13
- 33749427593
- Analysis of multimodal sequences using geometric video representations
- G. Monaci, O. D. Escoda, and P. Vandergheynst, "Analysis of multimodal sequences using geometric video representations," Signal Process., vol. 86, pp. 3534-3548, 2006.
- (2006) Signal Process , vol.86 , pp. 3534-3548
- Monaci, G.¹ Escoda, O.D.² Vandergheynst, P.³

14
- 35248827017
- Speaker localization using audio-visual synchrony: An empirical study
- H. J. Nock, G. Iyengar, and C. Neti, "Speaker localization using audio-visual synchrony: An empirical study," in Proc. ACM Int. Conf. Multimedia, 2003, pp. 488-499.
- (2003) Proc. ACM Int. Conf. Multimedia , pp. 488-499
- Nock, H.J.¹ Iyengar, G.² Neti, C.³

15
- 34548310397
- Speaker diarization for multiple-distant-microphone meetings using several sources of information
- J. Pardo, X. Anguera, and C. Wooters, "Speaker diarization for multiple-distant-microphone meetings using several sources of information," IEEE Trans. Comput., vol. 56, no. 9, pp. 1212-1224, Sep. 2007.
- (2007) IEEE Trans. Comput , vol.56 , Issue.9 , pp. 1212-1224
- Pardo, J.¹ Anguera, X.² Wooters, C.³

16
- 0036874756
- Moving talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus
- E. Patterson, S. Gurbuz, Z. Tufekci, and J. Gowdy, "Moving talker, speaker-independent feature study and baseline results using the CUAVE multimodal speech corpus," J. Appl. Signal Process., vol. 11, pp. 1189-1201, 2002.
- (2002) J. Appl. Signal Process , vol.11 , pp. 1189-1201
- Patterson, E.¹ Gurbuz, S.² Tufekci, Z.³ Gowdy, J.⁴

17
- 0032639979
- Vision-based speaker detection using Bayesian networks
- J. Rehg, K. Murphy, and P. Fieguth, "Vision-based speaker detection using Bayesian networks," Comput. Vis. Pattern Recognit., pp. 110-116, 1999.
- (1999) Comput. Vis. Pattern Recognit , pp. 110-116
- Rehg, J.¹ Murphy, K.² Fieguth, P.³

18
- 0033884858
- Speaker verification using adapted Gaussian mixture models
- D. A. Reynolds, T. F. Quatieri, and R. B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Process., vol. 10, pp. 19-41, 2000.
- (2000) Digital Signal Process , vol.10 , pp. 19-41
- Reynolds, D.A.¹ Quatieri, T.F.² Dunn, R.B.³

19
- 34247646607
- Combined gesture-speech correlation analysis and speech driven gesture synthesis
- M. Sargin, O. Aran, A. Karpov, F. Ofli, Y. Yasinnik, S. Wilson, E. Erzin, Y. Yemez, and A. Tekalp, "Combined gesture-speech correlation analysis and speech driven gesture synthesis," in Proc. IEEE Int. Conf. Multimedia, 2006, pp. 893-896.
- (2006) Proc. IEEE Int. Conf. Multimedia , pp. 893-896
- Sargin, M.¹ Aran, O.² Karpov, A.³ Ofli, F.⁴ Yasinnik, Y.⁵ Wilson, S.⁶ Erzin, E.⁷ Yemez, Y.⁸ Tekalp, A.⁹

20
- 0034187513
- Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata
- S. Sarkar and P. Soundararajan, "Supervised learning of large perceptual organization: Graph spectral partitioning and learning automata," IEEE Trans. Pattern Anal. Mach. Intell., vol. 22, no. 5, pp. 504-525, May 2000.
- (2000) IEEE Trans. Pattern Anal. Mach. Intell , vol.22 , Issue.5 , pp. 504-525
- Sarkar, S.¹ Soundararajan, P.²

21
- 2642557514
- Facesync: A linear operator for measuring synchronization of video facial images and audio tracks
- M. Slaney and M. Covell, "Facesync: A linear operator for measuring synchronization of video facial images and audio tracks," Adv. Neural Inf. Process. Syst., vol. 14, pp. 814-820, 2000.
- (2000) Adv. Neural Inf. Process. Syst , vol.14 , pp. 814-820
- Slaney, M.¹ Covell, M.²

22
- 35348847678
- The CLEAR 2006 evaluation
- R. Stiefelhagen, K. Bernardin, R. Bowers, J. Garofolo, D. Mostefa, and P. Soundararajan, "The CLEAR 2006 evaluation," Springer Lecture Notes Comput. Sci., no. 4122, pp. 1-44, 2006.
- (2006) Springer Lecture Notes Comput. Sci , Issue.4122 , pp. 1-44
- Stiefelhagen, R.¹ Bernardin, K.² Bowers, R.³ Garofolo, J.⁴ Mostefa, D.⁵ Soundararajan, P.⁶

23
- 34047261805
- An overview of automatic speaker diarization systems
- S. Tranter and D. Reynolds, "An overview of automatic speaker diarization systems," IEEE Trans. Audio, Speech, Lang. Process., vol. 14, no. 5, pp. 1557-1565, May 2006.
- (2006) IEEE Trans. Audio, Speech, Lang. Process , vol.14 , Issue.5 , pp. 1557-1565
- Tranter, S.¹ Reynolds, D.²

24
- 34047223614
- Audio segmentation and speaker localization in meeting videos
- H. Vajaria, T. Islam, S. Sarkar, R. Sankar, and R. Kasturi, "Audio segmentation and speaker localization in meeting videos," in Proc. Int. Conf. Pattern Recognit., 2006, vol. 2, pp. 1150-1153.
- (2006) Proc. Int. Conf. Pattern Recognit , vol.2 , pp. 1150-1153
- Vajaria, H.¹ Islam, T.² Sarkar, S.³ Sankar, R.⁴ Kasturi, R.⁵

25
- 84960898014
- Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures
- L. Valbonesi, R. Ansari, D. McNeill, F. Quek, S. Duncan, K. E. McCullough, and R. Bryll, "Multimodal signal analysis of prosody and hand motion: Temporal correlation of speech and gestures," in Proc. Eur. Signal Processing Conf., 2002, pp. 1330-1345.
- (2002) Proc. Eur. Signal Processing Conf , pp. 1330-1345
- Valbonesi, L.¹ Ansari, R.² McNeill, D.³ Quek, F.⁴ Duncan, S.⁵ McCullough, K.E.⁶ Bryll, R.⁷

26
- 34047268275
- Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system
- C. Wooters, J. Fung, B. Peskin, and X. Anguera, "Towards robust speaker segmentation: The ICSI-SRI fall 2004 diarization system," presented at the Rich Transcription Workshop, 2004.
- (2004) Rich Transcription Workshop
- Wooters, C.¹ Fung, J.² Peskin, B.³ Anguera, X.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.