



Volume 68, Issue 3, 2014, Pages 747-775

Audiovisual diarization of people in video content

Author keywords

Audiovisual fusion; People diarization; Segmentation; Unsupervised clustering; Video indexing



EID: 84895063162 | PISSN: 1380-7501 | EISSN: 1573-7721 | Source Type: Journal
DOI: 10.1007/s11042-012-1080-6 | Document Type: Article
Times cited: 30

References (68)
  • 6
    • Bigot B, Ferrané I, Pinquier J (2010) Exploiting speaker segmentations for automatic role detection. An application to broadcast news documents. In: International Workshop on Content-based Multimedia Indexing
    • Scopus EID: 77956606383
  • 17
    • Doretto G, Sebastian T, Tu P, Rittscher J (2011) Appearance-based person re-identification in camera networks: problem overview and current approaches. Journal of Ambient Intelligence and Humanized Computing 2(2):127-151
    • DOI: 10.1007/s12652-010-0034-y
    • Scopus EID: 79955869423
  • 19
    • Everingham M, Sivic J, Zisserman A (2009) Taking the bite out of automated naming of characters in TV video. Image Vision Comput 27(5):545-559
    • DOI: 10.1016/j.imavis.2008.04.018
    • Scopus EID: 62949172236
  • 23
    • Friedland G, Yeo C, Hung H (2010) Dialocalisation: acoustic speaker diarization and visual localization as joint optimization problem. ACM Trans Multimedia Comput Commun Appl (TOMCCAP) 6(4):27
    • Scopus EID: 78649623318
  • 25
    • Galliano S, Gravier G, Chaubard L (2009) The ESTER 2 evaluation campaign for the rich transcription of French radio broadcasts. In: Interspeech
    • Scopus EID: 70450180496
  • 27
    • Guillaumin M, Verbeek J, Schmid C (2009) Is that you? Metric learning approaches for face identification. In: ICCV
    • Scopus EID: 77953178820
  • 30
    • Ioffe S, Forsyth DA (2001) Human tracking with mixtures of trees. In: ICCV 2001
    • Scopus EID: 0034866703
  • 31
    • Jaffré G, Joly P (2004) Costume: a new feature for automatic video content indexing. In: RIAO
    • Scopus EID: 33646149888
  • 37
    • Lerdsudwichai C, Abdel-Mottaleb M, Ansari AN (2005) Tracking multiple people with recovery from partial and total occlusion. Pattern Recogn 38(7):1059-1070
    • DOI: 10.1016/j.patcog.2004.11.022
    • Scopus EID: 17444395970
  • 40
    • Liu Z, Wang Y (2007) Major cast detection in video using both speaker and face information. IEEE Transactions on Multimedia 9(1):89-101
    • DOI: 10.1109/TMM.2006.886360
    • Scopus EID: 33846216333
  • 41
    • Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60(2):91-110
    • DOI: 10.1023/B:VISI.0000029664.99615.94
    • Scopus EID: 3042535216
  • 42
    • Manjunath BS, Ma WY (1996) Texture features for browsing and retrieval of image data. IEEE Trans Pattern Anal Mach Intell 18(8):837-842
    • DOI: 10.1109/34.531803
    • Scopus EID: 0030213052
  • 45
    • Peng J, Lin QX (2008) Automatic classification video for person indexing. In: Proceedings of the 2008 Congress on Image and Signal Processing, CISP '08, vol 2. IEEE Computer Society, Washington, DC, USA, pp 475-479. ISBN 978-0-7695-3119-9
    • DOI: 10.1109/CISP.2008.405
    • Scopus EID: 52049099192
  • 48
    • Plackett RL (1983) Karl Pearson and the chi-squared test. Int Stat Rev 51(1):59-72
    • DOI: 10.2307/1402731
    • Scopus EID: 0006354851
  • 52
    • Schmalenstroeer J, Haeb-Umbach R (2010) Online diarization of streaming audio-visual data for smart environments. J Sel Topics Signal Processing 4(5):845-856
    • DOI: 10.1109/JSTSP.2010.2050519
    • Scopus EID: 77956739906
  • 55
    • Smeaton AF, Over P, Doherty AR (2010) Video shot boundary detection: seven years of TRECVid activity. Comput Vis Image Und 114(4):411-418
    • DOI: 10.1016/j.cviu.2009.03.011
    • Scopus EID: 77249161746
  • 56
    • Stiefelhagen R, Bowers R, Fiscus J (2008) Multimodal technologies for perception of humans: international evaluation workshops CLEAR 2007 and RT 2007. Lecture Notes in Computer Science. Springer
    • Scopus EID: 47749150338
  • 57
    • Sung JW, Kanade T, Kim DJ (2008) Pose robust face tracking by combining active appearance models and cylinder head models. Int J Comput Vis 80(2):260-274
    • DOI: 10.1007/s11263-007-0125-1
    • Scopus EID: 51849166611
  • 58
    • Tamura S, Iwano K, Furui S (2004) Multi-modal speech recognition using optical-flow analysis for lip images. J VLSI Signal Process Syst 36(2/3):117-124
    • Scopus EID: 1542572925
  • 59
    • Terzopoulos D, Waters K (1993) Analysis and synthesis of facial image sequences using physical and anatomical models. IEEE Trans Pattern Anal Mach Intell 15:569-579
    • DOI: 10.1109/34.216726
    • Scopus EID: 0027609968
  • 65
    • Viola P, Jones MJ (2004) Robust real-time face detection. Int J Comput Vis 57(2):137-154
    • DOI: 10.1023/B:VISI.0000013087.49260.fb
    • Scopus EID: 2142812371
  • 67
    • Zhou B, Hansen JHL (2005) Efficient audio stream segmentation via the combined T² statistic and the Bayesian information criterion. IEEE Trans Speech Audio Processing 13(4):467-474
    • DOI: 10.1109/TSA.2005.845790
    • Scopus EID: 22544475615


* This information was extracted by KISTI from an analysis of Elsevier's SCOPUS database.