SCOPUS Information Search Platform

International Journal of Multimedia Information Retrieval

Volume 2, Issue 2, 2013, Pages 73-101

High-level event recognition in unconstrained videos

(4) Jiang, Yu Gang (a); Bhattacharya, Subhabrata (b); Chang, Shih Fu (c); Shah, Mubarak (b)

a FUDAN UNIVERSITY (China)

b UNIVERSITY OF CENTRAL FLORIDA (United States)

c Columbia University ^* (United States)

Author keywords

Fusion; Multimedia event detection; Multimodal features; Recognition; Unconstrained videos; Video events

EID: 84986185450 PISSN: 21926611 EISSN: 2192662X Source Type: Journal
DOI: 10.1007/s13735-012-0024-2 Document Type: Article

Times cited: 149

References (174)

1
- 79955649703
- Human activity analysis: a review
- Aggarwal JK, Ryoo MS (2011) Human activity analysis: a review. ACM Comput Surv 43(3):1–16
- (2011) ACM Comput Surv , vol.43 , Issue.3 , pp. 1-16
- Aggarwal, J.K.¹ Ryoo, M.S.²

2
- 73849126715
- Human action recognition in videos using kinematic features and multiple instance learning
- Ali S, Shah M (2010) Human action recognition in videos using kinematic features and multiple instance learning. IEEE Trans Pattern Anal Mach Intell 32(2):288–303
- (2010) IEEE Trans Pattern Anal Mach Intell , vol.32 , Issue.2 , pp. 288-303
- Ali, S.¹ Shah, M.²

3
- 0020849266
- Maintaining knowledge about temporal intervals
- Allen JF (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843
- (1983) Commun ACM , vol.26 , Issue.11 , pp. 832-843
- Allen, J.F.¹

4
- 0022115986
- Kinematic features of unrestrained vertical arm movements
- Atkeson CG, Hollerbach JM (1985) Kinematic features of unrestrained vertical arm movements. J Neurosci 5(9):2318–2330
- (1985) J Neurosci , vol.5 , Issue.9 , pp. 2318-2330
- Atkeson, C.G.¹ Hollerbach, J.M.²

5
- 34547645414
- The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music
- Aucouturier JJ, Defreville B, Pachet F (2007) The bag-of-frames approach to audio pattern recognition: a sufficient model for urban soundscapes but not for polyphonic music. J Acoust Soc Am 122(2):881–891
- (2007) J Acoust Soc Am , vol.122 , Issue.2 , pp. 881-891
- Aucouturier, J.J.¹ Defreville, B.² Pachet, F.³

6
- 51949084160
- Utilizing semantic word similarity measures for video retrieval
- Proceedings of IEEE conference on computer vision and pattern recognition, Providence, USA
- Aytar Y, Shah M, Luo J (2008) Utilizing semantic word similarity measures for video retrieval. In: Proceedings of IEEE conference on computer vision and pattern recognition, Providence, USA
- (2008) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Aytar, Y.¹ Shah, M.² Luo, J.³

7
- 84870398559
- Audio-based event detection for sports video
- Proceedings of international conference on image and video retrieval, Urbana-Champaign, IL
- Baillie M, Jose JM (2003) Audio-based event detection for sports video. In: Proceedings of international conference on image and video retrieval, Urbana-Champaign, IL
- (2003) In: Proceedings of international conference on image and video retrieval
- Baillie, M.¹ Jose, J.M.²

8
- 78651388935
- Event detection and recognition for semantic annotation of video
- Ballan L, Bertini M, Bimbo AD, Seidenari L, Serra G (2011) Event detection and recognition for semantic annotation of video. Multimedia Tools Appl 51(1):279–302
- (2011) Multimedia Tools Appl , vol.51 , Issue.1 , pp. 279-302
- Ballan, L.¹ Bertini, M.² Bimbo, A.D.³ Seidenari, L.⁴ Serra, G.⁵

9
- 34848878272
- Headline generation based on statistical translation
- Proceedings of the annual meeting of the association for computational linguistics, Hong Kong
- Banko M, Mittal VO, Witbrock MJ (2000) Headline generation based on statistical translation. In: Proceedings of the annual meeting of the association for computational linguistics, Hong Kong
- (2000) In: Proceedings of the annual meeting of the association for computational linguistics
- Banko, M.¹ Mittal, V.O.² Witbrock, M.J.³

10
- 84905241486
- Informedia @ TRECVID 2011
- Proceedings of NIST TRECVID, Workshop, Gaithersburg, MD, USA
- Bao L, Yu SI, Lan ZZ, Overwijk A, Jin Q, Langner B, Garbus M, Burger S, Metze F, Hauptmann A (2011) Informedia @ TRECVID 2011. In: Proceedings of NIST TRECVID, Workshop, Gaithersburg, MD, USA
- (2011) In: Proceedings of NIST TRECVID, Workshop
- Bao, L.¹ Yu, S.I.² Lan, Z.Z.³ Overwijk, A.⁴ Jin, Q.⁵ Langner, B.⁶ Garbus, M.⁷ Burger, S.⁸ Metze, F.⁹ Hauptmann, A.¹⁰

11
- 84881100367
- Large-scale automatic labeling of video events with verbs based on event-participant interaction
- arXiv:1204.3616v1
- Barbu A, Bridge A, Coroian D, Dickinson S, Mussman S, Narayanaswamy S, Salvi D, Schmidt L, Shangguan J, Siskind JM, Waggoner J, Wang S, Wei J, Yin Y, Zhang Z (2012) Large-scale automatic labeling of video events with verbs based on event-participant interaction. arXiv:1204.3616v1
- (2012) arXiv:1204.3616v1
- Barbu, A.¹ Bridge, A.² Coroian, D.³ Dickinson, S.⁴ Mussman, S.⁵ Narayanaswamy, S.⁶ Salvi, D.⁷ Schmidt, L.⁸ Shangguan, J.⁹ Siskind, J.M.¹⁰ Waggoner, J.¹¹ Wang, S.¹² Wei, J.¹³ Yin, Y.¹⁴ Zhang, Z.¹⁵

12
- 43049174575
- SURF: speeded up robust features
- Bay H, Ess A, Tuytelaars T, van Gool L (2008) SURF: speeded up robust features. Comput Vision Image Underst 110(3):346–359
- (2008) Comput Vision Image Underst , vol.110 , Issue.3 , pp. 346-359
- Bay, H.¹ Ess, A.² Tuytelaars, T.³ van Gool, L.⁴

13
- 0042349407
- A graphical model for audiovisual object tracking
- Beal MJ, Jojic N, Attias H (2003) A graphical model for audiovisual object tracking. IEEE Trans Pattern Anal Mach Intell 25(7):828–836
- (2003) IEEE Trans Pattern Anal Mach Intell , vol.25 , Issue.7 , pp. 828-836
- Beal, M.J.¹ Jojic, N.² Attias, H.³

14
- 33745891801
- Actions as space-time shapes
- Blank M, Gorelick L, Shechtman E, Irani M, Basri R (2005) Actions as space-time shapes. In: Proceedings of International Conference on Computer Vision
- (2005) In: Proceedings of International Conference on Computer Vision
- Blank, M.¹ Gorelick, L.² Shechtman, E.³ Irani, M.⁴ Basri, R.⁵

15
- 0031590139
- Movement, activity, and action: the role of knowledge in the perception of motion
- Bobick AF (1997) Movement, activity, and action: the role of knowledge in the perception of motion. Philos Trans Royal Soc London 352:1257–1265
- (1997) Philos Trans Royal Soc London , vol.352 , pp. 1257-1265
- Bobick, A.F.¹

16
- 51949090223
- In defense of nearest-neighbor based image classification
- Boiman O, Shechtman E, Irani M (2008) In defense of nearest-neighbor based image classification. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- (2008) In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition
- Boiman, O.¹ Shechtman, E.² Irani, M.³

17
- 43449110431
- Automatic video classification: a survey of the literature
- Brezeale D, Cook D (2008) Automatic video classification: a survey of the literature. IEEE Trans Syst Man Cybernet Part C 38(3):416–430
- (2008) IEEE Trans Syst Man Cybernet Part C , vol.38 , Issue.3 , pp. 416-430
- Brezeale, D.¹ Cook, D.²

18
- 79955857786
- Efficient structure learning of bayesian networks using constraints
- de Campos C, Ji Q (2011) Efficient structure learning of bayesian networks using constraints. J Mach Learn Res 12(3):663–689
- (2011) J Mach Learn Res , vol.12 , Issue.3 , pp. 663-689
- de Campos, C.¹ Ji, Q.²

19
- 77954608206
- MCG-WEBV: a benchmark dataset for web video analysis
- Tech. rep. ICT-MCG-09-001, Institute of Computing Technology, Chinese Academy of Sciences
- Cao J, Zhang YD, Song YC, Chen ZN, Zhang X, Li JT (2009) MCG-WEBV: a benchmark dataset for web video analysis. Tech. rep., ICT-MCG-09-001, Institute of Computing Technology, Chinese Academy of Sciences
- (2009) Tech. rep. ICT-MCG-09-001
- Cao, J.¹ Zhang, Y.D.² Song, Y.C.³ Chen, Z.N.⁴ Zhang, X.⁵ Li, J.T.⁶

20
- 4944266418
- What is going on? a high level interpretation of sequences of images
- Springer-Verlag, London, UK
- Castel C, Chaudron L, Tessier C (1996) What is going on? a high level interpretation of sequences of images. In: Proceedings of European conference on computer vision, Springer-Verlag, London, UK
- (1996) In: Proceedings of European conference on computer vision
- Castel, C.¹ Chaudron, L.² Tessier, C.³

21
- 84905180243
- Columbia University/VIREO-CityU/IRIT TRECVID2008 high-level feature extraction and interactive video search
- Proceedings of NIST TRECVID, Workshop, Gaithersburg
- Chang SF, He J, Jiang YG, El Khoury E, Ngo CW, Yanagawa A, Zavesky E (2008) Columbia University/VIREO-CityU/IRIT TRECVID2008 high-level feature extraction and interactive video search. In: Proceedings of NIST TRECVID, Workshop, Gaithersburg
- (2008) In: Proceedings of NIST TRECVID
- Chang, S.F.¹ He, J.² Jiang, Y.G.³ El Khoury, E.⁴ Ngo, C.W.⁵ Yanagawa, A.⁶ Zavesky, E.⁷

22
- 0029716457
- Integrated image and speech analysis for content-based video indexing
- Proceedings of IEEE international conference on multimedia computing and systems, Washington, DC
- Chang YL, Zeng W, Kamel I, Alonso R (1996) Integrated image and speech analysis for content-based video indexing. In: Proceedings of IEEE international conference on multimedia computing and systems, Washington, DC
- (1996) In: Proceedings of IEEE international conference on multimedia computing and systems
- Chang, Y.L.¹ Zeng, W.² Kamel, I.³ Alonso, R.⁴

23
- 84867129067
- Marginalized stacked denoising autoencoders for domain adaptation
- Chen M, Xu ZE, Weinberger KQ, Sha F (2012) Marginalized stacked denoising autoencoders for domain adaptation. In: Proceedings international conference on machine learning
- (2012) In: Proceedings international conference on machine learning
- Chen, M.¹ Xu, Z.E.² Weinberger, K.Q.³ Sha, F.⁴

24
- 84905251864
- Team SRI-Sarnoff’s AURORA System @ TRECVID 2011
- Proceedings of NIST TRECVID, Workshop
- Cheng H et al (2011) Team SRI-Sarnoff’s AURORA System @ TRECVID 2011. In: Proceedings of NIST TRECVID, Workshop
- (2011) In: Proceedings of NIST TRECVID, Workshop
- Cheng, H.¹

25
- 85019071942
- Learning to recognize complex actions using conditional random fields
- Connolly CI (2007) Learning to recognize complex actions using conditional random fields. In: Proceedings of International Conference on Advances in Visual Computing
- (2007) In: Proceedings of International Conference on Advances in Visual Computing
- Connolly, C.I.¹

26
- 80051610520
- Soundtrack classification by transient events
- Cotton CV, Ellis DPW, Loui AC (2011) Soundtrack classification by transient events. In: Proceedings of IEEE international conference acoustics, speech, signal processing, pp 473–476
- (2011) In: Proceedings of IEEE international conference acoustics, speech, signal processing , pp. 473-476
- Cotton, C.V.¹ Ellis, D.P.W.² Loui, A.C.³

27
- 33645146449
- Histogram of oriented gradients for human detection
- Dalal N, Triggs B (2005) Histogram of oriented gradients for human detection. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2005) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Dalal, N.¹ Triggs, B.²

28
- 80052888136
- Imagenet: a large-scale hierarchical image database
- Deng J, Dong W, Socher R, Li LJ, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2009) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.J.⁴ Li, K.⁵ Fei-Fei, L.⁶

29
- 33846622081
- Behavior recognition via sparse spatio-temporal features
- Dollar P, Rabaud V, Cottrell G, Belongie S (2005) Behavior recognition via sparse spatio-temporal features. In: Proceedings of joint IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance
- (2005) In: Proceedings of joint IEEE international workshop on visual surveillance and performance evaluation of tracking and surveillance
- Dollar, P.¹ Rabaud, V.² Cottrell, G.³ Belongie, S.⁴

30
- 85019131710
- Dorko G (2012) Interest point detectors local descriptors. http://lear.inrialpes.fr/people/dorko/downloads.html
- (2012) Interest point detectors local descriptors
- Dorko, G.¹

31
- 77956003629
- Visual event recognition in videos by learning from web data
- Duan L, Xu D, Tsang IW, Luo J (2010) Visual event recognition in videos by learning from web data. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2010) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Duan, L.¹ Xu, D.² Tsang, I.W.³ Luo, J.⁴

32
- 85081863350
- Automatic annotation of human actions in video
- Duchenne O, Laptev I, Sivic J, Bach F, Ponce J (2009) Automatic annotation of human actions in video. In: Proceedings of IEEE international conference on computer vision
- (2009) In: Proceedings of IEEE international conference on computer vision
- Duchenne, O.¹ Laptev, I.² Sivic, J.³ Bach, F.⁴ Ponce, J.⁵

33
- 33744968612
- Audio-based context recognition
- Eronen A, Peltonen V, Tuomi J, Klapuri A, Fagerlund S, Sorsa T, Lorho G, Huopaniemi J (2006) Audio-based context recognition. IEEE Trans Audio Speech Lang Process 14(1):321–329
- (2006) IEEE Trans Audio Speech Lang Process , vol.14 , Issue.1 , pp. 321-329
- Eronen, A.¹ Peltonen, V.² Tuomi, J.³ Klapuri, A.⁴ Fagerlund, S.⁵ Sorsa, T.⁶ Lorho, G.⁷ Huopaniemi, J.⁸

34
- 84921069139
- The PASCAL visual object classes challenge 2007 (VOC2007) Results
- Everingham M, van Gool L, Williams CKI, Winn J, Zisserman A (2007) The PASCAL visual object classes challenge 2007 (VOC2007) Results. http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2007/results/index.shtml
- (2007) The PASCAL visual object classes challenge 2007 (VOC2007) Results
- Everingham, M.¹ van Gool, L.² Williams, C.K.I.³ Winn, J.⁴ Zisserman, A.⁵

35
- 77955422240
- Object detection with discriminatively trained part based models
- Felzenszwalb P, Girshick R, McAllester D, Ramanan D (2010) Object detection with discriminatively trained part based models. IEEE Trans Pattern Anal Mach Intell 32(9):1530–1535
- (2010) IEEE Trans Pattern Anal Mach Intell , vol.32 , Issue.9 , pp. 1530-1535
- Felzenszwalb, P.¹ Girshick, R.² McAllester, D.³ Ramanan, D.⁴

36
- 80052878949
- How many words is a picture worth? automatic caption generation for news images
- Feng Y, Lapata M (2010) How many words is a picture worth? automatic caption generation for news images. In: Proceedings of the annual meeting of the association for computational linguistics
- (2010) In: Proceedings of the annual meeting of the association for computational linguistics
- Feng, Y.¹ Lapata, M.²

37
- 0002635287
- The case for case
- Universals in Linguistic Theory, New York
- Fillmore CJ (1968) The case for case. In: Bach E, Harms R (eds) Universals in Linguistic Theory, New York, pp 1–88
- (1968) In: Bach E, Harms R (eds) Universals in Linguistic Theory , pp. 1-88
- Fillmore, C.J.¹

38
- 84905272448
- Fiscus J et al (2011) TRECVID multimedia event detection evaluation plan. http://www.nist.gov/itl/iad/mig/upload/MED11-EvalPlan-V03-20110801a.pdf
- (2011) TRECVID multimedia event detection evaluation plan
- Fiscus, J.¹

39
- 28344457205
- Verl: an ontology framework for representing and annotating video events
- Francois ARJ, Nevatia R, Hobbs J, Bolles RC (2005) Verl: an ontology framework for representing and annotating video events. IEEE Multimedia Magazine 12(4):76–86
- (2005) IEEE Multimedia Magazine , vol.12 , Issue.4 , pp. 76-86
- Francois, A.R.J.¹ Nevatia, R.² Hobbs, J.³ Bolles, R.C.⁴

40
- 25844482570
- A comparison of algorithms for inference and learning in probabilistic graphical models
- Frey BJ, Jojic N (2005) A comparison of algorithms for inference and learning in probabilistic graphical models. IEEE Trans Pattern Anal Mach Intell 27(9):1392–1416
- (2005) IEEE Trans Pattern Anal Mach Intell , vol.27 , Issue.9 , pp. 1392-1416
- Frey, B.J.¹ Jojic, N.²

41
- 77952671498
- Visual word ambiguity
- van Gemert JC, Veenman CJ, Smeulders AWM, Geusebroek JM (2010) Visual word ambiguity. IEEE Trans Pattern Anal Mach Intell 32(7):1271–1283
- (2010) IEEE Trans Pattern Anal Mach Intell , vol.32 , Issue.7 , pp. 1271-1283
- van Gemert, J.C.¹ Veenman, C.J.² Smeulders, A.W.M.³ Geusebroek, J.M.⁴

42
- 84932645954
- Representation and recognition of events in surveillance video using petri nets
- Ghanem N, DeMenthon D, Doermann D, Davis L (2004) Representation and recognition of events in surveillance video using petri nets. In: Proceedings of IEEE conference on computer vision and pattern recognition workshop
- (2004) In: Proceedings of IEEE conference on computer vision and pattern recognition workshop
- Ghanem, N.¹ DeMenthon, D.² Doermann, D.³ Davis, L.⁴

43
- 0000351727
- Investigating causal relations by econometric models and cross-spectral methods
- Granger C (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3):424–438
- (1969) Econometrica , vol.37 , Issue.3 , pp. 424-438
- Granger, C.¹

44
- 9444290414
- Casee: a hierarchical event representation for the analysis of videos
- Hakeem A, Sheikh Y, Shah M (2004) Casee: a hierarchical event representation for the analysis of videos. In: Proceedings of AAAI conference
- (2004) In: Proceedings of AAAI conference
- Hakeem, A.¹ Sheikh, Y.² Shah, M.³

45
- 0003946694
- Learning Kernel classifiers: theory and algorithms
- The MIT Press, Cambridge
- Herbrich R (2001) Learning Kernel classifiers: theory and algorithms. The MIT Press, Cambridge
- (2001) Learning Kernel classifiers: theory and algorithms
- Herbrich, R.¹

46
- 77953194241
- Action detection in complex scenes with spatial and temporal ambiguities
- Hu Y, Cao L, Lv F, Yan S, Gong Y, Huang TS (2009) Action detection in complex scenes with spatial and temporal ambiguities. In: Proceedings of IEEE international conference on computer vision
- (2009) In: Proceedings of IEEE international conference on computer vision
- Hu, Y.¹ Cao, L.² Lv, F.³ Yan, S.⁴ Gong, Y.⁵ Huang, T.S.⁶

47
- 33746649771
- Semantic analysis of soccer video using dynamic bayesian network
- Huang CL, Shih HC, Chao CY (2006) Semantic analysis of soccer video using dynamic bayesian network. IEEE Trans Multimedia 8(4):749–760
- (2006) IEEE Trans Multimedia , vol.8 , Issue.4 , pp. 749-760
- Huang, C.L.¹ Shih, H.C.² Chao, C.Y.³

48
- 84905233993
- TokyoTech+Canon at TRECVID 2011
- Inoue N, Kamishima Y, Wada T, Shinoda K, Sato S (2011) TokyoTech+Canon at TRECVID 2011. In: Proceedings of NIST TRECVID Workshop
- (2011) In: Proceedings of NIST TRECVID Workshop
- Inoue, N.¹ Kamishima, Y.² Wada, T.³ Shinoda, K.⁴ Sato, S.⁵

49
- 0035270390
- Recognizing planned, multiperson action
- Intille SS, Bobick AF (2001) Recognizing planned, multiperson action. Comput Vision Image Underst 81(3):414–445
- (2001) Comput Vision Image Underst , vol.81 , Issue.3 , pp. 414-445
- Intille, S.S.¹ Bobick, A.F.²

50
- 0034245366
- Recognition of visual activities and interactions by stochastic parsing
- Ivanov YA, Bobick AF (2000) Recognition of visual activities and interactions by stochastic parsing. IEEE Trans Pattern Anal Mach Intell 22(8):852–872
- (2000) IEEE Trans Pattern Anal Mach Intell , vol.22 , Issue.8 , pp. 852-872
- Ivanov, Y.A.¹ Bobick, A.F.²

51
- 72549099611
- Short-term audio-visual atoms for generic video concept classification
- Jiang W, Cotton C, Chang SF, Ellis D, Loui AC (2009) Short-term audio-visual atoms for generic video concept classification. In: Proceedings of ACM international conference on multimedia
- (2009) In: Proceedings of ACM international conference on multimedia
- Jiang, W.¹ Cotton, C.² Chang, S.F.³ Ellis, D.⁴ Loui, A.C.⁵

52
- 84455170074
- Audio-visual grouplet: Temporal audio-visual interactions for general video concept classification
- Jiang W, Loui AC (2011) Audio-visual grouplet: Temporal audio-visual interactions for general video concept classification. In: Proceedings of ACM international conference on multimedia
- (2011) In: Proceedings of ACM international conference on multimedia
- Jiang, W.¹ Loui, A.C.²

53
- 84864116485
- SUPER: Towards real-time event recognition in Internet videos
- Jiang YG (2012) SUPER: Towards real-time event recognition in Internet videos. In: Proceedings of ACM international conference on multimedia retrieval
- (2012) In: Proceedings of ACM international conference on multimedia retrieval
- Jiang, Y.G.¹

54
- 84877645596
- Trajectory-based modeling of human actions with motion reference points
- Jiang YG, Dai Q, Xue X, Liu W, Ngo CW (2012) Trajectory-based modeling of human actions with motion reference points. In: Proceedings of European conference on computer vision
- (2012) In: Proceedings of European conference on computer vision
- Jiang, Y.G.¹ Dai, Q.² Xue, X.³ Liu, W.⁴ Ngo, C.W.⁵

55
- 36849003521
- Towards optimal bag-of-features for object categorization and semantic video retrieval
- Jiang YG, Ngo CW, Yang J (2007) Towards optimal bag-of-features for object categorization and semantic video retrieval. In: Proceedings of ACM international conference on image and video retrieval
- (2007) In: Proceedings of ACM international conference on image and video retrieval
- Jiang, Y.G.¹ Ngo, C.W.² Yang, J.³

56
- 72949121298
- Representations of keypoint-based semantic concept detection: a comprehensive study
- Jiang YG, Yang J, Ngo CW, Hauptmann AG (2010) Representations of keypoint-based semantic concept detection: a comprehensive study. IEEE Trans Multimedia 12(1):42–53
- (2010) IEEE Trans Multimedia , vol.12 , Issue.1 , pp. 42-53
- Jiang, Y.G.¹ Yang, J.² Ngo, C.W.³ Hauptmann, A.G.⁴

57
- 79959766559
- Consumer video understanding: a benchmark database and an evaluation of human and machine performance
- Jiang YG, Ye G, Chang SF, Ellis D, Loui AC (2011) Consumer video understanding: a benchmark database and an evaluation of human and machine performance. In: Proceedings of ACM international conference on multimedia retrieval
- (2011) In: Proceedings of ACM international conference on multimedia retrieval
- Jiang, Y.G.¹ Ye, G.² Chang, S.F.³ Ellis, D.⁴ Loui, A.C.⁵

58
- 84905161670
- Columbia-UCF TRECVID2010 multimedia event detection: Combining multiple modalities, contextual concepts, and temporal matching
- Proceedings of NIST TRECVID, Workshop
- Jiang YG, Zeng X, Ye G, Bhattacharya S, Ellis D, Shah M, Chang SF (2010) Columbia-UCF TRECVID2010 multimedia event detection: Combining multiple modalities, contextual concepts, and temporal matching. In: Proceedings of NIST TRECVID, Workshop
- (2010) In: Proceedings of NIST TRECVID, Workshop
- Jiang, Y.G.¹ Zeng, X.² Ye, G.³ Bhattacharya, S.⁴ Ellis, D.⁵ Shah, M.⁶ Chang, S.F.⁷

59
- 33845524029
- Attribute grammar-based event recognition and anomaly detection
- Proceedings of IEEE conference on computer vision and pattern recognition, Workshop
- Joo SW, Chellappa R (2006) Attribute grammar-based event recognition and anomaly detection. In: Proceedings of IEEE conference on computer vision and pattern recognition, Workshop
- (2006) In: Proceedings of IEEE conference on computer vision and pattern recognition, Workshop
- Joo, S.W.¹ Chellappa, R.²

60
- 5044233274
- PCA-SIFT: a more distinctive representation for local image descriptors
- Ke Y, Sukthankar R (2004) PCA-SIFT: a more distinctive representation for local image descriptors. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2004) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Ke, Y.¹ Sukthankar, R.²

61
- 84898426452
- A spatio-temporal descriptor based on 3d-gradients
- Klaser A, Marszalek M, Schmid C (2008) A spatio-temporal descriptor based on 3d-gradients. In: Proceedings of British machine vision conference
- (2008) In: Proceedings of British machine vision conference
- Klaser, A.¹ Marszalek, M.² Schmid, C.³

62
- 80052896089
- Hough transform and 3D SURF for robust three dimensional classification
- Knopp J, Prasad M, Willems G, Timofte R, van Gool L (2010) Hough transform and 3D SURF for robust three dimensional classification. In: Proceedings of European conference on computer vision
- (2010) In: Proceedings of European conference on computer vision
- Knopp, J.¹ Prasad, M.² Willems, G.³ Timofte, R.⁴ van Gool, L.⁵

63
- 0036843382
- Natural language description of human activities from video images based on concept hierarchy of actions
- Kojima A, Tamura T, Fukunaga K (2002) Natural language description of human activities from video images based on concept hierarchy of actions. Int J Comput Vision 50(2):171–184
- (2002) Int J Comput Vision , vol.50 , Issue.2 , pp. 171-184
- Kojima, A.¹ Tamura, T.² Fukunaga, K.³

64
- 84856682691
- HMDB: a large video database for human motion recognition
- Kuehne H, Jhuang H, Garrote E, Poggio T, Serre T (2011) HMDB: a large video database for human motion recognition. In: Proceedings of IEEE international conference on computer vision
- (2011) In: Proceedings of IEEE international conference on computer vision
- Kuehne, H.¹ Jhuang, H.² Garrote, E.³ Poggio, T.⁴ Serre, T.⁵

65
- 24944451092
- On space-time interest points
- Laptev I (2005) On space-time interest points. Int J Comput Vision 64:107–123
- (2005) Int J Comput Vision , vol.64 , pp. 107-123
- Laptev, I.¹

66
- 51949083365
- Learning realistic human actions from movies
- Laptev I, Marszalek M, Schmid C, Rozenfeld B (2008) Learning realistic human actions from movies. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2008) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Laptev, I.¹ Marszalek, M.² Schmid, C.³ Rozenfeld, B.⁴

67
- 69549119986
- Understanding video events: a survey of methods for automatic interpretation of semantic occurrences in videos
- Lavee G, Rivlin E, Rudzsky M (2009) Understanding video events: a survey of methods for automatic interpretation of semantic occurrences in videos. IEEE Trans Syst Man Cybernet Part C 39(5):489–504
- (2009) IEEE Trans Syst Man Cybernet Part C , vol.39 , Issue.5 , pp. 489-504
- Lavee, G.¹ Rivlin, E.² Rudzsky, M.³

68
- 33845572523
- Beyond bags of features: spatial pyramid matching for recognizing natural scene categories
- Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2006) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Lazebnik, S.¹ Schmid, C.² Ponce, J.³

69
- 80052874098
- Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis
- Le QV, Zou WY, Yeung SY, Ng AY (2011) Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2011) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Le, Q.V.¹ Zou, W.Y.² Yeung, S.Y.³ Ng, A.Y.⁴

70
- 77955746721
- Audio-based semantic concept classification for consumer video
- Lee K, Ellis DPW (2010) Audio-based semantic concept classification for consumer video. IEEE Trans Audio Speech Lang Process 18(6):1406–1416
- (2010) IEEE Trans Audio Speech Lang Process , vol.18 , Issue.6 , pp. 1406-1416
- Lee, K.¹ Ellis, D.P.W.²

71
- 55149112799
- Expandable data-driven graphical modeling of human actions based on salient postures
- Li W, Zhang Z, Liu Z (2008) Expandable data-driven graphical modeling of human actions based on salient postures. IEEE Trans Circ Syst Video Technol 18(11):1499–1510
- (2008) IEEE Trans Circ Syst Video Technol , vol.18 , Issue.11 , pp. 1499-1510
- Li, W.¹ Zhang, Z.² Liu, Z.³

72
- 0032209062
- Feature detection with automatic scale selection
- Lindeberg T (1998) Feature detection with automatic scale selection. Int J Comput Vision 30:79–116
- (1998) Int J Comput Vision , vol.30 , pp. 79-116
- Lindeberg, T.¹

73
- 80052915325
- Recognizing human actions by attributes
- Liu J, Kuipers B, Savarese S (2011) Recognizing human actions by attributes. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 3337–3344
- (2011) In: Proceedings of IEEE conference on computer vision and pattern recognition , pp. 3337-3344
- Liu, J.¹ Kuipers, B.² Savarese, S.³

74
- 70450203660
- Recognizing realistic actions from videos “in the wild”
- Liu J, Luo J, Shah M (2009) Recognizing realistic actions from videos “in the wild”. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2009) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Liu, J.¹ Luo, J.² Shah, M.³

75
- 51949085157
- Learning human actions via information maximization
- Liu J, Shah M (2008) Learning human actions via information maximization. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2008) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Liu, J.¹ Shah, M.²

76
- 37849015208
- Kodak’s consumer video benchmark data set: concept definition and annotation
- Loui AC, Luo J, Chang SF, Ellis D, Jiang W, Kennedy L, Lee K, Yanagawa A (2007) Kodak’s consumer video benchmark data set: concept definition and annotation. In: Proceedings of ACM international workshop on multimedia information retrieval
- (2007) In: Proceedings of ACM international workshop on multimedia information retrieval
- Loui, A.C.¹ Luo, J.² Chang, S.F.³ Ellis, D.⁴ Jiang, W.⁵ Kennedy, L.⁶ Lee, K.⁷ Yanagawa, A.⁸

77
- 3042535216
- Distinctive image features from scale-invariant keypoints
- Lowe D (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vision 60:91–110
- (2004) Int J Comput Vision , vol.60 , pp. 91-110
- Lowe, D.¹

78
- 85008010045
- Audio keywords discovery for text-like audio content analysis and retrieval
- Lu L, Hanjalic A (2008) Audio keywords discovery for text-like audio content analysis and retrieval. IEEE Trans Multimedia 10(1):74–85
- (2008) IEEE Trans Multimedia , vol.10 , Issue.1 , pp. 74-85
- Lu, L.¹ Hanjalic, A.²

79
- 0019647180
- An iterative image registration technique with an application to stereo vision
- Lucas BD, Kanade T (1981) An iterative image registration technique with an application to stereo vision. In: Proceedings of international joint conference on artificial intelligence
- (1981) In: Proceedings of international joint conference on artificial intelligence
- Lucas, B.D.¹ Kanade, T.²

80
- 78149304826
- Sound retrieval and ranking using sparse auditory representations
- Lyon RF, Rehn M, Bengio S, Walters TC, Chechik G (2010) Sound retrieval and ranking using sparse auditory representations. Neural Comput 22(9):2390–2416
- (2010) Neural Comput , vol.22 , Issue.9 , pp. 2390-2416
- Lyon, R.F.¹ Rehn, M.² Bengio, S.³ Walters, T.C.⁴ Chechik, G.⁵

81
- 51949098112
- Classification using intersection kernel support vector machines is efficient
- Maji S, Berg AC, Malik J (2008) Classification using intersection kernel support vector machines is efficient. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2008) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Maji, S.¹ Berg, A.C.² Malik, J.³

82
- 84873528643
- Song-level features and support vector machines for music classification
- Mandel MI, Ellis DPW (2005) Song-level features and support vector machines for music classification. In: Proceedings of international society of music information retrieval conference
- (2005) In: Proceedings of international society of music information retrieval conference
- Mandel, M.I.¹ Ellis, D.P.W.²

83
- 0030213052
- Texture features for browsing and retrieval of image data
- Manjunath BS, Ma WY (1996) Texture features for browsing and retrieval of image data. IEEE Trans Pattern Anal Mach Intell 18(8):837–842
- (1996) IEEE Trans Pattern Anal Mach Intell , vol.18 , Issue.8 , pp. 837-842
- Manjunath, B.S.¹ Ma, W.Y.²

84
- 85046873967
- The DET curve in assessment of detection task performance
- Martin A, Doddington G, Kamm T, Ordowski M, Przybocki M (1997) The DET curve in assessment of detection task performance. In: Proceedings of European conference on speech communication and technology, pp 1895–1898
- (1997) In: Proceedings of European conference on speech communication and technology , pp. 1895-1898
- Martin, A.¹ Doddington, G.² Kamm, T.³ Ordowski, M.⁴ Przybocki, M.⁵

85
- 0041416425
- Robust wide baseline stereo from maximally stable extremal regions
- Matas J, Chum O, Urban M, Pajdla T (2002) Robust wide baseline stereo from maximally stable extremal regions. In: Proceedings of British machine vision conference, vol 1, pp 384–393
- (2002) In: Proceedings of British machine vision conference , vol.1 , pp. 384-393
- Matas, J.¹ Chum, O.² Urban, M.³ Pajdla, T.⁴

86
- 85019118042
- MediaEval: Multimedia retrieval benchmark evaluation. http://www.multimediaeval.org
- MediaEval: Multimedia retrieval benchmark evaluation

87
- 77953182943
- Activity recognition using the velocity histories of tracked keypoints
- Messing R, Pal C, Kautz H (2009) Activity recognition using the velocity histories of tracked keypoints. In: Proceedings of IEEE international conference on computer vision
- (2009) In: Proceedings of IEEE international conference on computer vision
- Messing, R.¹ Pal, C.² Kautz, H.³

88
- 9644260534
- Scale and affine invariant interest point detectors
- Mikolajczyk K, Schmid C (2004) Scale and affine invariant interest point detectors. Int J Comput Vision 60:63–86
- (2004) Int J Comput Vision , vol.60 , pp. 63-86
- Mikolajczyk, K.¹ Schmid, C.²

89
- 27644547620
- A performance evaluation of local descriptors
- Mikolajczyk K, Schmid C (2005) A performance evaluation of local descriptors. IEEE Trans Pattern Anal Mach Intell 27(10):1615–1630
- (2005) IEEE Trans Pattern Anal Mach Intell , vol.27 , Issue.10 , pp. 1615-1630
- Mikolajczyk, K.¹ Schmid, C.²

90
- 33244468369
- A comparison of affine region detectors
- Mikolajczyk K, Tuytelaars T, Schmid C, Zisserman A, Matas J et al (2005) A comparison of affine region detectors. Int J Comput Vision 65(1/2):43–72
- (2005) Int J Comput Vision , vol.65 , Issue.1-2 , pp. 43-72
- Mikolajczyk, K.¹ Tuytelaars, T.² Schmid, C.³ Zisserman, A.⁴ Matas, J.⁵

91
- 0032115209
- Video handling with music and speech detection
- Minami K, Akutsu A, Hamada H, Tonomura Y (1998) Video handling with music and speech detection. IEEE Multimedia Magazine 5:17–25
- (1998) IEEE Multimedia Magazine , vol.5 , pp. 17-25
- Minami, K.¹ Akutsu, A.² Hamada, H.³ Tonomura, Y.⁴

92
- 55449128654
- Recognizing multitasked activities using stochastic context-free grammar
- Moore D, Essa I (2001) Recognizing multitasked activities using stochastic context-free grammar. In: Proceedings of AAAI conference
- (2001) In: Proceedings of AAAI conference
- Moore, D.¹ Essa, I.²

93
- 48049095024
- Randomized clustering forests for image classification
- Moosmann F, Nowak E, Jurie F (2008) Randomized clustering forests for image classification. IEEE Trans Pattern Anal Mach Intell 30(9):1632–1646
- (2008) IEEE Trans Pattern Anal Mach Intell , vol.30 , Issue.9 , pp. 1632-1646
- Moosmann, F.¹ Nowak, E.² Jurie, F.³

94
- 77951750177
- Youtube scale, large vocabulary video annotation
- Morsillo N, Mann G, Pal C (2010) Youtube scale, large vocabulary video annotation, Chapter 14 in video search and mining. Springer-Verlag series on studies in computational intelligence. Springer, Berlin, pp 357–386
- (2010) Chapter 14 in video search and mining. Springer-Verlag series on studies in computational intelligence. Springer, Berlin , pp. 357-386
- Morsillo, N.¹ Mann, G.² Pal, C.³

95
- 33747626730
- Large-scale concept ontology for multimedia
- Naphade M, Smith J, Tesic J, Chang SF, Hsu W, Kennedy L, Hauptmann A, Curtis J (2006) Large-scale concept ontology for multimedia. IEEE Multimedia Magazine 13(3):86–91
- (2006) IEEE Multimedia Magazine , vol.13 , Issue.3 , pp. 86-91
- Naphade, M.¹ Smith, J.² Tesic, J.³ Chang, S.F.⁴ Hsu, W.⁵ Kennedy, L.⁶ Hauptmann, A.⁷ Curtis, J.⁸

96
- 84905259534
- BBN VISER TRECVID 2011 multimedia event detection system
- Natarajan P et al (2011) BBN VISER TRECVID 2011 multimedia event detection system. In: Proceedings of NIST TRECVID, Workshop
- (2011) In: Proceedings of NIST TRECVID, Workshop
- Natarajan, P.¹

97
- 50949127608
- Online, real-time tracking and recognition of human actions
- Natarajan P, Nevatia R (2008) Online, real-time tracking and recognition of human actions. In: Proceedings of IEEE workshop on motion and video computing, pp 1–8
- (2008) In: Proceedings of IEEE workshop on motion and video computing , pp. 1-8
- Natarajan, P.¹ Nevatia, R.²

98
- 84905189035
- IBM Research TRECVID-2010 video copy detection and multimedia event detection system
- Natsev A, Smith JR, Hill M, Hua G, Huang B, Merler M, Xie L, Ouyang H, Zhou M (2010) IBM Research TRECVID-2010 video copy detection and multimedia event detection system. In: Proceedings of NIST TRECVID, Workshop
- (2010) In: Proceedings of NIST TRECVID, Workshop
- Natsev, A.¹ Smith, J.R.² Hill, M.³ Hua, G.⁴ Huang, B.⁵ Merler, M.⁶ Xie, L.⁷ Ouyang, H.⁸ Zhou, M.⁹

99
- 85019125692
- NIST Trecvid Multimedia Event Detection (MED) task. http://www.nist.gov/itl/iad/mig/med.cfm
- NIST Trecvid Multimedia Event Detection (MED) task

100
- 33845592987
- Scalable recognition with a vocabulary tree
- Nister D, Stewenius H (2006) Scalable recognition with a vocabulary tree. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2006) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Nister, D.¹ Stewenius, H.²

101
- 33846249191
- Sampling strategies for bag-of-features image classification
- Nowak E, Jurie F, Triggs B (2006) Sampling strategies for bag-of-features image classification. In: Proceedings of European conference on computer vision
- (2006) In: Proceedings of European conference on computer vision
- Nowak, E.¹ Jurie, F.² Triggs, B.³

102
- 79952952363
- Spatiotemporal localization and categorization of human actions in unsegmented image sequences
- Oikonomopoulos A, Patras I, Pantic M (2011) Spatiotemporal localization and categorization of human actions in unsegmented image sequences. IEEE Trans Image Process 20(4):1126–1140
- (2011) IEEE Trans Image Process , vol.20 , Issue.4 , pp. 1126-1140
- Oikonomopoulos, A.¹ Patras, I.² Pantic, M.³

103
- 0036647193
- Multiresolution gray-scale and rotation invariant texture classification with local binary patterns
- Ojala T, Pietikainen M, Maenpaa T (2002) Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans Pattern Anal Mach Intell 24(7):971–987
- (2002) IEEE Trans Pattern Anal Mach Intell , vol.24 , Issue.7 , pp. 971-987
- Ojala, T.¹ Pietikainen, M.² Maenpaa, T.³

104
- 0035328421
- Modeling the shape of the scene: a holistic representation of the spatial envelope
- Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vision 42:145–175
- (2001) Int J Comput Vision , vol.42 , pp. 145-175
- Oliva, A.¹ Torralba, A.²

105
- 84866647901
- Im2Text: describing images using 1 million captioned photographs
- Ordonez V, Kulkarni G, Berg TL (2011) Im2Text: describing images using 1 million captioned photographs. In: Proceedings of advances in neural information processing systems
- (2011) In: Proceedings of advances in neural information processing systems
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

106
- 85133336275
- Bleu: a method for automatic evaluation of machine translation
- Papineni K, Roukos S, Ward T, Zhu WJ (2002) Bleu: a method for automatic evaluation of machine translation. In: Proceedings of the annual meeting of the association for computational linguistics
- (2002) In: Proceedings of the annual meeting of the association for computational linguistics
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.J.⁴

107
- 0000460671
- Complex sounds and auditory images
- Patterson RD, Robinson K, Holdsworth J, McKeown D, Zhang C, Allerhand M (1992) Complex sounds and auditory images. In: Proceedings of international symposium on hearing, pp 429–446
- (1992) In: Proceedings of international symposium on hearing , pp. 429-446
- Patterson, R.D.¹ Robinson, K.² Holdsworth, J.³ McKeown, D.⁴ Zhang, C.⁵ Allerhand, M.⁶

108
- 79959771606
- Improving the fisher kernel for large-scale image classification
- Perronnin F, Sanchez J, Mensink T (2010) Improving the fisher kernel for large-scale image classification. In: Proceedings of European conference on computer vision
- (2010) In: Proceedings of European conference on computer vision
- Perronnin, F.¹ Sanchez, J.² Mensink, T.³

109
- 51949105132
- Lost in quantization: improving particular object retrieval in large scale image databases
- Philbin J, Chum O, Isard M, Sivic J, Zisserman A (2008) Lost in quantization: improving particular object retrieval in large scale image databases. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2008) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Philbin, J.¹ Chum, O.² Isard, M.³ Sivic, J.⁴ Zisserman, A.⁵

110
- 0003676543
- Head-driven phrase structure grammar
- Pollard C, Sag I (1994) Head-driven phrase structure grammar. Chicago University Press, Chicago
- (1994) Chicago University Press, Chicago
- Pollard, C.¹ Sag, I.²

111
- 77949275097
- Survey on vision-based human action recognition
- Poppe R (2010) Survey on vision-based human action recognition. Image Vision Comput 28(6):976–990
- (2010) Image Vision Comput , vol.28 , Issue.6 , pp. 976-990
- Poppe, R.¹

112
- 70450192896
- Dense saliency-based spatiotemporal feature points for action recognition
- Rapantzikos K, Avrithis Y, Kollias S (2009) Dense saliency-based spatiotemporal feature points for action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2009) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Rapantzikos, K.¹ Avrithis, Y.² Kollias, S.³

113
- 79958707658
- Tracklet descriptors for action modeling and video analysis
- Raptis M, Soatto S (2010) Tracklet descriptors for action modeling and video analysis. In: Proceedings of European conference on computer vision
- (2010) In: Proceedings of European conference on computer vision
- Raptis, M.¹ Soatto, S.²

114
- 51949084792
- Action mach: a spatio-temporal maximum average correlation height filter for action recognition
- Rodriguez MD, Ahmed J, Shah M (2008) Action mach: a spatio-temporal maximum average correlation height filter for action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2008) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Rodriguez, M.D.¹ Ahmed, J.² Shah, M.³

115
- 0034313871
- The earth mover’s distance as a metric for image retrieval
- Rubner Y, Tomasi C, Guibas LJ (2000) The earth mover’s distance as a metric for image retrieval. Int J Comput Vision 40(2):99–121
- (2000) Int J Comput Vision , vol.40 , Issue.2 , pp. 99-121
- Rubner, Y.¹ Tomasi, C.² Guibas, L.J.³

116
- 39749186006
- LabelMe: a database and web-based tool for image annotation
- Russell B, Torralba A, Murphy K, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vision 77(1–3):157–173
- (2008) Int J Comput Vision , vol.77 , Issue.1-3 , pp. 157-173
- Russell, B.¹ Torralba, A.² Murphy, K.³ Freeman, W.T.⁴

117
- 33845588233
- Recognition of composite human activities through context-free grammar based representation
- Ryoo MS, Aggarwal JK (2006) Recognition of composite human activities through context-free grammar based representation. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2006) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Ryoo, M.S.¹ Aggarwal, J.K.²

118
- 27844565238
- Event detection in field sports video using audio-visual features and a support vector machine
- Sadlier DA, O’Connor NE (2005) Event detection in field sports video using audio-visual features and a support vector machine. IEEE Trans Circ Syst Video Technol 15(10):1225–1233
- (2005) IEEE Trans Circ Syst Video Technol , vol.15 , Issue.10 , pp. 1225-1233
- Sadlier, D.A.¹ O’Connor, N.E.²

119
- 77955426203
- Evaluating color descriptors for object and scene recognition
- van de Sande KEA, Gevers T, Snoek CGM (2010) Evaluating color descriptors for object and scene recognition. IEEE Trans Pattern Anal Mach Intell 32(9):1582–1596
- (2010) IEEE Trans Pattern Anal Mach Intell , vol.32 , Issue.9 , pp. 1582-1596
- van de Sande, K.E.A.¹ Gevers, T.² Snoek, C.G.M.³

120
- 80052871195
- Modeling the temporal extent of actions
- Satkin S, Hebert M (2010) Modeling the temporal extent of actions. In: Proceedings of European conference on computer vision
- (2010) In: Proceedings of European conference on computer vision
- Satkin, S.¹ Hebert, M.²

121
- 10044233701
- Recognizing human actions: a local SVM approach
- Schuldt C, Laptev I, Caputo B (2004) Recognizing human actions: a local SVM approach. In: Proceedings of international conference on pattern recognition
- (2004) In: Proceedings of international conference on pattern recognition
- Schuldt, C.¹ Laptev, I.² Caputo, B.³

122
- 37849037402
- A 3-dimensional SIFT descriptor and its application to action recognition
- Scovanner P, Ali S, Shah M (2007) A 3-dimensional SIFT descriptor and its application to action recognition. In: Proceedings of ACM international conference on multimedia
- (2007) In: Proceedings of ACM international conference on multimedia
- Scovanner, P.¹ Ali, S.² Shah, M.³

123
- 34948845616
- Matching local self-similarities across images and videos
- Shechtman E, Irani M (2007) Matching local self-similarities across images and videos. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2007) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Shechtman, E.¹ Irani, M.²

124
- 51949114829
- Semantic texton forests for image categorization and segmentation
- Shotton J, Johnson M, Cipolla R (2008) Semantic texton forests for image categorization and segmentation. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2008) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Shotton, J.¹ Johnson, M.² Cipolla, R.³

125
- 84856636962
- Unsupervised learning of event and-or grammar and semantics from video
- Si Z, Pei M, Yao B, Zhu SC (2011) Unsupervised learning of event and-or grammar and semantics from video. In: Proceedings IEEE international conference on computer vision
- (2011) In: Proceedings IEEE international conference on computer vision
- Si, Z.¹ Pei, M.² Yao, B.³ Zhu, S.C.⁴

126
- 51949097915
- Optimised KD-trees for fast image descriptor matching
- Silpa-Anan C, Hartley R (2008) Optimised KD-trees for fast image descriptor matching. In: IEEE conference on computer vision and pattern recognition
- (2008) In: IEEE conference on computer vision and pattern recognition
- Silpa-Anan, C.¹ Hartley, R.²

127
- 0345414182
- Video Google: a text retrieval approach to object matching in videos
- Sivic J, Zisserman A (2003) Video Google: a text retrieval approach to object matching in videos. In: Proceedings of IEEE international conference on computer vision
- (2003) In: Proceedings of IEEE international conference on computer vision
- Sivic, J.¹ Zisserman, A.²

128
- 34547401486
- Evaluation campaigns and TRECVid
- Smeaton AF, Over P, Kraaij W (2006) Evaluation campaigns and TRECVid. In: Proceedings of ACM international workshop on multimedia information retrieval
- (2006) In: Proceedings of ACM international workshop on multimedia information retrieval
- Smeaton, A.F.¹ Over, P.² Kraaij, W.³

129
- 0034498523
- Content based image retrieval at the end of the early years
- Smeulders AWM, Worring M, Santini S, Gupta A, Jain R (2000) Content based image retrieval at the end of the early years. IEEE Trans Pattern Anal Mach Intell 22(12):1349–1380
- (2000) IEEE Trans Pattern Anal Mach Intell , vol.22 , Issue.12 , pp. 1349-1380
- Smeulders, A.W.M.¹ Worring, M.² Santini, S.³ Gupta, A.⁴ Jain, R.⁵

130
- 68349121465
- Concept-based video retrieval
- Snoek CGM, Worring M (2008) Concept-based video retrieval. Found Trends Inf Retr 2(4):215–322
- (2008) Found Trends Inf Retr , vol.2 , Issue.4 , pp. 215-322
- Snoek, C.G.M.¹ Worring, M.²

131
- 0003459124
- Visual recognition of american sign language using hidden markov models
- Starner TE (1995) Visual recognition of american sign language using hidden markov models. Ph.D. thesis
- (1995) Ph.D. thesis
- Starner, T.E.¹

132
- 70450214829
- Hierarchical spatio-temporal context modeling for action recognition
- Sun J, Wu X, Yan S, Cheong LF, Chua TS, Li J (2009) Hierarchical spatio-temporal context modeling for action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2009) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Sun, J.¹ Wu, X.² Yan, S.³ Cheong, L.F.⁴ Chua, T.S.⁵ Li, J.⁶

133
- 80155180597
- Automatic annotation of web videos
- Sun SW, Wang YCF, Hung YL, Chang CL, Chen KC, Cheng SS, Wang HM, Liao HYM (2011) Automatic annotation of web videos. In: Proceedings of IEEE international conference on multimedia and expo
- (2011) In: Proceedings of IEEE international conference on multimedia and expo
- Sun, S.W.¹ Wang, Y.C.F.² Hung, Y.L.³ Chang, C.L.⁴ Chen, K.C.⁵ Cheng, S.S.⁶ Wang, H.M.⁷ Liao, H.Y.M.⁸

134
- 84455192418
- Towards textually describing complex video contents with audio-visual concept classifiers
- Tan CC, Jiang YG, Ngo CW (2011) Towards textually describing complex video contents with audio-visual concept classifiers. In: Proceedings of ACM international conference on multimedia
- (2011) In: Proceedings of ACM international conference on multimedia
- Tan, C.C.¹ Jiang, Y.G.² Ngo, C.W.³

135
- 84867652321
- Convolutional learning of spatio-temporal features
- Taylor G, Fergus R, LeCun Y, Bregler C (2010) Convolutional learning of spatio-temporal features. In: Proceedings of European conference on computer vision
- (2010) In: Proceedings of European conference on computer vision
- Taylor, G.¹ Fergus, R.² LeCun, Y.³ Bregler, C.⁴

136
- 80052896768
- Efficient object category recognition using classemes
- Torresani L, Szummer M, Fitzgibbon A (2010) Efficient object category recognition using classemes. In: Proceedings of European conference on computer vision
- (2010) In: Proceedings of European conference on computer vision
- Torresani, L.¹ Szummer, M.² Fitzgibbon, A.³

137
- 69549103866
- Event modeling and recognition using markov logic networks
- Tran SD, Davis LS (2008) Event modeling and recognition using markov logic networks. In: Proceedings of European conference on computer vision
- (2008) In: Proceedings of European conference on computer vision
- Tran, S.D.¹ Davis, L.S.²

138
- 0035308821
- Content-based video parsing and indexing based on audio-visual interaction
- Tsekeridou S, Pitas I (2001) Content-based video parsing and indexing based on audio-visual interaction. IEEE Transactions on Circuits and Systems for Video Technology 11(4):522–535
- (2001) IEEE Transactions on Circuits and Systems for Video Technology , vol.11 , Issue.4 , pp. 522-535
- Tsekeridou, S.¹ Pitas, I.²

139
- 55149089260
- Machine recognition of human activities: a survey
- Turaga P, Chellappa R, Subrahmanian VS, Udrea O (2008) Machine recognition of human activities: a survey. IEEE Trans Circ Syst Video Technol 18(11):1473–1488
- (2008) IEEE Trans Circ Syst Video Technol , vol.18 , Issue.11 , pp. 1473-1488
- Turaga, P.¹ Chellappa, R.² Subrahmanian, V.S.³ Udrea, O.⁴

140
- 77956000050
- Dense interest points
- Tuytelaars T (2010) Dense interest points. In: Proceedings of IEEE conference on computer vision and pattern recognition, pp 2281–2288
- (2010) In: Proceedings of IEEE conference on computer vision and pattern recognition , pp. 2281-2288
- Tuytelaars, T.¹

141
- 84898475671
- Feature tracking and motion compensation for action recognition
- Uemura H, Ishikawa S, Mikolajczyk K (2008) Feature tracking and motion compensation for action recognition. In: Proceedings British machine vision conference
- (2008) In: Proceedings British machine vision conference
- Uemura, H.¹ Ishikawa, S.² Mikolajczyk, K.³

142
- 77958122462
- Real-time visual concept classification
- Uijlings JRR, Smeulders AWM, Scha RJH (2010) Real-time visual concept classification. IEEE Trans Multimedia 12(7):665–680
- (2010) IEEE Trans Multimedia , vol.12 , Issue.7 , pp. 665-680
- Uijlings, J.R.R.¹ Smeulders, A.W.M.² Scha, R.J.H.³

143
- 85019139247
- University of Central Florida 50 human action dataset (2010). http://server.cs.ucf.edu/~ision/data/UCF50.rar
- (2010) University of Central Florida 50 human action dataset

144
- 57749118369
- Conditional random fields for activity recognition
- Vail DL, Veloso MM, Lafferty JD (2007) Conditional random fields for activity recognition. In: Proceedings of international joint conference on autonomous agents and multiagent systems
- (2007) In: Proceedings of international joint conference on autonomous agents and multiagent systems
- Vail, D.L.¹ Veloso, M.M.² Lafferty, J.D.³

145
- 77953196456
- Multiple kernels for object detection
- Vedaldi A, Gulshan V, Varma M, Zisserman A (2009) Multiple kernels for object detection. In: Proceedings of IEEE international conference on computer vision
- (2009) In: Proceedings of IEEE international conference on computer vision
- Vedaldi, A.¹ Gulshan, V.² Varma, M.³ Zisserman, A.⁴

146
- 56449089103
- Extracting and composing robust features with denoising autoencoders
- Vincent P, Larochelle H, Bengio Y, Manzagol PA (2008) Extracting and composing robust features with denoising autoencoders. In: Proceedings of international conference on machine learning
- (2008) In: Proceedings of international conference on machine learning
- Vincent, P.¹ Larochelle, H.² Bengio, Y.³ Manzagol, P.A.⁴

147
- 79551480483
- Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion
- Vincent P, Larochelle H, Lajoie I, Bengio Y, Manzagol PA (2010) Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res 11(12):3371–3408
- (2010) J Mach Learn Res , vol.11 , Issue.12 , pp. 3371-3408
- Vincent, P.¹ Larochelle, H.² Lajoie, I.³ Bengio, Y.⁴ Manzagol, P.A.⁵

148
- 0035680116
- Rapid object detection using a boosted cascade of simple features
- Viola P, Jones M (2001) Rapid object detection using a boosted cascade of simple features. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2001) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Viola, P.¹ Jones, M.²

149
- 70350656206
- Video event detection using motion relativity and visual relatedness
- Wang F, Jiang YG, Ngo CW (2008) Video event detection using motion relativity and visual relatedness. In: Proceedings of ACM international conference on multimedia
- (2008) In: Proceedings of ACM international conference on multimedia
- Wang, F.¹ Jiang, Y.G.² Ngo, C.W.³

150
- 80052877143
- Action recognition by dense trajectories
- Wang H, Klaser A, Schmid C, Liu CL (2011) Action recognition by dense trajectories. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2011) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Wang, H.¹ Klaser, A.² Schmid, C.³ Liu, C.L.⁴

151
- 77958592879
- Evaluation of local spatio-temporal features for action recognition
- Wang H, Ullah MM, Klaser A, Laptev I, Schmid C (2008) Evaluation of local spatio-temporal features for action recognition. In: Proceedings of British machine vision conference
- (2008) In: Proceedings of British machine vision conference
- Wang, H.¹ Ullah, M.M.² Klaser, A.³ Laptev, I.⁴ Schmid, C.⁵

152
- 77955988108
- Semi-supervised hashing for scalable image retrieval
- Wang J, Kumar S, Chang SF (2010) Semi-supervised hashing for scalable image retrieval. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2010) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Wang, J.¹ Kumar, S.² Chang, S.F.³

153
- 34948844544
- Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model
- Wang L, Suter D (2007) Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2007) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Wang, L.¹ Suter, D.²

154
- 70450216856
- Max-margin hidden conditional random fields for human action recognition
- Wang Y, Mori G (2009) Max-margin hidden conditional random fields for human action recognition. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (2009) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Wang, Y.¹ Mori, G.²

155
- 33750025833
- Free viewpoint action recognition using motion history volumes
- Weinland D, Ronfard R, Boyer E (2006) Free viewpoint action recognition using motion history volumes. Comput Vision Image Underst 104(2):249–257
- (2006) Comput Vision Image Underst , vol.104 , Issue.2 , pp. 249-257
- Weinland, D.¹ Ronfard, R.² Boyer, E.³

156
- 85162007960
- Spectral hashing
- Weiss Y, Torralba A, Fergus R (2008) Spectral hashing. In: Proceedings of advances in neural information processing systems
- (2008) In: Proceedings of advances in neural information processing systems
- Weiss, Y.¹ Torralba, A.² Fergus, R.³

157
- 85019057706
- Web-scale computer vision using mapreduce for multimedia data mining
- White B, Yeh T, Lin J, Davis L (2009) Web-scale computer vision using mapreduce for multimedia data mining. In: Proceedings of ACM SIGKDD workshop on multimedia data mining
- (2009) In: Proceedings of ACM SIGKDD workshop on multimedia data mining
- White, B.¹ Yeh, T.² Lin, J.³ Davis, L.⁴

158
- 70450196950
- An efficient dense and scale-invariant spatio-temporal interest point detector
- Willems G, Tuytelaars T, van Gool L (2008) An efficient dense and scale-invariant spatio-temporal interest point detector. In: Proceedings European conference on computer vision
- (2008) In: Proceedings European conference on computer vision
- Willems, G.¹ Tuytelaars, T.² van Gool, L.³

159
- 84863082785
- Action recognition in videos acquired by a moving camera using motion decomposition of lagrangian particle trajectories
- Wu S, Oreifej O, Shah M (2011) Action recognition in videos acquired by a moving camera using motion decomposition of lagrangian particle trajectories. In: Proceedings of IEEE international conference on computer vision
- (2011) In: Proceedings of IEEE international conference on computer vision
- Wu, S.¹ Oreifej, O.² Shah, M.³

160
- 2142771243
- Structure analysis of soccer video with domain knowledge and hidden markov models
- Xie L, Xu P, Chang SF, Divakaran A, Sun H (2004) Structure analysis of soccer video with domain knowledge and hidden markov models. Pattern Recognit Lett 25(7):767–775
- (2004) Pattern Recognit Lett , vol.25 , Issue.7 , pp. 767-775
- Xie, L.¹ Xu, P.² Chang, S.F.³ Divakaran, A.⁴ Sun, H.⁵

161
- 41549084805
- A novel framework for semantic annotation and personalized retrieval of sports video
- Xu C, Wang J, Lu H, Zhang Y (2008) A novel framework for semantic annotation and personalized retrieval of sports video. IEEE Trans Multimedia 10(3):421–436
- (2008) IEEE Trans Multimedia , vol.10 , Issue.3 , pp. 421-436
- Xu, C.¹ Wang, J.² Lu, H.³ Zhang, Y.⁴

162
- 54749131961
- Video event recognition using Kernel methods with multilevel temporal alignment
- Xu D, Chang SF (2008) Video event recognition using Kernel methods with multilevel temporal alignment. IEEE Trans Pattern Anal Mach Intell 30(11):1985–1997
- (2008) IEEE Trans Pattern Anal Mach Intell , vol.30 , Issue.11 , pp. 1985-1997
- Xu, D.¹ Chang, S.F.²

163
- 84908595706
- Creating audio keywords for event detection in soccer video
- Xu M, Maddage NC, Xu C, Kankanhalli M, Tian Q (2003) Creating audio keywords for event detection in soccer video. In: Proceedings IEEE international conference on multimedia and expo
- (2003) In: Proceedings IEEE international conference on multimedia and expo
- Xu, M.¹ Maddage, N.C.² Xu, C.³ Kankanhalli, M.⁴ Tian, Q.⁵

164
- 85060905667
- Recognizing human action in time-sequential images using hidden markov model
- Yamato J, Ohya J, Ishii K (1992) Recognizing human action in time-sequential images using hidden markov model. In: Proceedings of IEEE conference on computer vision and pattern recognition
- (1992) In: Proceedings of IEEE conference on computer vision and pattern recognition
- Yamato, J.¹ Ohya, J.² Ishii, K.³

165
- 72449167906
- Large-scale multimedia semantic concept modeling using robust subspace bagging and mapreduce
- Yan R, Fleury MO, Merler M, Natsev A, Smith JR (2009) Large-scale multimedia semantic concept modeling using robust subspace bagging and mapreduce. In: Proceedings of ACM workshop on large-scale multimedia retrieval and mining
- (2009) In: Proceedings of ACM workshop on large-scale multimedia retrieval and mining
- Yan, R.¹ Fleury, M.O.² Merler, M.³ Natsev, A.⁴ Smith, J.R.⁵

166
- 36849001806
- Brief descriptions of visual features for baseline trecvid concept detectors
- Yanagawa A, Hsu W, Chang SF (2006) Brief descriptions of visual features for baseline trecvid concept detectors. Columbia University, Tech. rep.
- (2006) Columbia University, Tech. rep.
- Yanagawa, A.¹ Hsu, W.² Chang, S.F.³

167
- 77954862144
- I2T: Image parsing to text description
- Yao B, Yang X, Lin L, Lee M, Zhu S (2010) I2T: Image parsing to text description. Proc IEEE 98(8):1485–1508
- (2010) Proc IEEE , vol.98 , Issue.8 , pp. 1485-1508
- Yao, B.¹ Yang, X.² Lin, L.³ Lee, M.⁴ Zhu, S.⁵

168
- 84864147835
- Joint audio-visual bi-modal codewords for video event detection
- Ye G, Jhuo IH, Liu D, Jiang YG, Chang SF (2012) Joint audio-visual bi-modal codewords for video event detection. In: Proceedings of ACM international conference on multimedia retrieval
- (2012) In: Proceedings of ACM international conference on multimedia retrieval
- Ye, G.¹ Jhuo, I.H.² Liu, D.³ Jiang, Y.G.⁴ Chang, S.F.⁵

169
- 84866712367
- Robust late fusion with rank minimization
- Ye G, Liu D, Jhuo IH, Chang SF (2012) Robust late fusion with rank minimization. In: Proceedings IEEE conference on computer vision and pattern recognition
- (2012) In: Proceedings IEEE conference on computer vision and pattern recognition
- Ye, G.¹ Liu, D.² Jhuo, I.H.³ Chang, S.F.⁴

170
- 84898414417
- Real-time action recognition by spatiotemporal semantic and structural forests
- Yu TH, Kim TK, Cipolla R (2010) Real-time action recognition by spatiotemporal semantic and structural forests. In: Proceedings of British machine vision conference
- (2010) In: Proceedings of British machine vision conference
- Yu, T.H.¹ Kim, T.K.² Cipolla, R.³

171
- 84867825531
- Middle-level representation for human activities recognition: the role of spatio-temporal relationships
- Yuan F, Prinet V, Yuan J (2010) Middle-level representation for human activities recognition: the role of spatio-temporal relationships. In: Proceedings of ECCV Workshop on human motion: understanding, modeling, capture and animation
- (2010) In: Proceedings of ECCV Workshop on human motion: understanding, modeling, capture and animation
- Yuan, F.¹ Prinet, V.² Yuan, J.³

172
- 77953190737
- LabelMe video: building a video database with human annotations
- Yuen J, Russell BC, Liu C, Torralba A (2009) LabelMe video: building a video database with human annotations. In: Proceedings of international conference on computer vision
- (2009) In: Proceedings of international conference on computer vision
- Yuen, J.¹ Russell, B.C.² Liu, C.³ Torralba, A.⁴

173
- 0037700828
- Event detection in baseball video using superimposed caption recognition
- Zhang D, Chang SF (2002) Event detection in baseball video using superimposed caption recognition. In: Proceedings of ACM international conference on multimedia
- (2002) In: Proceedings of ACM international conference on multimedia
- Zhang, D.¹ Chang, S.F.²

174
- 33846580425
- Local features and kernels for classification of texture and object categories: a comprehensive study
- Zhang J, Marszalek M, Lazebnik S, Schmid C (2007) Local features and kernels for classification of texture and object categories: a comprehensive study. Int J Comput Vision 73(2):213–238
- (2007) Int J Comput Vision , vol.73 , Issue.2 , pp. 213-238
- Zhang, J.¹ Marszalek, M.² Lazebnik, S.³ Schmid, C.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.