SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 3112-3121

Harnessing Object and Scene Semantics for Large-Scale Video Understanding

(4) Wu, Zuxuan (a); Fu, Yanwei (b); Jiang, Yu Gang (a); Sigal, Leonid (b)

a FUDAN UNIVERSITY (China)

b DISNEY RESEARCH (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPLEX NETWORKS; COMPUTER VISION; NETWORK LAYERS; PATTERN RECOGNITION;

ACTION RECOGNITION; CLASSIFICATION AND CLUSTERING; LARGE-SCALE DATASETS; SEMANTIC RELATIONSHIPS; SEMANTIC REPRESENTATION; THREE-LAYER NEURAL NETWORKS; VIDEO CATEGORIZATION; VIDEO UNDERSTANDING;

SEMANTICS;

EID: 84986250477 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.339 Document Type: Conference Paper

Times cited : (107)

References (42)

1
- 0023322501
- Recognition by components-a theory of human image understanding
- I. Biederman. Recognition by components-a theory of human image understanding. Psychological Review, 1987.
- (1987) Psychological Review
- Biederman, I.¹

2
- 85198028989
- ImageNet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
- (2009) CVPR
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

3
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
- (2015) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

4
- 84899022684
- Learning multi-modal latent attributes
- Y. Fu, T. M. Hospedales, T. Xiang, and S. Gong. Learning multi-modal latent attributes. TPAMI, 2013.
- (2013) TPAMI
- Fu, Y.¹ Hospedales, T.M.² Xiang, T.³ Gong, S.⁴

5
- 84986246085
- Semi-supervised vocabulary-informed learning
- Y. Fu and L. Sigal. Semi-supervised vocabulary-informed learning. In CVPR, 2016.
- (2016) CVPR
- Fu, Y.¹ Sigal, L.²

6
- 84911400494
- Rich feature hierarchies for accurate object detection and semantic segmentation
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
- (2014) CVPR
- Girshick, R.¹ Donahue, J.² Darrell, T.³ Malik, J.⁴

7
- 84959216468
- Activitynet: A large-scale video benchmark for human activity understanding
- F. C. Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles. Activitynet: A large-scale video benchmark for human activity understanding. In CVPR, 2015.
- (2015) CVPR
- Heilbron, F.C.¹ Escorcia, V.² Ghanem, B.³ Niebles, J.C.⁴

8
- 84937837455
- A unified semantic embedding: Relating taxonomies and attributes
- S. J. Hwang and L. Sigal. A unified semantic embedding: relating taxonomies and attributes. In NIPS, 2014.
- (2014) NIPS
- Hwang, S.J.¹ Sigal, L.²

9
- 79958737093
- Object, scene and actions: Combining multiple features for human action recognition
- N. Ikizler-Cinbis and S. Sclaroff. Object, scene and actions: Combining multiple features for human action recognition. In ECCV, 2010.
- (2010) ECCV
- Ikizler-Cinbis, N.¹ Sclaroff, S.²

10
- 84959235126
- What do 15,000 object categories tell us about classifying and localizing actions?
- M. Jain, J. C. van Gemert, and C. G. Snoek. What do 15,000 object categories tell us about classifying and localizing actions? In CVPR, 2015.
- (2015) CVPR
- Jain, M.¹ Van Gemert, J.C.² Snoek, C.G.³

11
- 84986185450
- High-level event recognition in unconstrained videos
- Y.-G. Jiang, S. Bhattacharya, S.-F. Chang, and M. Shah. High-level event recognition in unconstrained videos. IJMIR, 2013.
- (2013) IJMIR
- Jiang, Y.-G.¹ Bhattacharya, S.² Chang, S.-F.³ Shah, M.⁴

12
- 84986243687
- Exploiting feature and class relationships in video categorization with regularized deep neural networks
- Y.-G. Jiang, Z. Wu, J. Wang, X. Xue, and S.-F. Chang. Exploiting feature and class relationships in video categorization with regularized deep neural networks. CoRR, 2015.
- (2015) CoRR
- Jiang, Y.-G.¹ Wu, Z.² Wang, J.³ Xue, X.⁴ Chang, S.-F.⁵

13
- 84911364368
- Large-scale video classification with convolutional neural networks
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
- (2014) CVPR
- Karpathy, A.¹ Toderici, G.² Shetty, S.³ Leung, T.⁴ Sukthankar, R.⁵ Fei-Fei, L.⁶

14
- 84856682691
- HMDB: A large video database for human motion recognition
- H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A large video database for human motion recognition. In ICCV, 2011.
- (2011) ICCV
- Kuehne, H.¹ Jhuang, H.² Garrote, E.³ Poggio, T.⁴ Serre, T.⁵

15
- 84925402963
- Attribute-based classification for zero-shot visual object categorization
- C. H. Lampert, H. Nickisch, and S. Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE TPAMI, 2013.
- (2013) IEEE TPAMI
- Lampert, C.H.¹ Nickisch, H.² Harmeling, S.³

16
- 85162513516
- Object bank: A high-level image representation for scene classification & semantic feature sparsification
- L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, 2010.
- (2010) NIPS
- Li, L.-J.¹ Su, H.² Xing, E.P.³ Fei-Fei, L.⁴

17
- 84875599426
- Video event recognition using concept attributes
- J. Liu, Q. Yu, O. Javed, S. Ali, A. Tamrakar, A. Divakaran, H. Cheng, and H. Sawhney. Video event recognition using concept attributes. In IEEE Workshop on WACV, 2013.
- (2013) IEEE Workshop on WACV
- Liu, J.¹ Yu, Q.² Javed, O.³ Ali, S.⁴ Tamrakar, A.⁵ Divakaran, A.⁶ Cheng, H.⁷ Sawhney, H.⁸

18
- 70450177757
- Actions in context
- M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In CVPR, 2009.
- (2009) CVPR
- Marszalek, M.¹ Laptev, I.² Schmid, C.³

19
- 84898956512
- Distributed representations of words and phrases and their compositionality
- T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
- (2013) NIPS
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.⁴ Dean, J.⁵

20
- 84866712341
- Multimodal feature fusion for robust event detection in web videos
- P. Natarajan, S. Wu, S. Vitaladevuni, X. Zhuang, S. Tsakalidis, U. Park, R. Prasad, and P. Natarajan. Multimodal feature fusion for robust event detection in web videos. In CVPR, 2012.
- (2012) CVPR
- Natarajan, P.¹ Wu, S.² Vitaladevuni, S.³ Zhuang, X.⁴ Tsakalidis, S.⁵ Park, U.⁶ Prasad, R.⁷ Natarajan, P.⁸

21
- 84959228762
- Beyond short snippets: Deep networks for video classification
- J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. In CVPR, 2015.
- (2015) CVPR
- Ng, J.Y.-H.¹ Hausknecht, M.² Vijayanarasimhan, S.³ Vinyals, O.⁴ Monga, R.⁵ Toderici, G.⁶

22
- 85083952206
- Zero-shot learning by convex combination of semantic embeddings
- M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens, A. Frome, G. S. Corrado, and J. Dean. Zero-shot learning by convex combination of semantic embeddings. In ICLR, 2014.
- (2014) ICLR
- Norouzi, M.¹ Mikolov, T.² Bengio, S.³ Singer, Y.⁴ Shlens, J.⁵ Frome, A.⁶ Corrado, G.S.⁷ Dean, J.⁸

23
- 84961289992
- Glove: Global vectors for word representation
- J. Pennington, R. Socher, and C. D. Manning. Glove: Global vectors for word representation. In EMNLP, 2014.
- (2014) EMNLP
- Pennington, J.¹ Socher, R.² Manning, C.D.³

24
- 33845589915
- Probabilities for SV machines
- J. Platt. Probabilities for SV machines. In Advances in Large Margin Classifiers, 2000.
- (2000) Advances in Large Margin Classifiers
- Platt, J.¹

25
- 84867844062
- Weakly supervised learning of interactions between humans and objects
- A. Prest, C. Schmid, and V. Ferrari. Weakly supervised learning of interactions between humans and objects. IEEE TPAMI, 2012.
- (2012) IEEE TPAMI
- Prest, A.¹ Schmid, C.² Ferrari, V.³

26
- 77955989949
- What helps where-and why? Semantic relatedness for knowledge transfer
- M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele. What helps where-and why? Semantic relatedness for knowledge transfer. In CVPR, 2010.
- (2010) CVPR
- Rohrbach, M.¹ Stark, M.² Szarvas, G.³ Gurevych, I.⁴ Schiele, B.⁵

27
- 84866718894
- Action bank: A high-level representation of activity in video
- S. Sadanand and J. Corso. Action bank: A high-level representation of activity in video. In CVPR, 2012.
- (2012) CVPR
- Sadanand, S.¹ Corso, J.²

28
- 84883487458
- Image classification with the fisher vector: Theory and practice
- J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the fisher vector: Theory and practice. IJCV, 2013.
- (2013) IJCV
- Sánchez, J.¹ Perronnin, F.² Mensink, T.³ Verbeek, J.⁴

29
- 84887325615
- Similarity constrained latent support vector machine: An application to weakly supervised action classification
- N. Shapovalova, A. Vahdat, K. Cannons, T. Lan, and G. Mori. Similarity constrained latent support vector machine: An application to weakly supervised action classification. In ECCV, 2012.
- (2012) ECCV
- Shapovalova, N.¹ Vahdat, A.² Cannons, K.³ Lan, T.⁴ Mori, G.⁵

30
- 85083953896
- Deep inside convolutional networks: Visualising image classification models and saliency maps
- K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR, 2014.
- (2014) ICLR
- Simonyan, K.¹ Vedaldi, A.² Zisserman, A.³

31
- 84937862424
- Two-stream convolutional networks for action recognition in videos
- K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, 2014.
- (2014) NIPS
- Simonyan, K.¹ Zisserman, A.²

32
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
- (2015) ICLR
- Simonyan, K.¹ Zisserman, A.²

33
- 84904972001
- UCF101: A dataset of 101 human actions classes from videos in the wild
- K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. CRCVTR-12-01, 2012.
- (2012) CRCVTR-12-01
- Soomro, K.¹ Zamir, A.R.² Shah, M.³

34
- 84973865953
- Learning spatiotemporal features with 3d convolutional networks
- D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3d convolutional networks. In ICCV, 2015.
- (2015) ICCV
- Tran, D.¹ Bourdev, L.² Fergus, R.³ Torresani, L.⁴ Paluri, M.⁵

35
- 84898805910
- Action recognition with improved trajectories
- H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
- (2013) ICCV
- Wang, H.¹ Schmid, C.²

36
- 84990022067
- Modeling spatial-temporal clues in a hybrid deep learning framework for video classification
- Z. Wu, X. Wang, Y.-G. Jiang, H. Ye, and X. Xue. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In ACM MM, 2015.
- (2015) ACM MM
- Wu, Z.¹ Wang, X.² Jiang, Y.-G.³ Ye, H.⁴ Xue, X.⁵

37
- 84959236591
- Can humans fly? Action understanding with multiple classes of actors
- C. Xu, S.-H. Hsieh, C. Xiong, and J. J. Corso. Can humans fly? action understanding with multiple classes of actors. In CVPR, 2015.
- (2015) CVPR
- Xu, C.¹ Hsieh, S.-H.² Xiong, C.³ Corso, J.J.⁴

38
- 84887368641
- Designing category-level attributes for discriminative visual recognition
- F. X. Yu, L. Cao, R. S. Feris, J. R. Smith, and S.-F. Chang. Designing category-level attributes for discriminative visual recognition. In CVPR, 2013.
- (2013) CVPR
- Yu, F.X.¹ Cao, L.² Feris, R.S.³ Smith, J.R.⁴ Chang, S.-F.⁵

39
- 84921476116
- Visualizing and understanding convolutional networks
- M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
- (2014) ECCV
- Zeiler, M.D.¹ Fergus, R.²

40
- 85083952996
- Object detectors emerge in deep scene cnns
- B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene cnns. In ICLR, 2015.
- (2015) ICLR
- Zhou, B.¹ Khosla, A.² Lapedriza, A.³ Oliva, A.⁴ Torralba, A.⁵

41
- 84937964578
- Learning deep features for scene recognition using places database
- B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using places database. In NIPS, 2014.
- (2014) NIPS
- Zhou, B.¹ Lapedriza, A.² Xiao, J.³ Torralba, A.⁴ Oliva, A.⁵

42
- 84952058866
- Reasoning about object affordances in a knowledge base representation
- Y. Zhu, A. Fathi, and L. Fei-Fei. Reasoning about object affordances in a knowledge base representation. In ECCV, 2014.
- (2014) ECCV
- Zhu, Y.¹ Fathi, A.² Fei-Fei, L.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.