-
1
-
-
0023322501
-
Recognition-by-components: A theory of human image understanding
-
I. Biederman. Recognition-by-components: A theory of human image understanding. Psychological Review, 1987.
-
(1987)
Psychological Review
-
-
Biederman, I.1
-
2
-
-
85198028989
-
ImageNet: A large-scale hierarchical image database
-
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
-
(2009)
CVPR
-
-
Deng, J.1
Dong, W.2
Socher, R.3
Li, L.-J.4
Li, K.5
Fei-Fei, L.6
-
3
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
-
(2015)
CVPR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
5
-
-
84986246085
-
Semi-supervised vocabulary-informed learning
-
Y. Fu and L. Sigal. Semi-supervised vocabulary-informed learning. In CVPR, 2016.
-
(2016)
CVPR
-
-
Fu, Y.1
Sigal, L.2
-
6
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
-
(2014)
CVPR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
7
-
-
84959216468
-
ActivityNet: A large-scale video benchmark for human activity understanding
-
F. C. Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles. ActivityNet: A large-scale video benchmark for human activity understanding. In CVPR, 2015.
-
(2015)
CVPR
-
-
Heilbron, F.C.1
Escorcia, V.2
Ghanem, B.3
Niebles, J.C.4
-
8
-
-
84937837455
-
A unified semantic embedding: Relating taxonomies and attributes
-
S. J. Hwang and L. Sigal. A unified semantic embedding: Relating taxonomies and attributes. In NIPS, 2014.
-
(2014)
NIPS
-
-
Hwang, S.J.1
Sigal, L.2
-
9
-
-
79958737093
-
Object, scene and actions: Combining multiple features for human action recognition
-
N. Ikizler-Cinbis and S. Sclaroff. Object, scene and actions: Combining multiple features for human action recognition. In ECCV, 2010.
-
(2010)
ECCV
-
-
Ikizler-Cinbis, N.1
Sclaroff, S.2
-
10
-
-
84959235126
-
What do 15,000 object categories tell us about classifying and localizing actions?
-
M. Jain, J. C. van Gemert, and C. G. Snoek. What do 15,000 object categories tell us about classifying and localizing actions? In CVPR, 2015.
-
(2015)
CVPR
-
-
Jain, M.1
Van Gemert, J.C.2
Snoek, C.G.3
-
12
-
-
84986243687
-
Exploiting feature and class relationships in video categorization with regularized deep neural networks
-
Y.-G. Jiang, Z. Wu, J. Wang, X. Xue, and S.-F. Chang. Exploiting feature and class relationships in video categorization with regularized deep neural networks. CoRR, 2015.
-
(2015)
CoRR
-
-
Jiang, Y.-G.1
Wu, Z.2
Wang, J.3
Xue, X.4
Chang, S.-F.5
-
13
-
-
84911364368
-
Large-scale video classification with convolutional neural networks
-
A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
-
(2014)
CVPR
-
-
Karpathy, A.1
Toderici, G.2
Shetty, S.3
Leung, T.4
Sukthankar, R.5
Fei-Fei, L.6
-
14
-
-
84856682691
-
HMDB: A large video database for human motion recognition
-
H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. HMDB: A large video database for human motion recognition. In ICCV, 2011.
-
(2011)
ICCV
-
-
Kuehne, H.1
Jhuang, H.2
Garrote, E.3
Poggio, T.4
Serre, T.5
-
15
-
-
84925402963
-
Attribute-based classification for zero-shot visual object categorization
-
C. H. Lampert, H. Nickisch, and S. Harmeling. Attribute-based classification for zero-shot visual object categorization. IEEE TPAMI, 2013.
-
(2013)
IEEE TPAMI
-
-
Lampert, C.H.1
Nickisch, H.2
Harmeling, S.3
-
16
-
-
85162513516
-
Object bank: A high-level image representation for scene classification & semantic feature sparsification
-
L.-J. Li, H. Su, E. P. Xing, and L. Fei-Fei. Object bank: A high-level image representation for scene classification & semantic feature sparsification. In NIPS, 2010.
-
(2010)
NIPS
-
-
Li, L.-J.1
Su, H.2
Xing, E.P.3
Fei-Fei, L.4
-
17
-
-
84875599426
-
Video event recognition using concept attributes
-
J. Liu, Q. Yu, O. Javed, S. Ali, A. Tamrakar, A. Divakaran, H. Cheng, and H. Sawhney. Video event recognition using concept attributes. In IEEE Workshop on WACV, 2013.
-
(2013)
IEEE Workshop on WACV
-
-
Liu, J.1
Yu, Q.2
Javed, O.3
Ali, S.4
Tamrakar, A.5
Divakaran, A.6
Cheng, H.7
Sawhney, H.8
-
19
-
-
84898956512
-
Distributed representations of words and phrases and their compositionality
-
T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
-
(2013)
NIPS
-
-
Mikolov, T.1
Sutskever, I.2
Chen, K.3
Corrado, G.4
Dean, J.5
-
20
-
-
84866712341
-
Multimodal feature fusion for robust event detection in web videos
-
P. Natarajan, S. Wu, S. Vitaladevuni, X. Zhuang, S. Tsakalidis, U. Park, R. Prasad, and P. Natarajan. Multimodal feature fusion for robust event detection in web videos. In CVPR, 2012.
-
(2012)
CVPR
-
-
Natarajan, P.1
Wu, S.2
Vitaladevuni, S.3
Zhuang, X.4
Tsakalidis, S.5
Park, U.6
Prasad, R.7
Natarajan, P.8
-
21
-
-
84959228762
-
Beyond short snippets: Deep networks for video classification
-
J. Y.-H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. In CVPR, 2015.
-
(2015)
CVPR
-
-
Ng, J.Y.-H.1
Hausknecht, M.2
Vijayanarasimhan, S.3
Vinyals, O.4
Monga, R.5
Toderici, G.6
-
22
-
-
85083952206
-
Zero-shot learning by convex combination of semantic embeddings
-
M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens, A. Frome, G. S. Corrado, and J. Dean. Zero-shot learning by convex combination of semantic embeddings. In ICLR, 2014.
-
(2014)
ICLR
-
-
Norouzi, M.1
Mikolov, T.2
Bengio, S.3
Singer, Y.4
Shlens, J.5
Frome, A.6
Corrado, G.S.7
Dean, J.8
-
23
-
-
84961289992
-
GloVe: Global vectors for word representation
-
J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.
-
(2014)
EMNLP
-
-
Pennington, J.1
Socher, R.2
Manning, C.D.3
-
25
-
-
84867844062
-
Weakly supervised learning of interactions between humans and objects
-
A. Prest, C. Schmid, and V. Ferrari. Weakly supervised learning of interactions between humans and objects. IEEE TPAMI, 2012.
-
(2012)
IEEE TPAMI
-
-
Prest, A.1
Schmid, C.2
Ferrari, V.3
-
26
-
-
77955989949
-
What helps where-and why? Semantic relatedness for knowledge transfer
-
M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele. What helps where-and why? Semantic relatedness for knowledge transfer. In CVPR, 2010.
-
(2010)
CVPR
-
-
Rohrbach, M.1
Stark, M.2
Szarvas, G.3
Gurevych, I.4
Schiele, B.5
-
27
-
-
84866718894
-
Action bank: A high-level representation of activity in video
-
S. Sadanand and J. Corso. Action bank: A high-level representation of activity in video. In CVPR, 2012.
-
(2012)
CVPR
-
-
Sadanand, S.1
Corso, J.2
-
28
-
-
84883487458
-
Image classification with the Fisher vector: Theory and practice
-
J. Sánchez, F. Perronnin, T. Mensink, and J. Verbeek. Image classification with the Fisher vector: Theory and practice. IJCV, 2013.
-
(2013)
IJCV
-
-
Sánchez, J.1
Perronnin, F.2
Mensink, T.3
Verbeek, J.4
-
29
-
-
84887325615
-
Similarity constrained latent support vector machine: An application to weakly supervised action classification
-
N. Shapovalova, A. Vahdat, K. Cannons, T. Lan, and G. Mori. Similarity constrained latent support vector machine: An application to weakly supervised action classification. In ECCV, 2012.
-
(2012)
ECCV
-
-
Shapovalova, N.1
Vahdat, A.2
Cannons, K.3
Lan, T.4
Mori, G.5
-
30
-
-
85083953896
-
Deep inside convolutional networks: Visualising image classification models and saliency maps
-
K. Simonyan, A. Vedaldi, and A. Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR, 2014.
-
(2014)
ICLR
-
-
Simonyan, K.1
Vedaldi, A.2
Zisserman, A.3
-
31
-
-
84937862424
-
Two-stream convolutional networks for action recognition in videos
-
K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, 2014.
-
(2014)
NIPS
-
-
Simonyan, K.1
Zisserman, A.2
-
32
-
-
85083953063
-
Very deep convolutional networks for large-scale image recognition
-
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
-
(2015)
ICLR
-
-
Simonyan, K.1
Zisserman, A.2
-
33
-
-
84904972001
-
UCF101: A dataset of 101 human actions classes from videos in the wild
-
K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. CRCV-TR-12-01, 2012.
-
(2012)
CRCV-TR-12-01
-
-
Soomro, K.1
Zamir, A.R.2
Shah, M.3
-
34
-
-
84973865953
-
Learning spatiotemporal features with 3D convolutional networks
-
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.
-
(2015)
ICCV
-
-
Tran, D.1
Bourdev, L.2
Fergus, R.3
Torresani, L.4
Paluri, M.5
-
35
-
-
84898805910
-
Action recognition with improved trajectories
-
H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
-
(2013)
ICCV
-
-
Wang, H.1
Schmid, C.2
-
36
-
-
84990022067
-
Modeling spatial-temporal clues in a hybrid deep learning framework for video classification
-
Z. Wu, X. Wang, Y.-G. Jiang, H. Ye, and X. Xue. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In ACM MM, 2015.
-
(2015)
ACM MM
-
-
Wu, Z.1
Wang, X.2
Jiang, Y.-G.3
Ye, H.4
Xue, X.5
-
37
-
-
84959236591
-
Can humans fly? Action understanding with multiple classes of actors
-
C. Xu, S.-H. Hsieh, C. Xiong, and J. J. Corso. Can humans fly? Action understanding with multiple classes of actors. In CVPR, 2015.
-
(2015)
CVPR
-
-
Xu, C.1
Hsieh, S.-H.2
Xiong, C.3
Corso, J.J.4
-
38
-
-
84887368641
-
Designing category-level attributes for discriminative visual recognition
-
F. X. Yu, L. Cao, R. S. Feris, J. R. Smith, and S.-F. Chang. Designing category-level attributes for discriminative visual recognition. In CVPR, 2013.
-
(2013)
CVPR
-
-
Yu, F.X.1
Cao, L.2
Feris, R.S.3
Smith, J.R.4
Chang, S.-F.5
-
39
-
-
84921476116
-
Visualizing and understanding convolutional networks
-
M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
-
(2014)
ECCV
-
-
Zeiler, M.D.1
Fergus, R.2
-
40
-
-
85083952996
-
Object detectors emerge in deep scene CNNs
-
B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Object detectors emerge in deep scene CNNs. In ICLR, 2015.
-
(2015)
ICLR
-
-
Zhou, B.1
Khosla, A.2
Lapedriza, A.3
Oliva, A.4
Torralba, A.5
-
41
-
-
84937964578
-
Learning deep features for scene recognition using Places database
-
B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In NIPS, 2014.
-
(2014)
NIPS
-
-
Zhou, B.1
Lapedriza, A.2
Xiao, J.3
Torralba, A.4
Oliva, A.5
-
42
-
-
84952058866
-
Reasoning about object affordances in a knowledge base representation
-
Y. Zhu, A. Fathi, and L. Fei-Fei. Reasoning about object affordances in a knowledge base representation. In ECCV, 2014.
-
(2014)
ECCV
-
-
Zhu, Y.1
Fathi, A.2
Fei-Fei, L.3