-
1
-
-
85044309526
-
-
Mexaction2. http://mexculture.cnam.fr/xwiki/bin/view/Datasets/Mex+action+dataset, 2015.
-
(2015)
-
-
-
4
-
-
85038956512
-
Segnet: A deep convolutional encoder-decoder architecture for image segmentation
-
V. Badrinarayanan, A. Kendall, and R. Cipolla. Segnet: A deep convolutional encoder-decoder architecture for image segmentation. TPAMI, 2016.
-
(2016)
TPAMI
-
-
Badrinarayanan, V.1
Kendall, A.2
Cipolla, R.3
-
5
-
-
85083954148
-
Semantic image segmentation with deep convolutional nets and fully connected crfs
-
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected crfs. In ICLR, 2015.
-
(2015)
ICLR
-
-
Chen, L.-C.1
Papandreou, G.2
Kokkinos, I.3
Murphy, K.4
Yuille, A.L.5
-
6
-
-
84990051868
-
-
L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs. 2016.
-
(2016)
Deeplab: Semantic Image Segmentation With Deep Convolutional Nets, Atrous Convolution, and Fully Connected Crfs
-
-
Chen, L.-C.1
Papandreou, G.2
Kokkinos, I.3
Murphy, K.4
Yuille, A.L.5
-
7
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
-
(2015)
CVPR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
8
-
-
80054908266
-
Automatic annotation of human actions in video
-
O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce. Automatic annotation of human actions in video. In ICCV, 2007.
-
(2007)
ICCV
-
-
Duchenne, O.1
Laptev, I.2
Sivic, J.3
Bach, F.4
Ponce, J.5
-
10
-
-
84986266741
-
Convolutional two-stream network fusion for video action recognition
-
C. Feichtenhofer, A. Pinz, and A. Zisserman. Convolutional two-stream network fusion for video action recognition. In CVPR, 2016.
-
(2016)
CVPR
-
-
Feichtenhofer, C.1
Pinz, A.2
Zisserman, A.3
-
11
-
-
80052915321
-
Actom sequence models for efficient action detection
-
A. Gaidon, Z. Harchaoui, and C. Schmid. Actom sequence models for efficient action detection. In CVPR, 2011.
-
(2011)
CVPR
-
-
Gaidon, A.1
Harchaoui, Z.2
Schmid, C.3
-
12
-
-
84973872525
-
Temporal localization of actions with actoms
-
A. Gaidon, Z. Harchaoui, and C. Schmid. Temporal localization of actions with actoms. In TPAMI, 2013.
-
(2013)
TPAMI
-
-
Gaidon, A.1
Harchaoui, Z.2
Schmid, C.3
-
13
-
-
84959230113
-
Devnet: A deep event network for multimedia event detection and evidence recounting
-
C. Gan, N. Wang, Y. Yang, D.-Y. Yeung, and A. G. Hauptmann. Devnet: A deep event network for multimedia event detection and evidence recounting. In CVPR, 2015.
-
(2015)
CVPR
-
-
Gan, C.1
Wang, N.2
Yang, Y.3
Yeung, D.-Y.4
Hauptmann, A.G.5
-
15
-
-
84961136088
-
-
A. Gorban, H. Idrees, Y.-G. Jiang, A. R. Zamir, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes. http://www.thumos.info/, 2015.
-
(2015)
THUMOS Challenge: Action Recognition With a Large Number of Classes
-
-
Gorban, A.1
Idrees, H.2
Jiang, Y.-G.3
Zamir, A.R.4
Laptev, I.5
Shah, M.6
Sukthankar, R.7
-
16
-
-
84986274465
-
Deep residual learning for image recognition
-
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
-
(2016)
CVPR
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
17
-
-
84959216468
-
Activitynet: A large-scale video benchmark for human activity understanding
-
F. C. Heilbron, V. Escorcia, B. Ghanem, and J. C. Niebles. Activitynet: A large-scale video benchmark for human activity understanding. In CVPR, 2015.
-
(2015)
CVPR
-
-
Heilbron, F.C.1
Escorcia, V.2
Ghanem, B.3
Niebles, J.C.4
-
18
-
-
84986275821
-
Fast temporal activity proposals for efficient detection of human actions in untrimmed videos
-
F. C. Heilbron, J. C. Niebles, and B. Ghanem. Fast temporal activity proposals for efficient detection of human actions in untrimmed videos. In CVPR, 2016.
-
(2016)
CVPR
-
-
Heilbron, F.C.1
Niebles, J.C.2
Ghanem, B.3
-
19
-
-
84965099276
-
Decoupled deep neural network for semi-supervised semantic segmentation
-
S. Hong, H. Noh, and B. Han. Decoupled deep neural network for semi-supervised semantic segmentation. In NIPS, 2015.
-
(2015)
NIPS
-
-
Hong, S.1
Noh, H.2
Han, B.3
-
20
-
-
84911453664
-
Action localization with tubelets from motion
-
M. Jain, J. van Gemert, H. Jégou, P. Bouthemy, and C. Snoek. Action localization with tubelets from motion. In CVPR, 2014.
-
(2014)
CVPR
-
-
Jain, M.1
Van Gemert, J.2
Jégou, H.3
Bouthemy, P.4
Snoek, C.5
-
21
-
-
84973868024
-
Objects2action: Classifying and localizing actions without any video example
-
M. Jain, J. van Gemert, T. Mensink, and C. Snoek. Objects2action: Classifying and localizing actions without any video example. In ICCV, 2015.
-
(2015)
ICCV
-
-
Jain, M.1
Van Gemert, J.2
Mensink, T.3
Snoek, C.4
-
22
-
-
84959235126
-
What do 15,000 object categories tell us about classifying and localizing actions?
-
M. Jain, J. van Gemert, and C. Snoek. What do 15,000 object categories tell us about classifying and localizing actions? In CVPR, 2015.
-
(2015)
CVPR
-
-
Jain, M.1
Van Gemert, J.2
Snoek, C.3
-
23
-
-
77956004473
-
Aggregating local descriptors into a compact image representation
-
H. Jégou, M. Douze, C. Schmid, and P. Pérez. Aggregating local descriptors into a compact image representation. In CVPR, 2010.
-
(2010)
CVPR
-
-
Jégou, H.1
Douze, M.2
Schmid, C.3
Pérez, P.4
-
24
-
-
85009867858
-
Caffe: Convolutional architecture for fast feature embedding
-
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM MM, 2014.
-
(2014)
ACM MM
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
25
-
-
84905052261
-
-
Y.-G. Jiang, J. Liu, A. R. Zamir, G. Toderici, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes. http://crcv.ucf.edu/THUMOS14/, 2014.
-
(2014)
THUMOS Challenge: Action Recognition With a Large Number of Classes
-
-
Jiang, Y.-G.1
Liu, J.2
Zamir, A.R.3
Toderici, G.4
Laptev, I.5
Shah, M.6
Sukthankar, R.7
-
27
-
-
84911364368
-
Large-scale video classification with convolutional neural networks
-
A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
-
(2014)
CVPR
-
-
Karpathy, A.1
Toderici, G.2
Shetty, S.3
Leung, T.4
Sukthankar, R.5
Fei-Fei, L.6
-
29
-
-
84876231242
-
Imagenet classification with deep convolutional neural networks
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
-
(2012)
NIPS
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
30
-
-
50649122769
-
Retrieving actions in movies
-
I. Laptev and P. Pérez. Retrieving actions in movies. In ICCV, 2007.
-
(2007)
ICCV
-
-
Laptev, I.1
Pérez, P.2
-
31
-
-
85044291629
-
Segmental spatiotemporal cnns for fine-grained action segmentation
-
C. Lea, A. Reiter, R. Vidal, and G. D. Hager. Segmental spatiotemporal cnns for fine-grained action segmentation. In ECCV, 2016.
-
(2016)
ECCV
-
-
Lea, C.1
Reiter, A.2
Vidal, R.3
Hager, G.D.4
-
32
-
-
84986261676
-
Efficient piecewise training of deep structured models for semantic segmentation
-
G. Lin, C. Shen, A. van den Hengel, and I. Reid. Efficient piecewise training of deep structured models for semantic segmentation. In CVPR, 2016.
-
(2016)
CVPR
-
-
Lin, G.1
Shen, C.2
Van Den Hengel, A.3
Reid, I.4
-
33
-
-
84986256919
-
Multi-scale patch aggregation (mpa) for simultaneous detection and segmentation
-
S. Liu, X. Qi, J. Shi, H. Zhang, and J. Jia. Multi-scale patch aggregation (mpa) for simultaneous detection and segmentation. In CVPR, 2016.
-
(2016)
CVPR
-
-
Liu, S.1
Qi, X.2
Shi, J.3
Zhang, H.4
Jia, J.5
-
34
-
-
84959205572
-
Fully convolutional networks for semantic segmentation
-
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, 2015.
-
(2015)
CVPR
-
-
Long, J.1
Shelhamer, E.2
Darrell, T.3
-
35
-
-
84866710901
-
A database for fine grained activity detection of cooking activities
-
M. A. M. Rohrbach, S. Amin and B. Schiele. A database for fine grained activity detection of cooking activities. In CVPR, 2012.
-
(2012)
CVPR
-
-
Rohrbach, M.A.M.1
Amin, S.2
Schiele, B.3
-
36
-
-
84994583262
-
Spot on: Action localization from pointly-supervised proposals
-
P. Mettes, J. van Gemert, and C. Snoek. Spot on: Action localization from pointly-supervised proposals. In ECCV, 2016.
-
(2016)
ECCV
-
-
Mettes, P.1
Van Gemert, J.2
Snoek, C.3
-
37
-
-
84973879016
-
Learning deconvolution network for semantic segmentation
-
H. Noh, S. Hong, and B. Han. Learning deconvolution network for semantic segmentation. In ICCV, 2015.
-
(2015)
ICCV
-
-
Noh, H.1
Hong, S.2
Han, B.3
-
38
-
-
84898791167
-
Action and event recognition with fisher vectors on a compact feature set
-
D. Oneata, J. Verbeek, and C. Schmid. Action and event recognition with fisher vectors on a compact feature set. In ICCV, 2013.
-
(2013)
ICCV
-
-
Oneata, D.1
Verbeek, J.2
Schmid, C.3
-
40
-
-
79959771606
-
Improving the fisher kernel for large-scale image classification
-
F. Perronnin, J. Sánchez, and T. Mensink. Improving the fisher kernel for large-scale image classification. In ECCV, 2010.
-
(2010)
ECCV
-
-
Perronnin, F.1
Sánchez, J.2
Mensink, T.3
-
41
-
-
77949275097
-
A survey on vision-based human action recognition
-
R. Poppe. A survey on vision-based human action recognition. In Image and vision computing, 2010.
-
(2010)
Image and Vision Computing
-
-
Poppe, R.1
-
42
-
-
84973879045
-
Unsupervised tube extraction using transductive learning and dense trajectories
-
M. M. Puscas, E. Sangineto, D. Culibrk, and N. Sebe. Unsupervised tube extraction using transductive learning and dense trajectories. In ICCV, 2015.
-
(2015)
ICCV
-
-
Puscas, M.M.1
Sangineto, E.2
Culibrk, D.3
Sebe, N.4
-
43
-
-
84986270053
-
Temporal action detection using a statistical language model
-
A. Richard and J. Gall. Temporal action detection using a statistical language model. In CVPR, 2016.
-
(2016)
CVPR
-
-
Richard, A.1
Gall, J.2
-
44
-
-
84947041871
-
ImageNet large scale visual recognition challenge
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. IJCV, 2015.
-
(2015)
IJCV
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
45
-
-
85011076500
-
Fully convolutional networks for semantic segmentation
-
E. Shelhamer, J. Long, and T. Darrell. Fully convolutional networks for semantic segmentation. TPAMI, 2016.
-
(2016)
TPAMI
-
-
Shelhamer, E.1
Long, J.2
Darrell, T.3
-
46
-
-
85044270610
-
-
Z. Shou, J. Chan, A. Zareian, K. Miyazawa, and S.-F. Chang. Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos. arXiv preprint arXiv:1703.01515, 2017.
-
(2017)
CDC: Convolutional-de-convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
-
-
Shou, Z.1
Chan, J.2
Zareian, A.3
Miyazawa, K.4
Chang, S.-F.5
-
47
-
-
84986268774
-
Temporal action localization in untrimmed videos via multi-stage cnns
-
Z. Shou, D. Wang, and S.-F. Chang. Temporal action localization in untrimmed videos via multi-stage cnns. In CVPR, 2016.
-
(2016)
CVPR
-
-
Shou, Z.1
Wang, D.2
Chang, S.-F.3
-
48
-
-
85167598864
-
Much ado about time: Exhaustive annotation of temporal data
-
G. A. Sigurdsson, O. Russakovsky, A. Farhadi, I. Laptev, and A. Gupta. Much ado about time: Exhaustive annotation of temporal data. In HCOMP, 2016.
-
(2016)
HCOMP
-
-
Sigurdsson, G.A.1
Russakovsky, O.2
Farhadi, A.3
Laptev, I.4
Gupta, A.5
-
49
-
-
85041903747
-
Hollywood in homes: Crowdsourcing data collection for activity understanding
-
G. A. Sigurdsson, G. Varol, X. Wang, A. Farhadi, I. Laptev, and A. Gupta. Hollywood in homes: Crowdsourcing data collection for activity understanding. In ECCV, 2016.
-
(2016)
ECCV
-
-
Sigurdsson, G.A.1
Varol, G.2
Wang, X.3
Farhadi, A.4
Laptev, I.5
Gupta, A.6
-
50
-
-
84937862424
-
Two-stream convolutional networks for action recognition in videos
-
K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, 2014.
-
(2014)
NIPS
-
-
Simonyan, K.1
Zisserman, A.2
-
52
-
-
84986328004
-
A multi-stream bi-directional recurrent neural network for fine-grained action detection
-
B. Singh, T. K. Marks, M. Jones, O. Tuzel, and M. Shao. A multi-stream bi-directional recurrent neural network for fine-grained action detection. In CVPR, 2016.
-
(2016)
CVPR
-
-
Singh, B.1
Marks, T.K.2
Jones, M.3
Tuzel, O.4
Shao, M.5
-
53
-
-
85044257995
-
Untrimmed classification for activity detection: Submission to activitynet challenge
-
G. Singh and F. Cuzzolin. Untrimmed classification for activity detection: submission to activitynet challenge. In CVPR ActivityNet Workshop, 2016.
-
(2016)
CVPR ActivityNet Workshop
-
-
Singh, G.1
Cuzzolin, F.2
-
54
-
-
84973931629
-
Action localization in videos through context walk
-
K. Soomro, H. Idrees, and M. Shah. Action localization in videos through context walk. In ICCV, 2015.
-
(2015)
ICCV
-
-
Soomro, K.1
Idrees, H.2
Shah, M.3
-
55
-
-
84986246311
-
Predicting the where and what of actors and actions through online action localization
-
K. Soomro, H. Idrees, and M. Shah. Predicting the where and what of actors and actions through online action localization. In CVPR, 2016.
-
(2016)
CVPR
-
-
Soomro, K.1
Idrees, H.2
Shah, M.3
-
58
-
-
84986265065
-
What if we do not have multiple videos of the same action? - Video action localization using web images
-
W. Sultani and M. Shah. What if we do not have multiple videos of the same action? - video action localization using web images. In CVPR, 2016.
-
(2016)
CVPR
-
-
Sultani, W.1
Shah, M.2
-
59
-
-
84986290264
-
Temporal localization of fine-grained actions in videos by domain transfer from web images
-
C. Sun, S. Shetty, R. Sukthankar, and R. Nevatia. Temporal localization of fine-grained actions in videos by domain transfer from web images. In ACM MM, 2015.
-
(2015)
ACM MM
-
-
Sun, C.1
Shetty, S.2
Sukthankar, R.3
Nevatia, R.4
-
60
-
-
84973865953
-
Learning spatiotemporal features with 3d convolutional networks
-
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3d convolutional networks. In ICCV, 2015.
-
(2015)
ICCV
-
-
Tran, D.1
Bourdev, L.2
Fergus, R.3
Torresani, L.4
Paluri, M.5
-
61
-
-
85010192577
-
Deep end2end voxel2voxel prediction
-
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Deep end2end voxel2voxel prediction. In CVPR Workshop on Deep Learning in Computer Vision, 2016.
-
(2016)
CVPR Workshop on Deep Learning in Computer Vision
-
-
Tran, D.1
Bourdev, L.2
Fergus, R.3
Torresani, L.4
Paluri, M.5
-
62
-
-
84973913561
-
Apt: Action localization proposals from dense trajectories
-
J. van Gemert, M. Jain, E. Gati, and C. Snoek. Apt: Action localization proposals from dense trajectories. In BMVC, 2015.
-
(2015)
BMVC
-
-
Van Gemert, J.1
Jain, M.2
Gati, E.3
Snoek, C.4
-
64
-
-
84898805910
-
Action recognition with improved trajectories
-
H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
-
(2013)
ICCV
-
-
Wang, H.1
Schmid, C.2
-
65
-
-
84986274451
-
Action recognition and detection by combining motion and appearance features
-
L. Wang, Y. Qiao, and X. Tang. Action recognition and detection by combining motion and appearance features. In ECCV THUMOS Workshop, 2014.
-
(2014)
ECCV THUMOS Workshop
-
-
Wang, L.1
Qiao, Y.2
Tang, X.3
-
66
-
-
85019099168
-
Temporal segment networks: Towards good practices for deep action recognition
-
L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, and L. V. Gool. Temporal segment networks: Towards good practices for deep action recognition. In ECCV, 2016.
-
(2016)
ECCV
-
-
Wang, L.1
Xiong, Y.2
Wang, Z.3
Qiao, Y.4
Lin, D.5
Tang, X.6
Gool, L.V.7
-
69
-
-
84973931775
-
Learning to track for spatio-temporal action localization
-
P. Weinzaepfel, Z. Harchaoui, and C. Schmid. Learning to track for spatio-temporal action localization. In ICCV, 2015.
-
(2015)
ICCV
-
-
Weinzaepfel, P.1
Harchaoui, Z.2
Schmid, C.3
-
70
-
-
84986313829
-
Actor-action semantic segmentation with grouping process models
-
C. Xu and J. J. Corso. Actor-action semantic segmentation with grouping process models. In CVPR, 2016.
-
(2016)
CVPR
-
-
Xu, C.1
Corso, J.J.2
-
71
-
-
84959226659
-
A discriminative cnn video representation for event detection
-
Z. Xu, Y. Yang, and A. G. Hauptmann. A discriminative cnn video representation for event detection. In CVPR, 2015.
-
(2015)
CVPR
-
-
Xu, Z.1
Yang, Y.2
Hauptmann, A.G.3
-
72
-
-
84986240394
-
-
S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and L. Fei-Fei. Every moment counts: Dense detailed labeling of actions in complex videos. arXiv preprint arXiv:1507.05738, 2015.
-
(2015)
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
-
-
Yeung, S.1
Russakovsky, O.2
Jin, N.3
Andriluka, M.4
Mori, G.5
Fei-Fei, L.6
-
73
-
-
84986253505
-
End-to-end learning of action detection from frame glimpses in videos
-
S. Yeung, O. Russakovsky, G. Mori, and L. Fei-Fei. End-to-end learning of action detection from frame glimpses in videos. In CVPR, 2016.
-
(2016)
CVPR
-
-
Yeung, S.1
Russakovsky, O.2
Mori, G.3
Fei-Fei, L.4
-
74
-
-
85083952059
-
Multi-scale context aggregation by dilated convolutions
-
F. Yu and V. Koltun. Multi-scale context aggregation by dilated convolutions. In ICLR, 2016.
-
(2016)
ICLR
-
-
Yu, F.1
Koltun, V.2
-
75
-
-
84959191147
-
Fast action proposals for human action detection and search
-
G. Yu and J. Yuan. Fast action proposals for human action detection and search. In CVPR, 2015.
-
(2015)
CVPR
-
-
Yu, G.1
Yuan, J.2
-
76
-
-
84986267340
-
Temporal action localization with pyramid of score distribution features
-
J. Yuan, B. Ni, X. Yang, and A. Kassim. Temporal action localization with pyramid of score distribution features. In CVPR, 2016.
-
(2016)
CVPR
-
-
Yuan, J.1
Ni, B.2
Yang, X.3
Kassim, A.4
-
77
-
-
84921476116
-
Visualizing and understanding convolutional networks
-
M. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014.
-
(2014)
ECCV
-
-
Zeiler, M.1
Fergus, R.2
-
79
-
-
84973861983
-
Conditional random fields as recurrent neural networks
-
S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. S. Torr. Conditional random fields as recurrent neural networks. In ICCV, 2015.
-
(2015)
ICCV
-
-
Zheng, S.1
Jayasumana, S.2
Romera-Paredes, B.3
Vineet, V.4
Su, Z.5
Du, D.6
Huang, C.7
Torr, P.H.S.8