[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
[3] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv, abs/1512.03385, 2015.
[4] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
[6] A. Rohrbach, A. Torabi, M. Rohrbach, N. Tandon, C. J. Pal, H. Larochelle, A. C. Courville, and B. Schiele. Movie description. arXiv, abs/1605.03705, 2016.
[8] R. Shetty and J. Laaksonen. Video captioning with recurrent networks based on frame- and video-level features and visual content classification. ICCV Workshop on LSMDC, arXiv, abs/1512.02949, 2015.
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv, abs/1409.4842, 2014.
[10] A. Torabi, C. Pal, H. Larochelle, and A. Courville. Using descriptive video services to create a large data source for video annotation research. arXiv, abs/1503.01070, 2015.
[11] D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: Generic features for video analysis. arXiv, abs/1412.0767, 2014.
[12] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, June 2015.
[13] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In CVPR, 2015.
[16] H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
[17] J. Xu, T. Mei, T. Yao, and Y. Rui. MSR-VTT: A large video description dataset for bridging video and language. In CVPR, 2016.