-
2
-
-
84959250180
-
From captions to visual concepts and back
-
H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
-
3
-
-
84962835109
-
-
Exploiting feature and class relationships in video categorization with regularized deep neural networks
-
Y.-G. Jiang, Z. Wu, J. Wang, X. Xue, and S.-F. Chang. Exploiting feature and class relationships in video categorization with regularized deep neural networks. CoRR, abs/1502.07209, 2015.
-
(2015)
CoRR, abs/1502.07209
-
-
Jiang, Y.-G.1
Wu, Z.2
Wang, J.3
Xue, X.4
Chang, S.-F.5
-
4
-
-
84994639031
-
Improving image captioning by concept-based sentence reranking
-
X. Li and Q. Jin. Improving image captioning by concept-based sentence reranking. In PCM, 2016.
-
(2016)
PCM
-
-
Li, X.1
Jin, Q.2
-
5
-
-
70350333307
-
Learning social tag relevance by neighbor voting
-
X. Li, C. Snoek, and M. Worring. Learning social tag relevance by neighbor voting. IEEE Trans. Multimedia, 11(7):1310-1322, 2009.
-
(2009)
IEEE Trans. Multimedia
, vol.11
, Issue.7
, pp. 1310-1322
-
-
Li, X.1
Snoek, C.2
Worring, M.3
-
6
-
-
84975263305
-
Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval
-
X. Li, T. Uricchio, L. Ballan, M. Bertini, C. Snoek, and A. D. Bimbo. Socializing the semantic gap: A comparative survey on image tag assignment, refinement, and retrieval. ACM Computing Surveys, 49(1):14:1-14:39, 2016.
-
(2016)
ACM Computing Surveys
, vol.49
, Issue.1
, pp. 14:1-14:39
-
-
Li, X.1
Uricchio, T.2
Ballan, L.3
Bertini, M.4
Snoek, C.5
Bimbo, A.D.6
-
7
-
-
84978696136
-
The ImageNet shuffle: Reorganized pre-training for video event detection
-
P. Mettes, D. Koelma, and C. Snoek. The ImageNet shuffle: Reorganized pre-training for video event detection. In ICMR, 2016.
-
(2016)
ICMR
-
-
Mettes, P.1
Koelma, D.2
Snoek, C.3
-
8
-
-
85083951332
-
Efficient estimation of word representations in vector space
-
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In ICLR, 2013.
-
(2013)
ICLR
-
-
Mikolov, T.1
Chen, K.2
Corrado, G.3
Dean, J.4
-
9
-
-
84986332702
-
Jointly modeling embedding and translation to bridge video and language
-
Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui. Jointly modeling embedding and translation to bridge video and language. In CVPR, 2016.
-
(2016)
CVPR
-
-
Pan, Y.1
Mei, T.2
Yao, T.3
Li, H.4
Rui, Y.5
-
10
-
-
84937522268
-
Going deeper with convolutions
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
-
(2015)
CVPR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
11
-
-
84973865953
-
Learning spatiotemporal features with 3D convolutional networks
-
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.
-
(2015)
ICCV
-
-
Tran, D.1
Bourdev, L.2
Fergus, R.3
Torresani, L.4
Paluri, M.5
-
12
-
-
84956980995
-
CIDEr: Consensus-based image description evaluation
-
R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
-
(2015)
CVPR
-
-
Vedantam, R.1
Zitnick, C.L.2
Parikh, D.3
-
13
-
-
84973882730
-
Sequence to sequence - video to text
-
S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015.
-
(2015)
ICCV
-
-
Venugopalan, S.1
Rohrbach, M.2
Donahue, J.3
Mooney, R.4
Darrell, T.5
Saenko, K.6
-
15
-
-
84986260127
-
MSR-VTT: A large video description dataset for bridging video and language
-
J. Xu, T. Mei, T. Yao, and Y. Rui. MSR-VTT: A large video description dataset for bridging video and language. In CVPR, 2016.
-
(2016)
CVPR
-
-
Xu, J.1
Mei, T.2
Yao, T.3
Rui, Y.4
-
16
-
-
84986275061
-
Video paragraph captioning using hierarchical recurrent neural networks
-
H. Yu, J. Wang, Z. Huang, Y. Yang, and W. Xu. Video paragraph captioning using hierarchical recurrent neural networks. In CVPR, 2016.
-
(2016)
CVPR
-
-
Yu, H.1
Wang, J.2
Huang, Z.3
Yang, Y.4
Xu, W.5