-
1
-
-
84973890960
-
VQA: Visual question answering
-
S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, pages 2425-2433, 2015.
-
(2015)
ICCV
, pp. 2425-2433
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
2
-
-
84952349295
-
-
arXiv preprint arXiv
-
X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
-
(2015)
Microsoft COCO Captions: Data Collection and Evaluation Server
-
-
Chen, X.1
Fang, H.2
Lin, T.-Y.3
Vedantam, R.4
Gupta, S.5
Dollár, P.6
Zitnick, C.L.7
-
4
-
-
84913526249
-
Pedestrian attribute recognition at far distance
-
Y. Deng, P. Luo, C. C. Loy, and X. Tang. Pedestrian attribute recognition at far distance. In ACM MM, pages 789-792, 2014.
-
(2014)
ACM MM
, pp. 789-792
-
-
Deng, Y.1
Luo, P.2
Loy, C.C.3
Tang, X.4
-
5
-
-
84959250180
-
From captions to visual concepts and back
-
H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015.
-
(2015)
CVPR
, pp. 1473-1482
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
-
6
-
-
84898958665
-
DeViSE: A deep visual-semantic embedding model
-
A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. DeViSE: A deep visual-semantic embedding model. In NIPS, pages 2121-2129, 2013.
-
(2013)
NIPS
, pp. 2121-2129
-
-
Frome, A.1
Corrado, G.S.2
Shlens, J.3
Bengio, S.4
Dean, J.5
Mikolov, T.6
-
7
-
-
84990060711
-
-
arXiv preprint arXiv
-
A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. arXiv preprint arXiv:1606.01847, 2016.
-
(2016)
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
-
-
Fukui, A.1
Park, D.H.2
Yang, D.3
Rohrbach, A.4
Darrell, T.5
Rohrbach, M.6
-
8
-
-
84965148420
-
Are you talking to a machine? Dataset and methods for multilingual image question answering
-
H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS, pages 2296-2304, 2015.
-
(2015)
NIPS
, pp. 2296-2304
-
-
Gao, H.1
Mao, J.2
Zhou, J.3
Huang, Z.4
Wang, L.5
Xu, W.6
-
9
-
-
79951590356
-
Evaluating appearance models for recognition, reacquisition, and tracking
-
D. Gray, S. Brennan, and H. Tao. Evaluating appearance models for recognition, reacquisition, and tracking. In Proc. IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS), number 5, 2007.
-
(2007)
Proc IEEE International Workshop on Performance Evaluation for Tracking and Surveillance (PETS)
, Issue.5
-
-
Gray, D.1
Brennan, S.2
Tao, H.3
-
10
-
-
84986274465
-
Deep residual learning for image recognition
-
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770-778, 2016.
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 770-778
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
12
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, 47:853-899, 2013.
-
(2013)
Journal of Artificial Intelligence Research
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
14
-
-
84986305787
-
Natural language object retrieval
-
R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell. Natural language object retrieval. CVPR, 2016.
-
(2016)
CVPR
-
-
Hu, R.1
Xu, H.2
Rohrbach, M.3
Feng, J.4
Saenko, K.5
Darrell, T.6
-
16
-
-
85041925966
-
Object detection in videos with tubelet proposal networks
-
K. Kang, H. Li, T. Xiao, W. Ouyang, J. Yan, X. Liu, and X. Wang. Object detection in videos with tubelet proposal networks. In CVPR, 2017.
-
(2017)
CVPR
-
-
Kang, K.1
Li, H.2
Xiao, T.3
Ouyang, W.4
Yan, J.5
Liu, X.6
Wang, X.7
-
17
-
-
84986301354
-
-
arXiv preprint arXiv
-
K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang, Z. Wang, R. Wang, X. Wang, et al. T-CNN: Tubelets with convolutional neural networks for object detection from videos. arXiv preprint arXiv:1604.02532, 2016.
-
(2016)
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
-
-
Kang, K.1
Li, H.2
Yan, J.3
Zeng, X.4
Yang, B.5
Xiao, T.6
Zhang, C.7
Wang, Z.8
Wang, R.9
Wang, X.10
-
18
-
-
84986331475
-
Object detection from video tubelets with convolutional neural networks
-
K. Kang, W. Ouyang, H. Li, and X. Wang. Object detection from video tubelets with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 817-825, 2016.
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 817-825
-
-
Kang, K.1
Ouyang, W.2
Li, H.3
Wang, X.4
-
19
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, pages 3128-3137, 2015.
-
(2015)
CVPR
, pp. 3128-3137
-
-
Karpathy, A.1
Fei-Fei, L.2
-
20
-
-
84978730111
-
-
arXiv preprint arXiv
-
R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016.
-
(2016)
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
-
22
-
-
84875891971
-
Human reidentification with transferred metric learning
-
W. Li, R. Zhao, and X. Wang. Human reidentification with transferred metric learning. In ACCV, pages 31-44, 2012.
-
(2012)
ACCV
, pp. 31-44
-
-
Li, W.1
Zhao, R.2
Wang, X.3
-
23
-
-
84911383794
-
DeepReID: Deep filter pairing neural network for person re-identification
-
W. Li, R. Zhao, T. Xiao, and X. Wang. DeepReID: Deep filter pairing neural network for person re-identification. In CVPR, pages 152-159, 2014.
-
(2014)
CVPR
, pp. 152-159
-
-
Li, W.1
Zhao, R.2
Xiao, T.3
Wang, X.4
-
24
-
-
84955305813
-
Person re-identification by local maximal occurrence representation and metric learning
-
S. Liao, Y. Hu, X. Zhu, and S. Z. Li. Person re-identification by local maximal occurrence representation and metric learning. In CVPR, pages 2197-2206, 2015.
-
(2015)
CVPR
, pp. 2197-2206
-
-
Liao, S.1
Hu, Y.2
Zhu, X.3
Li, S.Z.4
-
25
-
-
84906493406
-
Microsoft COCO: Common objects in context
-
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, pages 740-755, 2014.
-
(2014)
ECCV
, pp. 740-755
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
26
-
-
84947599757
-
Multi-task deep visual-semantic embedding for video thumbnail selection
-
W. Liu, T. Mei, Y. Zhang, C. Che, and J. Luo. Multi-task deep visual-semantic embedding for video thumbnail selection. In CVPR, pages 3707-3715, 2015.
-
(2015)
CVPR
, pp. 3707-3715
-
-
Liu, W.1
Mei, T.2
Zhang, Y.3
Che, C.4
Luo, J.5
-
27
-
-
84973896625
-
Ask your neurons: A neural-based approach to answering questions about images
-
M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, pages 1-9, 2015.
-
(2015)
ICCV
, pp. 1-9
-
-
Malinowski, M.1
Rohrbach, M.2
Fritz, M.3
-
28
-
-
84951072975
-
-
arXiv preprint arXiv
-
J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv preprint arXiv:1412.6632, 2014.
-
(2014)
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.6
-
32
-
-
84965170394
-
Exploring models and data for image question answering
-
M. Ren, R. Kiros, and R. Zemel. Exploring models and data for image question answering. In NIPS, pages 2953-2961, 2015.
-
(2015)
NIPS
, pp. 2953-2961
-
-
Ren, M.1
Kiros, R.2
Zemel, R.3
-
33
-
-
84962816362
-
Image question answering: A visual semantic embedding model and a new dataset
-
M. Ren, R. Kiros, and R. Zemel. Image question answering: A visual semantic embedding model and a new dataset. CoRR, abs/1505.02074, 7, 2015.
-
(2015)
CoRR, abs/1505.02074
-
-
Ren, M.1
Kiros, R.2
Zemel, R.3
-
35
-
-
85018367088
-
-
arXiv preprint arXiv
-
C. Su, S. Zhang, J. Xing, W. Gao, and Q. Tian. Deep attributes driven multi-camera person re-identification. arXiv preprint arXiv:1605.03259, 2016.
-
(2016)
Deep Attributes Driven Multi-camera Person Re-identification
-
-
Su, C.1
Zhang, S.2
Xing, J.3
Gao, W.4
Tian, Q.5
-
36
-
-
77951180261
-
Attribute-based people search in surveillance environments
-
D. A. Vaquero, R. S. Feris, D. Tran, L. Brown, A. Hampapur, and M. Turk. Attribute-based people search in surveillance environments. In WACV, pages 1-8, 2009.
-
(2009)
WACV
, pp. 1-8
-
-
Vaquero, D.A.1
Feris, R.S.2
Tran, D.3
Brown, L.4
Hampapur, A.5
Turk, M.6
-
37
-
-
84946747440
-
Show and tell: A neural image caption generator
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015.
-
(2015)
CVPR
, pp. 3156-3164
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
38
-
-
85044321254
-
-
P. Welinder, S. Branson, T. Mita, C. Wah, F. Schroff, S. Belongie, and P. Perona. Caltech-UCSD Birds 200. 2010.
-
(2010)
Caltech-UCSD Birds 200
-
-
Welinder, P.1
Branson, S.2
Mita, T.3
Wah, C.4
Schroff, F.5
Belongie, S.6
Perona, P.7
-
39
-
-
85041926497
-
-
arXiv preprint arXiv
-
T. Xiao, S. Li, B. Wang, L. Lin, and X. Wang. End-to-end deep learning for person search. arXiv preprint arXiv:1604.01850, 2016.
-
(2016)
End-to-end Deep Learning for Person Search
-
-
Xiao, T.1
Li, S.2
Wang, B.3
Lin, L.4
Wang, X.5
-
40
-
-
84939821074
-
-
arXiv preprint arXiv
-
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
-
(2015)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhutdinov, R.6
Zemel, R.S.7
Bengio, Y.8
-
41
-
-
84998809208
-
-
arXiv preprint arXiv
-
Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. arXiv preprint arXiv:1511.02274, 2015.
-
(2015)
Stacked Attention Networks for Image Question Answering
-
-
Yang, Z.1
He, X.2
Gao, J.3
Deng, L.4
Smola, A.5
-
42
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67-78, 2014.
-
(2014)
Transactions of the Association for Computational Linguistics
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
43
-
-
84952062153
-
-
arXiv preprint arXiv
-
L. Zheng, L. Shen, L. Tian, S. Wang, J. Bu, and Q. Tian. Person re-identification meets image search. arXiv preprint arXiv:1502.02171, 2015.
-
(2015)
Person Re-identification Meets Image Search
-
-
Zheng, L.1
Shen, L.2
Tian, L.3
Wang, S.4
Bu, J.5
Tian, Q.6
-
44
-
-
80052906103
-
Person re-identification by probabilistic relative distance comparison
-
W.-S. Zheng, S. Gong, and T. Xiang. Person re-identification by probabilistic relative distance comparison. In CVPR, pages 649-656, 2011.
-
(2011)
CVPR
, pp. 649-656
-
-
Zheng, W.-S.1
Gong, S.2
Xiang, T.3
-
45
-
-
84986301525
-
-
arXiv preprint arXiv
-
B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015.
-
(2015)
Simple Baseline for Visual Question Answering
-
-
Zhou, B.1
Tian, Y.2
Sukhbaatar, S.3
Szlam, A.4
Fergus, R.5