-
1
-
-
84973890960
-
Vqa: Visual question answering
-
S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. Vqa: Visual question answering. In ICCV, 2015
-
(2015)
ICCV
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Lawrence Zitnick, C.6
Parikh, D.7
-
3
-
-
0041876117
-
Matching words and pictures
-
K. Barnard, P. Duygulu, D. Forsyth, N. d. Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003
-
(2003)
JMLR
-
-
Barnard, K.1
Duygulu, P.2
Forsyth, D.3
Freitas, D.N.4
Blei, D.M.5
Jordan, M.I.6
-
4
-
-
85107362379
-
Nltk: The natural language toolkit
-
S. Bird. Nltk: The natural language toolkit. In ACL, 2006
-
(2006)
ACL
-
-
Bird, S.1
-
6
-
-
84887394346
-
Understanding indoor scenes using 3d geometric phrases
-
W. Choi, Y.-W. Chao, C. Pantofaru, and S. Savarese. Understanding indoor scenes using 3d geometric phrases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 33-40, 2013
-
(2013)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 33-40
-
-
Choi, W.1
Chao, Y.-W.2
Pantofaru, C.3
Savarese, S.4
-
8
-
-
85041892861
-
Detecting visual relationships with deep relational networks
-
B. Dai, Y. Zhang, and D. Lin. Detecting visual relationships with deep relational networks. CVPR, 2017
-
(2017)
CVPR
-
-
Dai, B.1
Zhang, Y.2
Lin, D.3
-
9
-
-
84877748784
-
Detecting actions, poses, and objects with relational phraselets
-
C. Desai and D. Ramanan. Detecting actions, poses, and objects with relational phraselets. In ECCV, 2012
-
(2012)
ECCV
-
-
Desai, C.1
Ramanan, D.2
-
10
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015
-
(2015)
CVPR
-
-
Donahue, J.1
Anne Hendricks, L.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
12
-
-
84959250180
-
From captions to visual concepts and back
-
H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
-
13
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
14
-
-
85029359197
-
Fast r-cnn
-
R. Girshick. Fast r-cnn. In ICCV, 2015
-
(2015)
ICCV
-
-
Girshick, R.1
-
15
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014
-
(2014)
CVPR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
16
-
-
70450155469
-
Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers
-
A. Gupta and L. S. Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In ECCV, 2008
-
(2008)
ECCV
-
-
Gupta, A.1
Davis, L.S.2
-
18
-
-
84959229874
-
Spatial pyramid pooling in deep convolutional networks for visual recognition
-
K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In CVPR, 2014
-
(2014)
CVPR
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
19
-
-
84856653718
-
Learning cross-modality similarity for multinomial data
-
Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. In ICCV, 2011
-
(2011)
ICCV
-
-
Jia, Y.1
Salzmann, M.2
Darrell, T.3
-
21
-
-
80053435765
-
Learning with whom to share in multi-task feature learning
-
Z. Kang, K. Grauman, and F. Sha. Learning with whom to share in multi-task feature learning. In ICML, 2011
-
(2011)
ICML
-
-
Kang, Z.1
Grauman, K.2
Sha, F.3
-
22
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
23
-
-
84978730111
-
-
arXiv preprint arXiv:1602.07332
-
R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. ArXiv preprint arXiv:1602.07332, 2016
-
(2016)
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
-
24
-
-
84887601544
-
Babytalk: Understanding and generating simple image descriptions
-
G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Babytalk: Understanding and generating simple image descriptions. TPAMI, 2013
-
(2013)
TPAMI
-
-
Kulkarni, G.1
Premraj, V.2
Ordonez, V.3
Dhar, S.4
Li, S.5
Choi, Y.6
Berg, A.C.7
Berg, T.L.8
-
25
-
-
77955997860
-
Efficiently selecting regions for scene understanding
-
M. P. Kumar and D. Koller. Efficiently selecting regions for scene understanding. In CVPR, 2010
-
(2010)
CVPR
-
-
Kumar, M.P.1
Koller, D.2
-
26
-
-
84907331257
-
Generalizing image captions for image-text parallel corpus
-
P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In ACL, 2013
-
(2013)
ACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
27
-
-
85041893972
-
Person search with natural language description
-
S. Li, T. Xiao, H. Li, B. Zhou, D. Yue, and X. Wang. Person search with natural language description. In CVPR, 2017
-
(2017)
CVPR
-
-
Li, S.1
Xiao, T.2
Li, H.3
Zhou, B.4
Yue, D.5
Wang, X.6
-
28
-
-
85041906062
-
Vip-cnn: Visual phrase guided convolutional neural network
-
Y. Li, W. Ouyang, X. Wang, and X. Tang. Vip-cnn: Visual phrase guided convolutional neural network. CVPR, 2017
-
(2017)
CVPR
-
-
Li, Y.1
Ouyang, W.2
Wang, X.3
Tang, X.4
-
29
-
-
85041926899
-
Deep variation-structured reinforcement learning for visual relationship and attribute detection
-
X. Liang, L. Lee, and E. P. Xing. Deep variation-structured reinforcement learning for visual relationship and attribute detection. CVPR, 2017
-
(2017)
CVPR
-
-
Liang, X.1
Lee, L.2
Xing, E.P.3
-
30
-
-
85011035819
-
-
arXiv preprint arXiv:1512.02325
-
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed. Ssd: Single shot multibox detector. ArXiv preprint arXiv:1512.02325, 2015
-
(2015)
Ssd: Single Shot Multibox Detector
-
-
Liu, W.1
Anguelov, D.2
Erhan, D.3
Szegedy, C.4
Reed, S.5
-
32
-
-
85030238250
-
-
arXiv preprint arXiv:1611.06641
-
B. A. Plummer, A. Mallya, C. M. Cervantes, J. Hockenmaier, and S. Lazebnik. Phrase localization and visual relationship detection with comprehensive linguistic cues. ArXiv preprint arXiv:1611.06641, 2016
-
(2016)
Phrase Localization and Visual Relationship Detection with Comprehensive Linguistic Cues
-
-
Plummer, B.A.1
Mallya, A.2
Cervantes, C.M.3
Hockenmaier, J.4
Lazebnik, S.5
-
33
-
-
84961917629
-
-
arXiv preprint arXiv:1506.02640
-
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. ArXiv preprint arXiv:1506.02640, 2015
-
(2015)
You only Look Once: Unified, Real-time Object Detection
-
-
Redmon, J.1
Divvala, S.2
Girshick, R.3
Farhadi, A.4
-
34
-
-
84960980241
-
Faster r-cnn: Towards real-time object detection with region proposal networks
-
S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, 2015
-
(2015)
NIPS
-
-
Ren, S.1
He, K.2
Girshick, R.3
Sun, J.4
-
35
-
-
33845596932
-
Using multiple segmentations to discover objects and their extent in image collections
-
B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In CVPR, 2006
-
(2006)
CVPR
-
-
Russell, B.C.1
Freeman, W.T.2
Efros, A.A.3
Sivic, J.4
Zisserman, A.5
-
36
-
-
80052889458
-
Recognition using visual phrases
-
M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011
-
(2011)
CVPR
-
-
Sadeghi, M.A.1
Farhadi, A.2
-
39
-
-
77955998009
-
Connecting modalities: Semisupervised segmentation and annotation of images using unaligned text corpora
-
R. Socher and L. Fei-Fei. Connecting modalities: Semisupervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010
-
(2010)
CVPR
-
-
Socher, R.1
Fei-Fei, L.2
-
41
-
-
84939821074
-
-
arXiv preprint arXiv:1502.03044
-
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. ArXiv preprint arXiv:1502.03044, 2015
-
(2015)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhutdinov, R.6
Zemel, R.S.7
Bengio, Y.8
-
43
-
-
85041906381
-
Multi-level attention networks for visual question answering
-
D. Yu, J. Fu, T. Mei, and Y. Rui. Multi-level attention networks for visual question answering. In CVPR, 2017
-
(2017)
CVPR
-
-
Yu, D.1
Fu, J.2
Mei, T.3
Rui, Y.4
-
44
-
-
85029388674
-
Visual translation embedding network for visual relation detection
-
H. Zhang, Z. Kyaw, S.-F. Chang, and T.-S. Chua. Visual translation embedding network for visual relation detection. In CVPR, 2017
-
(2017)
CVPR
-
-
Zhang, H.1
Kyaw, Z.2
Chang, S.-F.3
Chua, T.-S.4
-
45
-
-
85035223616
-
Ppr-fcn: Weakly supervised visual relation detection via parallel pairwise rfcn
-
H. Zhang, Z. Kyaw, J. Yu, and S.-F. Chang. Ppr-fcn: Weakly supervised visual relation detection via parallel pairwise rfcn. In ICCV, 2017
-
(2017)
ICCV
-
-
Zhang, H.1
Kyaw, Z.2
Yu, J.3
Chang, S.-F.4
-
47
-
-
85162027638
-
Probabilistic multi-task feature selection
-
Y. Zhang, D.-Y. Yeung, and Q. Xu. Probabilistic multi-task feature selection. In NIPS, 2010
-
(2010)
NIPS
-
-
Zhang, Y.1
Yeung, D.-Y.2
Xu, Q.3
-
48
-
-
85009935878
-
Facial landmark detection by deep multi-task learning
-
Z. Zhang, P. Luo, C. C. Loy, and X. Tang. Facial landmark detection by deep multi-task learning. In ECCV, 2014
-
(2014)
ECCV
-
-
Zhang, Z.1
Luo, P.2
Loy, C.C.3
Tang, X.4
-
49
-
-
84986301525
-
-
arXiv preprint arXiv:1512.02167
-
B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. ArXiv preprint arXiv:1512.02167, 2015
-
(2015)
Simple Baseline for Visual Question Answering
-
-
Zhou, B.1
Tian, Y.2
Sukhbaatar, S.3
Szlam, A.4
Fergus, R.5
-
50
-
-
85029782049
-
Towards context-aware interaction recognition
-
B. Zhuang, L. Liu, C. Shen, and I. Reid. Towards context-aware interaction recognition. ICCV, 2017
-
(2017)
ICCV
-
-
Zhuang, B.1
Liu, L.2
Shen, C.3
Reid, I.4
-
51
-
-
85041920577
-
-
arXiv preprint arXiv:1705.09892
-
B. Zhuang, Q. Wu, C. Shen, I. Reid, and A. v. d. Hengel. Care about you: Towards large-scale human-centric visual relationship detection. ArXiv preprint arXiv:1705.09892, 2017
-
(2017)
Care about You: Towards Large-scale Human-centric Visual Relationship Detection
-
-
Zhuang, B.1
Wu, Q.2
Shen, C.3
Reid, I.4
Hengel, A.V.D.5