-
2
-
-
84973890960
-
VQA: Visual question answering
-
S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015.
-
(2015)
ICCV
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Lawrence Zitnick, C.6
Parikh, D.7
-
3
-
-
85040312797
-
-
Y. Atzmon, J. Berant, V. Kezami, A. Globerson, and G. Chechik. Learning to generalize to new compositions in image understanding. arXiv preprint arXiv:1608.07639, 2016.
-
(2016)
Learning to Generalize to New Compositions in Image Understanding
-
-
Atzmon, Y.1
Berant, J.2
Kezami, V.3
Globerson, A.4
Chechik, G.5
-
4
-
-
0041876117
-
Matching words and pictures
-
K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003.
-
(2003)
JMLR
-
-
Barnard, K.1
Duygulu, P.2
Forsyth, D.3
de Freitas, N.4
Blei, D.M.5
Jordan, M.I.6
-
7
-
-
85018891340
-
CRF-CNN: Modeling structured information in human pose estimation
-
X. Chu, W. Ouyang, X. Wang, et al. CRF-CNN: Modeling structured information in human pose estimation. In NIPS, 2016.
-
(2016)
NIPS
-
-
Chu, X.1
Ouyang, W.2
Wang, X.3
-
8
-
-
84877748784
-
Detecting actions, poses, and objects with relational phraselets
-
C. Desai and D. Ramanan. Detecting actions, poses, and objects with relational phraselets. In ECCV, 2012.
-
(2012)
ECCV
-
-
Desai, C.1
Ramanan, D.2
-
9
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
-
(2015)
CVPR
-
-
Donahue, J.1
Anne Hendricks, L.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
11
-
-
84959250180
-
From captions to visual concepts and back
-
H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
-
12
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
13
-
-
85029359197
-
Fast R-CNN
-
R. Girshick. Fast R-CNN. In ICCV, 2015.
-
(2015)
ICCV
-
-
Girshick, R.1
-
14
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
-
(2014)
CVPR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
15
-
-
84959195179
-
Deformable part models are convolutional neural networks
-
R. Girshick, F. Iandola, T. Darrell, and J. Malik. Deformable part models are convolutional neural networks. In CVPR, 2015.
-
(2015)
CVPR
-
-
Girshick, R.1
Iandola, F.2
Darrell, T.3
Malik, J.4
-
16
-
-
70450155469
-
Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers
-
A. Gupta and L. S. Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In ECCV, 2008.
-
(2008)
ECCV
-
-
Gupta, A.1
Davis, L.S.2
-
17
-
-
84959229874
-
Spatial pyramid pooling in deep convolutional networks for visual recognition
-
K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
-
(2014)
ECCV
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
19
-
-
84856653718
-
Learning cross-modality similarity for multinomial data
-
Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. In ICCV, 2011.
-
(2011)
ICCV
-
-
Jia, Y.1
Salzmann, M.2
Darrell, T.3
-
20
-
-
85009867858
-
Caffe: Convolutional architecture for fast feature embedding
-
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM MM, 2014.
-
(2014)
ACM MM
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
22
-
-
85041925966
-
Object detection in videos with tubelet proposal networks
-
K. Kang, H. Li, T. Xiao, W. Ouyang, J. Yan, X. Liu, and X. Wang. Object detection in videos with tubelet proposal networks. In CVPR, 2017.
-
(2017)
CVPR
-
-
Kang, K.1
Li, H.2
Xiao, T.3
Ouyang, W.4
Yan, J.5
Liu, X.6
Wang, X.7
-
23
-
-
84986301354
-
-
K. Kang, H. Li, J. Yan, X. Zeng, B. Yang, T. Xiao, C. Zhang, Z. Wang, R. Wang, X. Wang, et al. T-CNN: Tubelets with convolutional neural networks for object detection from videos. arXiv preprint arXiv:1604.02532, 2016.
-
(2016)
T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos
-
-
Kang, K.1
Li, H.2
Yan, J.3
Zeng, X.4
Yang, B.5
Xiao, T.6
Zhang, C.7
Wang, Z.8
Wang, R.9
Wang, X.10
-
24
-
-
84986331475
-
Object detection from video tubelets with convolutional neural networks
-
K. Kang, W. Ouyang, H. Li, and X. Wang. Object detection from video tubelets with convolutional neural networks. In CVPR, 2016.
-
(2016)
CVPR
-
-
Kang, K.1
Ouyang, W.2
Li, H.3
Wang, X.4
-
25
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
26
-
-
85162351107
-
Efficient inference in fully connected CRFs with Gaussian edge potentials
-
P. Krähenbühl and V. Koltun. Efficient inference in fully connected CRFs with Gaussian edge potentials. In NIPS, 2011.
-
(2011)
NIPS
-
-
Krähenbühl, P.1
Koltun, V.2
-
27
-
-
84978730111
-
-
R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016.
-
(2016)
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
-
28
-
-
84876231242
-
ImageNet classification with deep convolutional neural networks
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097-1105, 2012.
-
(2012)
NIPS, pp. 1097-1105
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
29
-
-
84887601544
-
BabyTalk: Understanding and generating simple image descriptions
-
G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. BabyTalk: Understanding and generating simple image descriptions. TPAMI, 2013.
-
(2013)
TPAMI
-
-
Kulkarni, G.1
Premraj, V.2
Ordonez, V.3
Dhar, S.4
Li, S.5
Choi, Y.6
Berg, A.C.7
Berg, T.L.8
-
30
-
-
77955997860
-
Efficiently selecting regions for scene understanding
-
M. P. Kumar and D. Koller. Efficiently selecting regions for scene understanding. In CVPR, 2010.
-
(2010)
CVPR
-
-
Kumar, M.P.1
Koller, D.2
-
31
-
-
84907331257
-
Generalizing image captions for image-text parallel corpus
-
P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In ACL, 2013.
-
(2013)
ACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
32
-
-
85011302702
-
-
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed. SSD: Single shot multibox detector. arXiv preprint arXiv:1512.02325, 2015.
-
(2015)
SSD: Single Shot Multibox Detector
-
-
Liu, W.1
Anguelov, D.2
Erhan, D.3
Szegedy, C.4
Reed, S.5
-
35
-
-
84948382785
-
DeepID-Net: Deformable deep convolutional neural networks for object detection
-
W. Ouyang, X. Wang, X. Zeng, S. Qiu, P. Luo, Y. Tian, H. Li, S. Yang, Z. Wang, C.-C. Loy, et al. DeepID-Net: Deformable deep convolutional neural networks for object detection. In CVPR, 2015.
-
(2015)
CVPR
-
-
Ouyang, W.1
Wang, X.2
Zeng, X.3
Qiu, S.4
Luo, P.5
Tian, Y.6
Li, H.7
Yang, S.8
Wang, Z.9
Loy, C.-C.10
-
36
-
-
84961917629
-
-
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640, 2015.
-
(2015)
You Only Look Once: Unified, Real-time Object Detection
-
-
Redmon, J.1
Divvala, S.2
Girshick, R.3
Farhadi, A.4
-
37
-
-
84960980241
-
Faster R-CNN: Towards real-time object detection with region proposal networks
-
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.
-
(2015)
NIPS
-
-
Ren, S.1
He, K.2
Girshick, R.3
Sun, J.4
-
38
-
-
84986327251
-
-
A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of textual phrases in images by reconstruction. arXiv preprint arXiv:1511.03745, 2015.
-
(2015)
Grounding of Textual Phrases in Images by Reconstruction
-
-
Rohrbach, A.1
Rohrbach, M.2
Hu, R.3
Darrell, T.4
Schiele, B.5
-
39
-
-
84947041871
-
ImageNet large scale visual recognition challenge
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 2015.
-
(2015)
IJCV
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
-
40
-
-
33845596932
-
Using multiple segmentations to discover objects and their extent in image collections
-
B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In CVPR, 2006.
-
(2006)
CVPR
-
-
Russell, B.C.1
Freeman, W.T.2
Efros, A.A.3
Sivic, J.4
Zisserman, A.5
-
41
-
-
80052889458
-
Recognition using visual phrases
-
M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011.
-
(2011)
CVPR
-
-
Sadeghi, M.A.1
Farhadi, A.2
-
44
-
-
77955998009
-
Connecting modalities: Semisupervised segmentation and annotation of images using unaligned text corpora
-
R. Socher and L. Fei-Fei. Connecting modalities: Semisupervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010.
-
(2010)
CVPR
-
-
Socher, R.1
Fei-Fei, L.2
-
45
-
-
0000903748
-
Generalization of backpropagation with application to a recurrent gas market model
-
P. J. Werbos. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1988.
-
(1988)
Neural Networks
-
-
Werbos, P.J.1
-
46
-
-
84939821074
-
-
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
-
(2015)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhutdinov, R.6
Zemel, R.S.7
Bengio, Y.8
-
48
-
-
84998809208
-
-
Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. arXiv preprint arXiv:1511.02274, 2015.
-
(2015)
Stacked Attention Networks for Image Question Answering
-
-
Yang, Z.1
He, X.2
Gao, J.3
Deng, L.4
Smola, A.5
-
49
-
-
84995439884
-
-
Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. arXiv preprint arXiv:1603.03925, 2016.
-
(2016)
Image Captioning with Semantic Attention
-
-
You, Q.1
Jin, H.2
Wang, Z.3
Fang, C.4
Luo, J.5
-
50
-
-
84973861983
-
Conditional random fields as recurrent neural networks
-
S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. Torr. Conditional random fields as recurrent neural networks. In ICCV, 2015.
-
(2015)
ICCV
-
-
Zheng, S.1
Jayasumana, S.2
Romera-Paredes, B.3
Vineet, V.4
Su, Z.5
Du, D.6
Huang, C.7
Torr, P.H.8