-
1
-
-
84985013144
-
Deep compositional question answering with neural module networks
-
J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Deep compositional question answering with neural module networks. In CVPR, 2016.
-
(2016)
CVPR
-
-
Andreas, J.1
Rohrbach, M.2
Darrell, T.3
Klein, D.4
-
2
-
-
84973890960
-
Vqa: Visual question answering
-
S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. Vqa: Visual question answering. In ICCV, 2015.
-
(2015)
ICCV
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Lawrence Zitnick, C.6
Parikh, D.7
-
3
-
-
85041922388
-
Learning to generalize to new compositions in image understanding
-
Y. Atzmon, J. Berant, V. Kezami, A. Globerson, and G. Chechik. Learning to generalize to new compositions in image understanding. In EMNLP, 2016.
-
(2016)
EMNLP
-
-
Atzmon, Y.1
Berant, J.2
Kezami, V.3
Globerson, A.4
Chechik, G.5
-
4
-
-
84960130911
-
Automatic description generation from images: A survey of models, datasets, and evaluation measures
-
R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat, and B. Plank. Automatic description generation from images: A survey of models, datasets, and evaluation measures. JAIR, 2016.
-
(2016)
JAIR
-
-
Bernardi, R.1
Cakici, R.2
Elliott, D.3
Erdem, A.4
Erdem, E.5
Ikizler-Cinbis, N.6
Keller, F.7
Muscat, A.8
Plank, B.9
-
5
-
-
84899013802
-
Translating embeddings for modeling multirelational data
-
A. Bordes, N. Usunier, A. Garcia-Duran, J. Weston, and O. Yakhnenko. Translating embeddings for modeling multirelational data. In NIPS, 2013.
-
(2013)
NIPS
-
-
Bordes, A.1
Usunier, N.2
Garcia-Duran, A.3
Weston, J.4
Yakhnenko, O.5
-
6
-
-
0033778373
-
Representation of manipulable man-made objects in the dorsal stream
-
L. L. Chao and A. Martin. Representation of manipulable man-made objects in the dorsal stream. Neuroimage, 2000.
-
(2000)
Neuroimage
-
-
Chao, L.L.1
Martin, A.2
-
7
-
-
79953187637
-
Discriminative models for multi-class object layout
-
C. Desai, D. Ramanan, and C. C. Fowlkes. Discriminative models for multi-class object layout. IJCV, 2011.
-
(2011)
IJCV
-
-
Desai, C.1
Ramanan, D.2
Fowlkes, C.C.3
-
8
-
-
84943769848
-
Question answering over freebase with multi-column convolutional neural networks
-
L. Dong, F. Wei, M. Zhou, and K. Xu. Question answering over freebase with multi-column convolutional neural networks. In ACL, 2015.
-
(2015)
ACL
-
-
Dong, L.1
Wei, F.2
Zhou, M.3
Xu, K.4
-
9
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
11
-
-
85029359197
-
Fast r-cnn
-
R. Girshick. Fast r-cnn. In ICCV, 2015.
-
(2015)
ICCV
-
-
Girshick, R.1
-
12
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
-
(2014)
CVPR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
13
-
-
84965100881
-
-
arXiv preprint arXiv:1502.04623
-
K. Gregor, I. Danihelka, A. Graves, D. J. Rezende, and D. Wierstra. Draw: A recurrent neural network for image generation. arXiv preprint arXiv:1502.04623, 2015.
-
(2015)
Draw: A Recurrent Neural Network for Image Generation
-
-
Gregor, K.1
Danihelka, I.2
Graves, A.3
Rezende, D.J.4
Wierstra, D.5
-
14
-
-
70450155469
-
Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers
-
A. Gupta and L. S. Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In ECCV, 2008.
-
(2008)
ECCV
-
-
Gupta, A.1
Davis, L.S.2
-
15
-
-
69549121743
-
Observing human-object interactions: Using spatial and functional compatibility for recognition
-
A. Gupta, A. Kembhavi, and L. S. Davis. Observing human-object interactions: Using spatial and functional compatibility for recognition. TPAMI, 2009.
-
(2009)
TPAMI
-
-
Gupta, A.1
Kembhavi, A.2
Davis, L.S.3
-
17
-
-
85041926703
-
Revisiting visual question answering baselines
-
A. Jabri, A. Joulin, and L. van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
-
(2016)
ECCV
-
-
Jabri, A.1
Joulin, A.2
Van der Maaten, L.3
-
19
-
-
84986245786
-
Densecap: Fully convolutional localization networks for dense captioning
-
J. Johnson, A. Karpathy, and L. Fei-Fei. Densecap: Fully convolutional localization networks for dense captioning. In CVPR, 2016.
-
(2016)
CVPR
-
-
Johnson, J.1
Karpathy, A.2
Fei-Fei, L.3
-
20
-
-
84959233256
-
Image retrieval using scene graphs
-
J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei. Image retrieval using scene graphs. In CVPR, 2015.
-
(2015)
CVPR
-
-
Johnson, J.1
Krishna, R.2
Stark, M.3
Li, L.-J.4
Shamma, D.A.5
Bernstein, M.S.6
Fei-Fei, L.7
-
21
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
23
-
-
84990070438
-
Visual genome: Connecting language and vision using crowdsourced dense image annotations
-
R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 2016.
-
(2016)
IJCV
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
-
24
-
-
85041926899
-
Deep variation-structured reinforcement learning for visual relationship and attribute detection
-
X. Liang, L. Lee, and E. P. Xing. Deep variation-structured reinforcement learning for visual relationship and attribute detection. In CVPR, 2017.
-
(2017)
CVPR
-
-
Liang, X.1
Lee, L.2
Xing, E.P.3
-
25
-
-
84952316342
-
Learning entity and relation embeddings for knowledge graph completion
-
Y. Lin, Z. Liu, M. Sun, Y. Liu, and X. Zhu. Learning entity and relation embeddings for knowledge graph completion. In AAAI, 2015.
-
(2015)
AAAI
-
-
Lin, Y.1
Liu, Z.2
Sun, M.3
Liu, Y.4
Zhu, X.5
-
26
-
-
85011302702
-
Ssd: Single shot multibox detector
-
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed. Ssd: Single shot multibox detector. In ECCV, 2016.
-
(2016)
ECCV
-
-
Liu, W.1
Anguelov, D.2
Erhan, D.3
Szegedy, C.4
Reed, S.5
-
28
-
-
57249084011
-
Visualizing data using t-sne
-
L. v. d. Maaten and G. Hinton. Visualizing data using t-sne. JMLR, 2008.
-
(2008)
JMLR
-
-
Maaten, L.V.D.1
Hinton, G.2
-
29
-
-
85041909637
-
Learning models for actions and person-object interactions with transfer to question answering
-
A. Mallya and S. Lazebnik. Learning models for actions and person-object interactions with transfer to question answering. In ECCV, 2016.
-
(2016)
ECCV
-
-
Mallya, A.1
Lazebnik, S.2
-
30
-
-
84898956512
-
Distributed representations of words and phrases and their compositionality
-
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
-
(2013)
NIPS
-
-
Mikolov, T.1
Sutskever, I.2
Chen, K.3
Corrado, G.S.4
Dean, J.5
-
32
-
-
84973856017
-
Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
-
B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2015.
-
(2015)
ICCV
-
-
Plummer, B.A.1
Wang, L.2
Cervantes, C.M.3
Caicedo, J.C.4
Hockenmaier, J.5
Lazebnik, S.6
-
33
-
-
84959233994
-
Learning semantic relationships for better action retrieval in images
-
V. Ramanathan, C. Li, J. Deng, W. Han, Z. Li, K. Gu, Y. Song, S. Bengio, C. Rosenberg, and L. Fei-Fei. Learning semantic relationships for better action retrieval in images. In CVPR, 2015.
-
(2015)
CVPR
-
-
Ramanathan, V.1
Li, C.2
Deng, J.3
Han, W.4
Li, Z.5
Gu, K.6
Song, Y.7
Bengio, S.8
Rosenberg, C.9
Fei-Fei, L.10
-
34
-
-
84986308404
-
You only look once: Unified, real-time object detection
-
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In CVPR, 2016.
-
(2016)
CVPR
-
-
Redmon, J.1
Divvala, S.2
Girshick, R.3
Farhadi, A.4
-
35
-
-
84960980241
-
Faster r-cnn: Towards real-time object detection with region proposal networks
-
S. Ren, K. He, R. Girshick, and J. Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In NIPS, 2015.
-
(2015)
NIPS
-
-
Ren, S.1
He, K.2
Girshick, R.3
Sun, J.4
-
36
-
-
84959184467
-
Viske: Visual knowledge extraction and question answering by visual verification of relation phrases
-
F. Sadeghi, S. K. Divvala, and A. Farhadi. Viske: Visual knowledge extraction and question answering by visual verification of relation phrases. In CVPR, 2015.
-
(2015)
CVPR
-
-
Sadeghi, F.1
Divvala, S.K.2
Farhadi, A.3
-
37
-
-
80052889458
-
Recognition using visual phrases
-
M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011.
-
(2011)
CVPR
-
-
Sadeghi, M.A.1
Farhadi, A.2
-
39
-
-
80052896768
-
Efficient object category recognition using classemes
-
L. Torresani, M. Szummer, and A. Fitzgibbon. Efficient object category recognition using classemes. In ECCV, 2010.
-
(2010)
ECCV
-
-
Torresani, L.1
Szummer, M.2
Fitzgibbon, A.3
-
40
-
-
85044362471
-
Show and tell: Lessons learned from the 2015 mscoco image captioning challenge
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: Lessons learned from the 2015 mscoco image captioning challenge. TPAMI, 2016.
-
(2016)
TPAMI
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
41
-
-
85032356206
-
-
arXiv preprint arXiv:1606.05433
-
P. Wang, Q. Wu, C. Shen, A. v. d. Hengel, and A. Dick. Fvqa: Fact-based visual question answering. arXiv preprint arXiv:1606.05433, 2016.
-
(2016)
Fvqa: Fact-based Visual Question Answering
-
-
Wang, P.1
Wu, Q.2
Shen, C.3
Hengel, A.V.D.4
Dick, A.5
-
42
-
-
77955988492
-
Modeling mutual context of object and human pose in human-object interaction activities
-
B. Yao and L. Fei-Fei. Modeling mutual context of object and human pose in human-object interaction activities. In CVPR, 2010.
-
(2010)
CVPR
-
-
Yao, B.1
Fei-Fei, L.2
-
43
-
-
85035206689
-
Learning from collective intelligence: Feature learning using social images and tags
-
H. Zhang, X. Shang, H. Luan, M. Wang, and T.-S. Chua. Learning from collective intelligence: Feature learning using social images and tags. TOMM, 2016.
-
(2016)
TOMM
-
-
Zhang, H.1
Shang, X.2
Luan, H.3
Wang, M.4
Chua, T.-S.5
-
44
-
-
84986325880
-
Online collaborative learning for open-vocabulary visual classifiers
-
H. Zhang, X. Shang, W. Yang, H. Xu, H. Luan, and T.-S. Chua. Online collaborative learning for open-vocabulary visual classifiers. In CVPR, 2016.
-
(2016)
CVPR
-
-
Zhang, H.1
Shang, X.2
Yang, W.3
Xu, H.4
Luan, H.5
Chua, T.-S.6