-
1
-
-
84973890960
-
VQA: Visual question answering
-
Santiago, Chile, December 7-13, 2015
-
S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual question answering. In 2015 IEEE International Conference on Computer Vision, ICCV 2015, Santiago, Chile, December 7-13, 2015, pages 2425-2433, 2015.
-
(2015)
2015 IEEE International Conference on Computer Vision, ICCV 2015
, pp. 2425-2433
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
2
-
-
85041914730
-
Annotating object instances with a polygon-rnn
-
Honolulu, HI, USA, July 21-26, 2017
-
L. Castrejon, K. Kundu, R. Urtasun, and S. Fidler. Annotating object instances with a polygon-rnn. In 2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, Honolulu, HI, USA, July 21-26, 2017, pages 4485-4493, 2017.
-
(2017)
2017 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017
, pp. 4485-4493
-
-
Castrejon, L.1
Kundu, K.2
Urtasun, R.3
Fidler, S.4
-
3
-
-
84973868179
-
Hico: A benchmark for recognizing human-object interactions in images
-
Y.-W. Chao, Z. Wang, Y. He, J. Wang, and J. Deng. Hico: A benchmark for recognizing human-object interactions in images. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
-
-
Chao, Y.-W.1
Wang, Z.2
He, Y.3
Wang, J.4
Deng, J.5
-
7
-
-
70450161428
-
An empirical study of context in object detection
-
IEEE
-
S. K. Divvala, D. Hoiem, J. H. Hays, A. A. Efros, and M. Hebert. An empirical study of context in object detection. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 1271-1278. IEEE, 2009.
-
(2009)
Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on
, pp. 1271-1278
-
-
Divvala, S.K.1
Hoiem, D.2
Hays, J.H.3
Efros, A.A.4
Hebert, M.5
-
9
-
-
84944115860
-
-
June
-
H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollar, J. Gao, X. He, M. Mitchell, J. C. Platt, C. Lawrence Zitnick, and G. Zweig. From captions to visual concepts and back. June 2015.
-
(2015)
From Captions to Visual Concepts and Back.
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollar, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Lawrence Zitnick, C.11
Zweig, G.12
-
10
-
-
78149311145
-
Every picture tells a story: Generating sentences from images
-
Springer
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In European Conference on Computer Vision, pages 15-29. Springer, 2010.
-
(2010)
European Conference on Computer Vision
, pp. 15-29
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
13
-
-
84902318725
-
A survey on still image based human action recognition
-
G. Guo et al. A survey on still image based human action recognition. Pattern Recognition, 2014.
-
(2014)
Pattern Recognition
-
-
Guo, G.1
-
15
-
-
0031573117
-
Long short-term memory
-
Nov
-
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8): 1735-1780, Nov. 1997.
-
(1997)
Neural Comput.
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
16
-
-
85040312808
-
-
arXiv preprint arXiv: 1704.05526
-
R. Hu, J. Andreas, M. Rohrbach, T. Darrell, and K. Saenko. Learning to reason: End-to-end module networks for visual question answering. arXiv preprint arXiv:1704.05526, 2017.
-
(2017)
Learning to Reason: End-to-End Module Networks for Visual Question Answering.
-
-
Hu, R.1
Andreas, J.2
Rohrbach, M.3
Darrell, T.4
Saenko, K.5
-
17
-
-
84959233256
-
Image retrieval using scene graphs
-
J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. Shamma, M. Bernstein, and L. Fei-Fei. Image retrieval using scene graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3668-3678, 2015.
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 3668-3678
-
-
Johnson, J.1
Krishna, R.2
Stark, M.3
Li, L.-J.4
Shamma, D.5
Bernstein, M.6
Fei-Fei, L.7
-
19
-
-
85087529518
-
Hadamard product for low-rank bilinear pooling
-
J.-H. Kim, K. W. On, W. Lim, J. Kim, J.-W. Ha, and B.-T. Zhang. Hadamard Product for Low-rank Bilinear Pooling. In The 5th International Conference on Learning Representations, 2017.
-
(2017)
The 5th International Conference on Learning Representations
-
-
Kim, J.-H.1
On, K.W.2
Lim, W.3
Kim, J.4
Ha, J.-W.5
Zhang, B.-T.6
-
20
-
-
84911370987
-
What are you talking about? Text-to-image coreference
-
C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? Text-to-image coreference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3558-3565, 2014.
-
(2014)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 3558-3565
-
-
Kong, C.1
Lin, D.2
Bansal, M.3
Urtasun, R.4
Fidler, S.5
-
21
-
-
85040907831
-
Neural amr: Sequence-to-sequence models for parsing and generation
-
Long Papers, volume 1
-
I. Konstas, S. Iyer, M. Yatskar, Y. Choi, and L. Zettlemoyer. Neural amr: Sequence-to-sequence models for parsing and generation. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 146-157, 2017.
-
(2017)
Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics
, vol.1
, pp. 146-157
-
-
Konstas, I.1
Iyer, S.2
Yatskar, M.3
Choi, Y.4
Zettlemoyer, L.5
-
22
-
-
85011596790
-
Visual genome: Connecting language and vision using crowdsourced dense image annotations
-
R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1): 32-73, 2017.
-
(2017)
International Journal of Computer Vision
, vol.123
, Issue.1
, pp. 32-73
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
-
25
-
-
85041915815
-
Scene graph generation from objects, phrases and region captions
-
Y. Li, W. Ouyang, B. Zhou, K. Wang, and X. Wang. Scene graph generation from objects, phrases and region captions. In Proceedings of the IEEE International Conference on Computer Vision, 2017.
-
(2017)
Proceedings of the IEEE International Conference on Computer Vision
-
-
Li, Y.1
Ouyang, W.2
Zhou, B.3
Wang, K.4
Wang, X.5
-
26
-
-
78149310629
-
What, where and who? Classifying events by scene and object recognition
-
L.-J. Li et al. What, where and who? classifying events by scene and object recognition. In CVPR, 2007.
-
(2007)
CVPR
-
-
Li, L.-J.1
-
28
-
-
84937834115
-
Microsoft coco: Common objects in context
-
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In European Conference on Computer Vision (ECCV), Zürich, 2014.
-
(2014)
European Conference on Computer Vision (ECCV), Zürich
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
30
-
-
70450177757
-
Actions in context
-
IEEE
-
M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2929-2936. IEEE, 2009.
-
(2009)
Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on
, pp. 2929-2936
-
-
Marszalek, M.1
Laptev, I.2
Schmid, C.3
-
32
-
-
84990043973
-
-
arXiv preprint arXiv: 1505.04870
-
B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. arXiv preprint arXiv:1505.04870, 2015.
-
(2015)
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models.
-
-
Plummer, B.1
Wang, L.2
Cervantes, C.3
Caicedo, J.4
Hockenmaier, J.5
Lazebnik, S.6
-
33
-
-
50649096757
-
Objects in context
-
IEEE
-
A. Rabinovich, A. Vedaldi, C. Galleguillos, E. Wiewiora, and S. Belongie. Objects in context. In Computer vision, 2007. ICCV 2007. IEEE 11th international conference on, pages 1-8. IEEE, 2007.
-
(2007)
Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on
, pp. 1-8
-
-
Rabinovich, A.1
Vedaldi, A.2
Galleguillos, C.3
Wiewiora, E.4
Belongie, S.5
-
38
-
-
0015064542
-
Edge and curve detection for visual scene analysis
-
May
-
A. Rosenfeld and M. Thurston. Edge and curve detection for visual scene analysis. IEEE Trans. Comput., 20(5): 562-569, May 1971.
-
(1971)
IEEE Trans. Comput.
, vol.20
, Issue.5
, pp. 562-569
-
-
Rosenfeld, A.1
Thurston, M.2
-
39
-
-
84959184467
-
Viske: Visual knowledge extraction and question answering by visual verification of relation phrases
-
F. Sadeghi, S. K. Divvala, and A. Farhadi. Viske: Visual knowledge extraction and question answering by visual verification of relation phrases. In Conference on Computer Vision and Pattern Recognition, pages 1456-1464, 2015.
-
(2015)
Conference on Computer Vision and Pattern Recognition
, pp. 1456-1464
-
-
Sadeghi, F.1
Divvala, S.K.2
Farhadi, A.3
-
40
-
-
84933585162
-
Very deep convolutional networks for large-scale image recognition
-
abs/1409.1556
-
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
-
(2014)
CoRR
-
-
Simonyan, K.1
Zisserman, A.2
-
41
-
-
84965164720
-
Training very deep networks
-
NIPS'15, Cambridge, MA, USA. MIT Press
-
R. K. Srivastava, K. Greff, and J. Schmidhuber. Training very deep networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems-Volume 2, NIPS'15, pages 2377-2385, Cambridge, MA, USA, 2015. MIT Press.
-
(2015)
Proceedings of the 28th International Conference on Neural Information Processing Systems
, vol.2
, pp. 2377-2385
-
-
Srivastava, R.K.1
Greff, K.2
Schmidhuber, J.3
-
42
-
-
84858436215
-
Approaching the symbol grounding problem with probabilistic graphical models
-
S. Tellex, T. Kollar, S. Dickerson, M. R. Walter, A. G. Banerjee, S. Teller, and N. Roy. Approaching the symbol grounding problem with probabilistic graphical models. AI Magazine, 32(4): 64-76, 2011.
-
(2011)
AI Magazine
, vol.32
, Issue.4
, pp. 64-76
-
-
Tellex, S.1
Kollar, T.2
Dickerson, S.3
Walter, M.R.4
Banerjee, A.G.5
Teller, S.6
Roy, N.7
-
43
-
-
85040312182
-
Graph-structured representations for visual question answering
-
D. Teney, L. Liu, and A. Van den Hengel. Graph-structured representations for visual question answering. CVPR, 2017.
-
(2017)
CVPR
-
-
Teney, D.1
Liu, L.2
Van Den Hengel, A.3
-
44
-
-
84965136196
-
Grammar as a foreign language
-
O. Vinyals, L. Kaiser, T. Koo, S. Petrov, I. Sutskever, and G. Hinton. Grammar as a foreign language. In Advances in Neural Information Processing Systems, pages 2773-2781, 2015.
-
(2015)
Advances in Neural Information Processing Systems
, pp. 2773-2781
-
-
Vinyals, O.1
Kaiser, L.2
Koo, T.3
Petrov, S.4
Sutskever, I.5
Hinton, G.6
-
45
-
-
84946747440
-
Show and tell: A neural image caption generator
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3156-3164, 2015.
-
(2015)
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
, pp. 3156-3164
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
46
-
-
84986313796
-
Cnn-rnn: A unified framework for multi-label image classification
-
J. Wang, Y. Yang, J. Mao, Z. Huang, C. Huang, and W. Xu. Cnn-rnn: A unified framework for multi-label image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2285-2294, 2016.
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 2285-2294
-
-
Wang, J.1
Yang, Y.2
Mao, J.3
Huang, Z.4
Huang, C.5
Xu, W.6
-
48
-
-
77955988492
-
Modeling mutual context of object and human pose in human-object interaction activities
-
B. Yao et al. Modeling mutual context of object and human pose in human-object interaction activities. In CVPR, 2010.
-
(2010)
CVPR
-
-
Yao, B.1
-
51
-
-
85045896626
-
Obj2text: Generating visually descriptive language from object layouts
-
X. Yin and V. Ordonez. Obj2text: Generating visually descriptive language from object layouts. In EMNLP, 2017.
-
(2017)
EMNLP
-
-
Yin, X.1
Ordonez, V.2
-
56
-
-
85030471090
-
-
arXiv: 1606.09239 [cs], June. ArXiv: 1606.09239
-
H. Zhang, Z. Hu, Y. Deng, M. Sachan, Z. Yan, and E. P. Xing. Learning Concept Taxonomies from Multimodal Data. arXiv:1606.09239 [cs], June 2016.
-
(2016)
Learning Concept Taxonomies from Multimodal Data.
-
-
Zhang, H.1
Hu, Z.2
Deng, Y.3
Sachan, M.4
Yan, Z.5
Xing, E.P.6
-
57
-
-
85029388674
-
Visual translation embedding network for visual relation detection
-
H. Zhang, Z. Kyaw, S.-F. Chang, and T.-S. Chua. Visual translation embedding network for visual relation detection. CVPR, 2017.
-
(2017)
CVPR
-
-
Zhang, H.1
Kyaw, Z.2
Chang, S.-F.3
Chua, T.-S.4
-
58
-
-
84973358602
-
Highway long short-term memory rnns for distant speech recognition
-
March
-
Y. Zhang, G. Chen, D. Yu, K. Yao, S. Khudanpur, and J. Glass. Highway long short-term memory rnns for distant speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5755-5759, March 2016.
-
(2016)
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
, pp. 5755-5759
-
-
Zhang, Y.1
Chen, G.2
Yu, D.3
Yao, K.4
Khudanpur, S.5
Glass, J.6
-
59
-
-
84906493890
-
Reasoning about object affordances in a knowledge base representation
-
Springer
-
Y. Zhu, A. Fathi, and L. Fei-Fei. Reasoning about object affordances in a knowledge base representation. In European conference on computer vision, pages 408-424. Springer, 2014.
-
(2014)
European Conference on Computer Vision
, pp. 408-424
-
-
Zhu, Y.1
Fathi, A.2
Fei-Fei, L.3