[1] A. Agrawal, D. Batra, and D. Parikh. Analyzing the behavior of visual question answering models. In EMNLP, 2016.
[2] J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Learning to compose neural networks for question answering. In NAACL, 2016.
[4] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015.
[5] Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. TPAMI, 35(8):1798-1828, 2013.
[6] Blender Online Community. Blender - a 3D modelling and rendering package. Blender Foundation, Blender Institute, Amsterdam, 2016.
[7] J. Chen, P. Kuznetsova, D. Warren, and Y. Choi. Deja image-captions: A corpus of expressive image descriptions in repetition. In NAACL, 2015.
[8] A. Farhadi, M. Hejrati, A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences for images. In ECCV, 2010.
[9] A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. In arXiv:1606.01847, 2016.
[10] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS, 2015.
[12] D. Geman, S. Geman, N. Hallonquist, and L. Younes. Visual Turing test for computer vision systems. Proceedings of the National Academy of Sciences, 112(12):3618-3623, 2015.
[13] A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, A. Badia, K. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, and D. Hassabis. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.
[14] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[16] A. Jabri, A. Joulin, and L. van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
[17] J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei. Image retrieval using scene graphs. In CVPR, 2015.
[18] A. Joulin and T. Mikolov. Inferring algorithmic patterns with stack-augmented recurrent nets. In NIPS, 2015.
[19] S. Kazemzadeh, V. Ordonez, M. Matten, and T. Berg. ReferItGame: Referring to objects in photographs of natural scenes. In EMNLP, 2014.
[20] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[21] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. Shamma, M. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 2016.
[22] H. J. Levesque, E. Davis, and L. Morgenstern. The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, Volume 46, page 47, 2011.
[23] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[24] J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention for visual question answering. In NIPS, 2016.
[25] L. Ma, Z. Lu, and H. Li. Learning to answer questions from image using convolutional neural network. In AAAI, 2016.
[26] M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS, 2014.
[28] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
[31] A. Ray, G. Christie, M. Bansal, D. Batra, and D. Parikh. Question relevance in VQA: Identifying non-visual and false-premise questions. In EMNLP, 2016.
[32] M. Ren, R. Kiros, and R. Zemel. Exploring models and data for image question answering. In NIPS, 2015.
[33] K. Shih, S. Singh, and D. Hoiem. Where to look: Focus regions for visual question answering. In CVPR, 2016.
[34] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. JMLR, 15(1):1929-1958, 2014.
[35] B. Sturm. A simple method to determine if a music information retrieval system is a horse. IEEE Transactions on Multimedia, 16(6):1636-1644, 2014.
[37] M. Tapaswi, Y. Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, and S. Fidler. MovieQA: Understanding stories in movies through question-answering. In CVPR, 2016.
[38] J. Weston, A. Bordes, S. Chopra, A. Rush, B. van Merriënboer, A. Joulin, and T. Mikolov. Towards AI-complete question answering: A set of prerequisite toy tasks. In ICLR, 2016.
[41] Q. Wu, C. Shen, A. van den Hengel, P. Wang, and A. Dick. Image captioning and visual question answering based on attributes and their related external knowledge. In arXiv:1603.02814, 2016.
[42] C. Xiong, S. Merity, and R. Socher. Dynamic memory networks for visual and textual question answering. In ICML, 2016.
[43] H. Xu and K. Saenko. Ask, attend, and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, 2016.
[44] Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
[45] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In TACL, pages 67-78, 2014.
[46] L. Yu, E. Park, A. Berg, and T. Berg. Visual madlibs: Fill in the blank image generation and question answering. In ICCV, 2015.
[47] P. Zhang, Y. Goyal, D. Summers-Stay, D. Batra, and D. Parikh. Yin and yang: Balancing and answering binary visual questions. In CVPR, 2016.
[48] B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. In arXiv:1512.02167, 2015.
[50] C. Zitnick and D. Parikh. Bringing semantics into focus using visual abstraction. In CVPR, 2013.