-
1
-
-
84973890960
-
VQA: Visual question answering
-
Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. VQA: Visual Question Answering. International Journal of Computer Vision, 2015.
-
(2015)
International Journal of Computer Vision
-
-
Agrawal, A.1
Lu, J.2
Antol, S.3
Mitchell, M.4
Zitnick, C.L.5
Parikh, D.6
Batra, D.7
-
2
-
-
85040308578
-
-
arXiv
-
Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. Bottom-Up and Top-Down Attention for Image Captioning and VQA. arXiv, 2017.
-
(2017)
Bottom-Up and Top-Down Attention for Image Captioning and VQA
-
-
Anderson, P.1
He, X.2
Buehler, C.3
Teney, D.4
Johnson, M.5
Gould, S.6
Zhang, L.7
-
4
-
-
84993660571
-
Learning to compose neural networks for question answering
-
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Learning to Compose Neural Networks for Question Answering. In NAACL, 2016b.
-
(2016)
NAACL
-
-
Andreas, J.1
Rohrbach, M.2
Darrell, T.3
Klein, D.4
-
5
-
-
85050692557
-
-
arXiv
-
Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, and Devi Parikh. It Takes Two to Tango: Towards Theory of AI’s Mind. arXiv, 2017.
-
(2017)
It Takes Two to Tango: Towards Theory of AI’s Mind
-
-
Chandrasekaran, A.1
Yadav, D.2
Chattopadhyay, P.3
Prabhu, V.4
Parikh, D.5
-
6
-
-
85043790150
-
Counting everyday objects in everyday scenes
-
Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, and Devi Parikh. Counting Everyday Objects in Everyday Scenes. In CVPR, 2017.
-
(2017)
CVPR
-
-
Chattopadhyay, P.1
Vedantam, R.2
Selvaraju, R.R.3
Batra, D.4
Parikh, D.5
-
7
-
-
85018938177
-
R-FCN: Object Detection via Region-based Fully Convolutional Networks
-
Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In NIPS, 2016.
-
(2016)
NIPS
-
-
Dai, J.1
Li, Y.2
He, K.3
Sun, J.4
-
8
-
-
85044506279
-
Multimodal compact bilinear pooling for visual question answering and visual grounding
-
Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. In EMNLP, 2016.
-
(2016)
EMNLP
-
-
Fukui, A.1
Park, D.H.2
Yang, D.3
Rohrbach, A.4
Darrell, T.5
Rohrbach, M.6
-
9
-
-
85029359197
-
Fast R-CNN
-
Ross Girshick. Fast R-CNN. In ICCV, 2015.
-
(2015)
ICCV
-
-
Girshick, R.1
-
10
-
-
84906343066
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2015.
-
(2015)
CVPR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
11
-
-
85041900002
-
Making the V in VQA matter: Elevating the role of image understanding in visual question answering
-
Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In CVPR, 2017.
-
(2017)
CVPR
-
-
Goyal, Y.1
Khot, T.2
Summers-Stay, D.3
Batra, D.4
Parikh, D.5
-
13
-
-
85041904328
-
Learning to reason: End-to-end module networks for visual question answering
-
Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Kate Saenko. Learning to Reason: End-to-End Module Networks for Visual Question Answering. In ICCV, 2017.
-
(2017)
ICCV
-
-
Hu, R.1
Andreas, J.2
Rohrbach, M.3
Darrell, T.4
Saenko, K.5
-
15
-
-
85087529518
-
Hadamard Product for Low-rank Bilinear Pooling
-
Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, and Byoung-Tak Zhang. Hadamard Product for Low-rank Bilinear Pooling. In ICLR, 2017.
-
(2017)
ICLR
-
-
Kim, J.-H.1
On, K.-W.2
Lim, W.3
Kim, J.4
Ha, J.-W.5
Zhang, B.-T.6
-
17
-
-
84990070438
-
Visual genome: Connecting language and vision using crowdsourced dense image annotations
-
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, and Fei-Fei Li. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. International Journal of Computer Vision, 2016.
-
(2016)
International Journal of Computer Vision
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
Bernstein, M.S.11
Li, F.-F.12
-
18
-
-
85162384490
-
Learning to count objects in images
-
Victor Lempitsky and Andrew Zisserman. Learning To Count Objects in Images. NIPS, 2010.
-
(2010)
NIPS
-
-
Lempitsky, V.1
Zisserman, A.2
-
19
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
-
(2014)
ECCV
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Bourdev, L.4
Girshick, R.5
Hays, J.6
Perona, P.7
Ramanan, D.8
Zitnick, C.L.9
Dollár, P.10
-
20
-
-
85041916350
-
Focal loss for dense object detection
-
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal Loss for Dense Object Detection. ICCV, 2017.
-
(2017)
ICCV
-
-
Lin, T.-Y.1
Goyal, P.2
Girshick, R.3
He, K.4
Dollár, P.5
-
21
-
-
85031922514
-
Knowing when to look: Adaptive attention via a visual sentinel for image captioning
-
Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. In CVPR, 2016a.
-
(2016)
CVPR
-
-
Lu, J.1
Xiong, C.2
Parikh, D.3
Socher, R.4
-
22
-
-
85018917850
-
Hierarchical question-image co-attention for visual question answering
-
Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. Hierarchical Question-Image Co-Attention for Visual Question Answering. In NIPS, 2016b.
-
(2016)
NIPS
-
-
Lu, J.1
Yang, J.2
Batra, D.3
Parikh, D.4
-
23
-
-
85023777762
-
Learning online alignments with continuous rewards policy gradient
-
Yuping Luo, Chung-cheng Chiu, Navdeep Jaitly, and Ilya Sutskever. Learning Online Alignments with Continuous Rewards Policy Gradient. In ICASSP, 2017.
-
(2017)
ICASSP
-
-
Luo, Y.1
Chiu, C.-C.2
Jaitly, N.3
Sutskever, I.4
-
24
-
-
84937822746
-
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
-
Mateusz Malinowski and Mario Fritz. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In NIPS, 2014.
-
(2014)
NIPS
-
-
Malinowski, M.1
Fritz, M.2
-
25
-
-
84999036937
-
Asynchronous methods for deep reinforcement learning
-
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy Lillicrap, David Silver, and Koray Kavukcuoglu. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.
-
(2016)
ICML
-
-
Mnih, V.1
Badia, A.P.2
Mirza, M.3
Graves, A.4
Harley, T.5
Lillicrap, T.6
Silver, D.7
Kavukcuoglu, K.8
-
26
-
-
85021624882
-
Towards perspective-free object counting with deep learning
-
Daniel Oñoro-Rubio and Roberto J. López-Sastre. Towards perspective-free object counting with deep learning. In ECCV, 2016.
-
(2016)
ECCV
-
-
Oñoro-Rubio, D.1
López-Sastre, R.J.2
-
27
-
-
85021677756
-
-
arXiv
-
Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. Attentive Explanations: Justifying Decisions and Pointing to the Evidence. arXiv, 2016.
-
(2016)
Attentive Explanations: Justifying Decisions and Pointing to the Evidence
-
-
Park, D.H.1
Hendricks, L.A.2
Akata, Z.3
Schiele, B.4
Darrell, T.5
Rohrbach, M.6
-
29
-
-
84961289992
-
GloVe: Global vectors for word representation
-
Jeffrey Pennington, Richard Socher, and Christopher Manning. GloVe: Global Vectors for Word Representation. EMNLP, 2014.
-
(2014)
EMNLP
-
-
Pennington, J.1
Socher, R.2
Manning, C.3
-
30
-
-
85044274041
-
End-to-end instance segmentation with recurrent attention
-
Mengye Ren and Richard S. Zemel. End-to-End Instance Segmentation with Recurrent Attention. In CVPR, 2017.
-
(2017)
CVPR
-
-
Ren, M.1
Zemel, R.S.2
-
31
-
-
84965170394
-
Exploring models and data for image question answering
-
Mengye Ren, Ryan Kiros, and Richard Zemel. Exploring Models and Data for Image Question Answering. In NIPS, 2015a.
-
(2015)
NIPS
-
-
Ren, M.1
Kiros, R.2
Zemel, R.3
-
32
-
-
84960980241
-
Faster R-CNN: Towards real-time object detection with region proposal networks
-
Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS, 2015b.
-
(2015)
NIPS
-
-
Ren, S.1
He, K.2
Girshick, R.3
Sun, J.4
-
33
-
-
85041911392
-
Self-critical Sequence Training for Image Captioning
-
Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. Self-critical Sequence Training for Image Captioning. In CVPR, 2017.
-
(2017)
CVPR
-
-
Rennie, S.J.1
Marcheret, E.2
Mroueh, Y.3
Ross, J.4
Goel, V.5
-
34
-
-
84951956461
-
Learning to count with deep object features
-
Santi Segui, Oriol Pujol, and Jordi Vitria. Learning to count with deep object features. In CVPRW, 2015.
-
(2015)
CVPRW
-
-
Segui, S.1
Pujol, O.2
Vitria, J.3
-
35
-
-
85047741987
-
Where to look: Focus regions for visual question answering
-
Kevin J. Shih, Saurabh Singh, and Derek Hoiem. Where To Look: Focus Regions for Visual Question Answering. In CVPR, 2015.
-
(2015)
CVPR
-
-
Shih, K.J.1
Singh, S.2
Hoiem, D.3
-
36
-
-
84906925854
-
Grounded compositional semantics for finding and describing images with sentences
-
Richard Socher, Andrej Karpathy, Quoc V Le, Christopher D Manning, and Andrew Y Ng. Grounded Compositional Semantics for Finding and Describing Images with Sentences. In TACL, 2014.
-
(2014)
TACL
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.D.4
Ng, A.Y.5
-
38
-
-
85071147167
-
Tips and tricks for visual question answering: Learnings from the 2017 challenge
-
Damien Teney, Peter Anderson, Xiaodong He, and Anton van den Hengel. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge. In CVPR, 2017.
-
(2017)
CVPR
-
-
Teney, D.1
Anderson, P.2
He, X.3
Van den Hengel, A.4
-
39
-
-
85018873682
-
Conditional image generation with PixelCNN decoders
-
Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. Conditional Image Generation with PixelCNN Decoders. In NIPS, 2016.
-
(2016)
NIPS
-
-
Van den Oord, A.1
Kalchbrenner, N.2
Vinyals, O.3
Espeholt, L.4
Graves, A.5
Kavukcuoglu, K.6
-
40
-
-
0000337576
-
Simple statistical gradient-following methods for connectionist reinforcement learning
-
R J Williams. Simple statistical gradient-following methods for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
-
(1992)
Machine Learning
, vol.8
, pp. 229-256
-
-
Williams, R.J.1
-
41
-
-
0041154467
-
Function Optimization using Connectionist Reinforcement Learning Algorithms
-
Ronald J. Williams and Jing Peng. Function Optimization using Connectionist Reinforcement Learning Algorithms. Connection Science, 3(3):241–268, 1991.
-
(1991)
Connection Science
, vol.3
, Issue.3
, pp. 241-268
-
-
Williams, R.J.1
Peng, J.2
-
42
-
-
84999008900
-
Dynamic memory networks for visual and textual question answering
-
Caiming Xiong, Stephen Merity, and Richard Socher. Dynamic Memory Networks for Visual and Textual Question Answering. In ICML, 2016.
-
(2016)
ICML
-
-
Xiong, C.1
Merity, S.2
Socher, R.3
-
43
-
-
85035008367
-
Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
-
Huijuan Xu and Kate Saenko. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. In ECCV, 2015.
-
(2015)
ECCV
-
-
Xu, H.1
Saenko, K.2
-
44
-
-
85067831524
-
Stacked attention networks for image question answering
-
Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. Stacked Attention Networks for Image Question Answering. In CVPR, 2015.
-
(2015)
CVPR
-
-
Yang, Z.1
He, X.2
Gao, J.3
Deng, L.4
Smola, A.5
-
45
-
-
84959214343
-
Cross-scene crowd counting via deep convolutional neural networks
-
Cong Zhang, Hongsheng Li, Xiaogang Wang, and Xiaokang Yang. Cross-scene crowd counting via deep convolutional neural networks. In CVPR, 2015.
-
(2015)
CVPR
-
-
Zhang, C.1
Li, H.2
Wang, X.3
Yang, X.4
-
46
-
-
85017461468
-
Salient object subitizing
-
Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiao-hui Shen, Brian Price, and Radomír Měch. Salient Object Subitizing. International Journal of Computer Vision, 2017.
-
(2017)
International Journal of Computer Vision
-
-
Zhang, J.1
Ma, S.2
Sameki, M.3
Sclaroff, S.4
Betke, M.5
Lin, Z.6
Shen, X.-H.7
Price, B.8
Měch, R.9
-
47
-
-
84986301525
-
-
arXiv
-
Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, and Rob Fergus. Simple Baseline for Visual Question Answering. arXiv, 2015.
-
(2015)
Simple Baseline for Visual Question Answering
-
-
Zhou, B.1
Tian, Y.2
Sukhbaatar, S.3
Szlam, A.4
Fergus, R.5
-
48
-
-
84990052104
-
Visual7W: Grounded question answering in images
-
Yuke Zhu, Oliver Groth, Michael Bernstein, and Li Fei-Fei. Visual7W: Grounded Question Answering in Images. In CVPR, 2015.
-
(2015)
CVPR
-
-
Zhu, Y.1
Groth, O.2
Bernstein, M.3
Fei-Fei, L.4