-
1
-
-
84993660571
-
Learning to compose neural networks for question answering
-
Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016. Learning to compose neural networks for question answering. In NAACL HLT. 2
-
(2016)
NAACL HLT
, vol.2
-
-
Andreas, J.1
Rohrbach, M.2
Darrell, T.3
Klein, D.4
-
2
-
-
84973890960
-
VQA: Visual question answering
-
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual Question Answering. In ICCV. 2
-
(2015)
ICCV
, vol.2
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
3
-
-
85083951423
-
Multiple object recognition with visual attention
-
Jimmy Lei Ba, Volodymyr Mnih, and Koray Kavukcuoglu. 2015. Multiple Object Recognition With Visual Attention. In ICLR. 1
-
(2015)
ICLR
, vol.1
-
-
Ba, J.L.1
Mnih, V.2
Kavukcuoglu, K.3
-
4
-
-
85083953689
-
Neural machine translation by jointly learning to align and translate
-
Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR. 1
-
(2015)
ICLR
, vol.1
-
-
Bahdanau, D.1
Cho, K.2
Bengio, Y.3
-
6
-
-
85006764733
-
Leveraging the wisdom of the crowd for fine-grained recognition
-
Jia Deng, Jonathan Krause, Michael Stark, and Li Fei-Fei. 2015. Leveraging the Wisdom of the Crowd for Fine-Grained Recognition. PAMI. 3
-
(2015)
PAMI
, vol.3
-
-
Deng, J.1
Krause, J.2
Stark, M.3
Fei-Fei, L.4
-
8
-
-
33846980853
-
What do we perceive in a glance of a real-world scene?
-
2
-
Li Fei-Fei, Asha Iyer, Christof Koch, and Pietro Perona. 2007. What do we perceive in a glance of a real-world scene? Journal of Vision, 7(1):10. 2
-
(2007)
Journal of Vision
, vol.7
, Issue.1
, pp. 10
-
-
Fei-Fei, L.1
Iyer, A.2
Koch, C.3
Perona, P.4
-
10
-
-
84887325349
-
Fine-grained crowdsourcing for fine-grained recognition
-
Jia Deng, Jonathan Krause, and Li Fei-Fei. 2013. Fine-Grained Crowdsourcing for Fine-Grained Recognition. In CVPR. 3
-
(2013)
CVPR
, vol.3
-
-
Deng, J.1
Krause, J.2
Fei-Fei, L.3
-
12
-
-
85072837424
-
SALICON: Saliency in context
-
Ming Jiang, Shengsheng Huang, Juanyong Duan, and Qi Zhao. 2015. SALICON: Saliency in context. In CVPR. 2, 3
-
(2015)
CVPR
, vol.2
, pp. 3
-
-
Jiang, M.1
Huang, S.2
Duan, J.3
Zhao, Q.4
-
13
-
-
77956309403
-
Learning to predict where humans look
-
Tilke Judd, Krista Ehinger, Frédo Durand, and Antonio Torralba. 2009. Learning to predict where humans look. In ICCV. 2, 4
-
(2009)
ICCV
, vol.2
, pp. 4
-
-
Judd, T.1
Ehinger, K.2
Durand, F.3
Torralba, A.4
-
14
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In ECCV. 2
-
(2014)
ECCV
, vol.2
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
15
-
-
85018917850
-
Hierarchical question-image co-attention for visual question answering
-
Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. 2016. Hierarchical Question-Image Co-Attention for Visual Question Answering. In NIPS. 1, 2, 4
-
(2016)
NIPS
, vol.1
, Issue.2
, pp. 4
-
-
Lu, J.1
Yang, J.2
Batra, D.3
Parikh, D.4
-
17
-
-
0034096923
-
The dynamic representation of scenes
-
Ronald A. Rensink. 2000. The dynamic representation of scenes. Visual Cognition, 7(1-3):17-42. 1
-
(2000)
Visual Cognition
, vol.7
, Issue.1-3
, pp. 17-42
-
-
Rensink, R.A.1
-
19
-
-
36448979181
-
The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions
-
2
-
Benjamin W. Tatler. 2007. The central fixation bias in scene viewing: Selecting an optimal viewing position independently of motor biases and image feature distributions. Journal of Vision, 7(14):4. 2, 4
-
(2007)
Journal of Vision
, vol.7
, Issue.14
, pp. 4
-
-
Tatler, B.W.1
-
20
-
-
4544353199
-
Labeling images with a computer game
-
Luis von Ahn and Laura Dabbish. 2004. Labeling images with a computer game. In CHI. 3
-
(2004)
CHI
, vol.3
-
-
Von Ahn, L.1
Dabbish, L.2
-
21
-
-
84999008900
-
Dynamic memory networks for visual and textual question answering
-
Caiming Xiong, Stephen Merity, and Richard Socher. 2016. Dynamic memory networks for visual and textual question answering. In ICML. 1
-
(2016)
ICML
, vol.1
-
-
Xiong, C.1
Merity, S.2
Socher, R.3
-
23
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron C. Courville, Ruslan Salakhutdinov, Richard S. Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In ICML. 1
-
(2015)
ICML
, vol.1
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.C.5
Salakhutdinov, R.6
Zemel, R.S.7
Bengio, Y.8
-
24
-
-
85067831524
-
Stacked attention networks for image question answering
-
Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alexander J. Smola. 2016. Stacked Attention Networks for Image Question Answering. In CVPR. 1, 2, 4
-
(2016)
CVPR
, vol.1
, Issue.2
, pp. 4
-
-
Yang, Z.1
He, X.2
Gao, J.3
Deng, L.4
Smola, A.J.5