[1] A. Agrawal, D. Batra, and D. Parikh. Analyzing the behavior of visual question answering models. In EMNLP, 2016.
[2] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015.
[3] R. Krishna, Y. Zhu, O. Groth, J. Johnson, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 2016.
[4] A. Das, H. Agrawal, C. L. Zitnick, D. Parikh, and D. Batra. Human attention in visual question answering: Do humans and deep networks look at the same regions? In EMNLP, 2016.
[5] J. Donahue, L. A. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[6] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollar, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
[7] J. Fu, J. Wang, Y. Rui, X.-J. Wang, T. Mei, and H. Lu. Image tag refinement with view-dependent concept representations. IEEE T-CSVT, 25(8):1409-1422, 2015.
[8] J. Fu, Y. Wu, T. Mei, J. Wang, H. Lu, and Y. Rui. Relaxing from vocabulary: Robust weakly-supervised deep learning for vocabulary-free image tagging. In ICCV, 2015.
[9] A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. In EMNLP, 2016.
[10] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS, 2015.
[11] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[13] I. Ilievski, S. Yan, and J. Feng. A focused dynamic attention model for visual question answering. In ECCV, 2016.
[14] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[15] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[16] Y. Liu, J. Fu, T. Mei, and C. W. Chen. Let your photos talk: Generating narrative paragraph for photo stream via bidirectional attention recurrent neural networks. In AAAI, pages 1445-1452, 2017.
[17] J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention for visual question answering. In NIPS, 2016.
[18] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
[19] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[21] H. Noh, P. H. Seo, and B. Han. Image question answering using convolutional neural network with dynamic parameter prediction. In CVPR, 2016.
[22] Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui. Jointly modeling embedding and translation to bridge video and language. In CVPR, 2016.
[24] M. Ren, R. Kiros, and R. S. Zemel. Exploring models and data for image question answering. In NIPS, 2015.
[26] K. J. Shih, S. Singh, and D. Hoiem. Where to look: Focus regions for visual question answering. In CVPR, 2016.
[27] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[29] J. Wang, J. Fu, T. Mei, and Y. Xu. Beyond object recognition: Visual sentiment analysis with deep coupled adjective and noun neural networks. In IJCAI, 2016.
[30] Q. Wu, C. Shen, L. Liu, A. Dick, and A. van den Hengel. What value do explicit high level concepts have in vision to language problems? In CVPR, 2016.
[31] Q. Wu, P. Wang, C. Shen, A. Dick, and A. van den Hengel. Ask me anything: Free-form visual question answering based on knowledge from external sources. In CVPR, 2016.
[32] C. Xiong, S. Merity, and R. Socher. Dynamic memory networks for visual and textual question answering. In ICML, 2016.
[33] H. Xu and K. Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, 2016.
[34] Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
[35] T. Yao, Y. Pan, Y. Li, Z. Qiu, and T. Mei. Boosting image captioning with attributes. arXiv preprint arXiv:1611.01646, 2016.
[36] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
[38] C. L. Zitnick, A. Agrawal, S. Antol, M. Mitchell, D. Batra, and D. Parikh. Measuring machine intelligence through visual question answering. AI Magazine, 37(1):63-72, 2016.