[1] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015. 2
[2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2014. 2
[3] S. Banerjee and A. Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL, 2005. 5
[4] K. Chen, J. Wang, L.-C. Chen, H. Gao, W. Xu, and R. Nevatia. ABC-CNN: An attention based convolutional neural network for visual question answering. In CVPR, 2016. 1
[5] M. Corbetta and G. L. Shulman. Control of goal-directed and stimulus-driven attention in the brain. Nature Reviews Neuroscience, 2002. 1
[6] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015. 2
[7] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS, 2015. 2
[8] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 1, 2, 3, 5
[10] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 2013. 4
[11] X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars. Guiding the long-short term memory model for image caption generation. In ICCV, 2015. 2, 5, 6
[12] X. Jiang, F. Wu, X. Li, Z. Zhao, W. Lu, S. Tang, and Y. Zhuang. Deep compositional cross-modal learning to rank via local-global alignment. In ACM MM, pages 69-78, 2015. 2
[13] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015. 2, 5, 6
[14] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 2016. 2
[15] C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In ACL, 2004. 5
[16] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014. 5
[17] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015. 2
[18] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015. 6
[19] V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In NIPS, 2014. 1
[20] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002. 5
[21] M. Ren, R. Kiros, and R. Zemel. Exploring models and data for image question answering. In NIPS, 2015. 2
[22] P. H. Seo, Z. Lin, S. Cohen, X. Shen, and B. Han. Hierarchical attention networks. arXiv preprint arXiv:1606.02393, 2016. 2
[23] F. Shen, C. Shen, W. Liu, and H. Tao Shen. Supervised discrete hashing. In CVPR, pages 37-45, 2015. 2
[24] F. Shen, C. Shen, Q. Shi, A. Van Den Hengel, and Z. Tang. Inductive hashing on manifolds. In CVPR, pages 1562-1569, 2013. 2
[26] M. F. Stollenga, J. Masci, F. Gomez, and J. Schmidhuber. Deep networks with internal selective attention through feedback connections. In NIPS, 2014. 1
[27] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, pages 2818-2826, 2016. 6
[28] R. Vedantam, C. Lawrence Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015. 5
[29] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015. 2
[30] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL-HLT, 2015. 2
[31] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015. 2, 5, 6
[32] Y. Wei, W. Xia, M. Lin, J. Huang, B. Ni, J. Dong, Y. Zhao, and S. Yan. HCP: A flexible CNN framework for multi-label image classification. TPAMI, 2016. 1
[33] H. Xu and K. Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, 2016. 1, 2
[34] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015. 1, 2, 3, 5, 6
[35] Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016. 1, 2
[36] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015. 1
[37] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016. 2, 6
[38] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014. 5
[40] M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014. 1, 2, 7
[41] H. Zhang, Z. Kyaw, S.-F. Chang, and T.-S. Chua. Visual translation embedding network for visual relation detection. In CVPR, 2017. 2
[42] Z. Zhao, H. Lu, C. Deng, X. He, and Y. Zhuang. Partial multimodal sparse coding via adaptive similarity structure regularization. In ACM MM, pages 152-156, 2016. 2