[3] X. Chen and C. Lawrence Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015.
[4] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[6] J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
[7] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[8] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
[9] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[11] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[13] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. In CVPR, 2011.
[14] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, 2012.
[15] C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In ACL 2004 Workshop, 2004.
[16] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[17] J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention for visual question answering. In NIPS, 2016.
[18] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[20] M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. Berg, K. Yamaguchi, T. Berg, K. Stratos, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012.
[21] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
[22] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. arXiv preprint arXiv:1610.02391, 2016.
[24] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[25] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567, 2015.
[28] Q. Wu, C. Shen, L. Liu, A. Dick, and A. van den Hengel. What value do explicit high level concepts have in vision to language problems? arXiv preprint arXiv:1506.01144, 2015.
[29] C. Xiong, S. Merity, and R. Socher. Dynamic memory networks for visual and textual question answering. In ICML, 2016.
[30] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[31] Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
[32] Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdinov, and W. W. Cohen. Encode, review, and decode: Reviewer module for caption generation. In NIPS, 2016.
[33] T. Yao, Y. Pan, Y. Li, Z. Qiu, and T. Mei. Boosting image captioning with attributes. arXiv preprint arXiv:1611.01646, 2016.
[34] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
[35] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In ACL, 2014.
[36] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. arXiv preprint arXiv:1512.04150, 2015.