[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015. 3
[2] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS, pages 1171-1179, 2015. 1, 2
[3] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, pages 190-200, 2011. 8
[4] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015. 5
[5] X. Chen and C. Lawrence Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, pages 2422-2431, 2015. 1, 2
[6] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014. 3
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248-255, 2009. 6
[8] M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In ACL, 2014. 6
[9] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, pages 2625-2634, 2015. 1, 2
[10] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015. 1, 2, 3
[11] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, pages 15-29, 2010. 2
[12] C. Gan, T. Yang, and B. Gong. Learning attributes equals multi-source domain generalization. In CVPR, pages 87-97, 2016. 2
[13] Z. Gan, C. Gan, X. He, Y. Pu, K. Tran, J. Gao, L. Carin, and L. Deng. Semantic compositional networks for visual captioning. In CVPR, 2017. 3
[14] S. Gella and M. Mitchell. Residual multiple instance learning for visually impaired image descriptions. In NIPS Women in Machine Learning Workshop, 2016. 3
[15] R. Girshick. Fast R-CNN. In ICCV, pages 1440-1448, 2015. 2
[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, pages 580-587, 2014. 2
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 2, 6
[18] L. A. Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR, 2016. 3
[20] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, 47:853-899, 2013. 2, 5, 6
[21] X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars. Guiding the long-short term memory model for image caption generation. In ICCV, pages 2407-2415, 2015. 2
[22] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, pages 3128-3137, 2015. 1, 2
[23] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, pages 1725-1732, 2014. 8
[24] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015. 6
[25] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016. 3
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 2
[27] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. BabyTalk: Understanding and generating simple image descriptions. In CVPR, pages 1601-1608, 2011. 2
[28] P. Kuznetsova, V. Ordonez, T. L. Berg, and Y. Choi. TreeTalk: Composition and compression of trees for image descriptions. TACL, 2:351-362, 2014. 2
[29] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In ACL, 2011. 2
[31] M.-T. Luong, Q. V. Le, I. Sutskever, O. Vinyals, and L. Kaiser. Multi-task sequence to sequence learning. In ICLR, 2016. 2, 3, 5, 6
[32] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015. 1, 2, 4
[33] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, 2015. 3
[34] A. Mathews, L. Xie, and X. He. SentiCap: Generating image descriptions with sentiments. In AAAI, 2016. 2, 3
[35] M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. Berg, K. Yamaguchi, T. Berg, K. Stratos, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, pages 747-756, 2012. 2
[36] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2Text: Describing images using 1 million captioned photographs. In NIPS, pages 1143-1151, 2011. 2
[37] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002. 6
[38] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. In NIPS, pages 2352-2360, 2016. 3
[39] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016. 2
[40] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. 2
[41] C. Sun, C. Gan, and R. Nevatia. Automatic concept discovery from parallel text and visual corpora. In ICCV, pages 2596-2604, 2015. 2
[42] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104-3112, 2014. 2, 3
[43] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, et al. Going deeper with convolutions. In CVPR, pages 1-9, 2015. 2
[45] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, pages 4489-4497, 2015. 8
[46] K. Tran, X. He, L. Zhang, J. Sun, C. Carapcea, C. Thrasher, C. Buehler, and C. Sienkiewicz. Rich image captioning in the wild. arXiv preprint arXiv:1603.09016, 2016. 1, 2, 3, 6, 7
[47] R. Vedantam, C. Lawrence Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, pages 4566-4575, 2015. 6
[49] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL, 2015. 8
[50] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015. 1, 2, 3, 4, 5, 6
[51] L. Wei, Q. Huang, D. Ceylan, E. Vouga, and H. Li. Dense-cap: Fully convolutional localization networks for dense captioning. 2015. 3
[52] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, pages 2048-2057, 2015. 1, 2, 4
[53] Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, pages 444-454, 2011. 2
[54] Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdinov, and W. W. Cohen. Encode, review, and decode: Reviewer module for caption generation. In NIPS, 2016. 1, 2
[55] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016. 1, 2
[56] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014. 5