-
1
-
-
77951458444
-
An online algorithm for large scale image similarity learning
-
3
-
G. Chechik, U. Shalit, V. Sharma, and S. Bengio. An online algorithm for large scale image similarity learning. In NIPS, 2009. 3
-
(2009)
NIPS
-
-
Chechik, G.1
Shalit, U.2
Sharma, V.3
Bengio, S.4
-
2
-
-
84957029470
-
Mind's eye: A recurrent visual representation for image caption generation
-
3, 7
-
X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015. 3, 7
-
(2015)
CVPR
-
-
Chen, X.1
Zitnick, C.L.2
-
3
-
-
84455207551
-
Automatic evaluation of machine translation quality using n-gram co-occurrence statistics
-
6
-
G. Doddington. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In HLT, 2002. 6
-
(2002)
HLT
-
-
Doddington, G.1
-
4
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
3, 7, 8
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015. 3, 7, 8
-
(2015)
CVPR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
5
-
-
84959250180
-
From captions to visual concepts and back
-
3, 7
-
H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015. 3, 7
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.N.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Zitnick, C.L.11
Zweig, G.12
-
6
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
2
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010. 2
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
7
-
-
84887839738
-
Phrasal recognition
-
1, 3
-
A. Farhadi and M. A. Sadeghi. Phrasal recognition. PAMI, 35 (12): 2854-65, 2013. 1, 3
-
(2013)
PAMI
, vol.35
, Issue.12
, pp. 2854-2865
-
-
Farhadi, A.1
Sadeghi, M.A.2
-
8
-
-
38049183286
-
The iapr tc-12 benchmark: A new evaluation resource for visual information systems
-
5, 10
-
M. Grubinger, P. Clough, H. Müller, and T. Deselaers. The iapr tc-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, 2006. 5, 10
-
(2006)
International Workshop OntoImage
-
-
Grubinger, M.1
Clough, P.2
Müller, H.3
Deselaers, T.4
-
9
-
-
84973931408
-
From image annotation to image description
-
5, 6, 13
-
A. Gupta and P. Mannem. From image annotation to image description. In ICONIP, 2012. 5, 6, 13
-
(2012)
ICONIP
-
-
Gupta, A.1
Mannem, P.2
-
10
-
-
85059866463
-
Choosing linguistics over vision to describe images
-
1, 3, 5, 7
-
A. Gupta, Y. Verma, and C. V. Jawahar. Choosing linguistics over vision to describe images. In AAAI, 2012. 1, 3, 5, 7
-
(2012)
AAAI
-
-
Gupta, A.1
Verma, Y.2
Jawahar, C.V.3
-
12
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
2
-
M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47: 853-899, 2013. 2
-
(2013)
JAIR
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
13
-
-
84913555165
-
-
arXiv preprint, (1408.5093) 5
-
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint, (1408.5093), 2014. 5
-
(2014)
Caffe: Convolutional Architecture for Fast Feature Embedding
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
14
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
3, 7, 8
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015. 3, 7, 8
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
15
-
-
84944113729
-
Unifying visual-semantic embeddings with multimodal neural language models
-
3
-
R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. In NIPS, 2014. 3
-
(2014)
NIPS
-
-
Kiros, R.1
Salakhutdinov, R.2
Zemel, R.S.3
-
16
-
-
85146417759
-
Accurate unlexicalized parsing
-
5
-
D. Klein and C. D. Manning. Accurate unlexicalized parsing. In ACL, 2003. 5
-
(2003)
ACL
-
-
Klein, D.1
Manning, C.D.2
-
17
-
-
84876231242
-
Imagenet classification with deep convolutional neural networks
-
1, 3, 5, 7
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012. 1, 3, 5, 7
-
(2012)
NIPS
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
18
-
-
80052901011
-
Baby talk: Understanding and generating image descriptions
-
1, 2, 7, 8
-
G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating image descriptions. In CVPR, 2011. 1, 2, 7, 8
-
(2011)
CVPR
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
19
-
-
84907331257
-
Generalizing image captions for image-text parallel corpus
-
7
-
P. Kuznetsova, V. Ordonez, A. Berg, T. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In ACL, 2013. 7
-
(2013)
ACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.3
Berg, T.4
Choi, Y.5
-
20
-
-
84878189119
-
Collective generation of natural image descriptions
-
1, 2, 5, 7
-
P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, 2012. 1, 2, 5, 7
-
(2012)
ACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
21
-
-
52149112996
-
Meteor: An automatic metric for mt evaluation with high levels of correlation with human judgments
-
7
-
A. Lavie and A. Agarwal. Meteor: An automatic metric for mt evaluation with high levels of correlation with human judgments. In ACL WMT, 2007. 7
-
(2007)
ACL WMT
-
-
Lavie, A.1
Agarwal, A.2
-
22
-
-
84862279067
-
Composing simple image descriptions using web-scale n-grams
-
1, 2, 3
-
S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, 2011. 1, 2, 3
-
(2011)
CoNLL
-
-
Li, S.1
Kulkarni, G.2
Berg, T.L.3
Berg, A.C.4
Choi, Y.5
-
23
-
-
84906505935
-
-
arXiv preprint, 1405.0312. 5
-
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. L. Zitnick. Microsoft coco: Common objects in context. arXiv preprint, 1405.0312, 2014. 5
-
(2014)
Microsoft Coco: Common Objects in Context
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollar, P.7
Zitnick, C.L.8
-
24
-
-
3042535216
-
Distinctive image features from scale-invariant keypoints
-
5, 10
-
D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60 (2): 91-110, 2004. 5, 10
-
(2004)
IJCV
, vol.60
, Issue.2
, pp. 91-110
-
-
Lowe, D.G.1
-
25
-
-
34948830130
-
Semantic hierarchies for visual object recognition
-
2
-
M. Marszalek and C. Schmid. Semantic hierarchies for visual object recognition. In CVPR, 2007. 2
-
(2007)
CVPR
-
-
Marszalek, M.1
Schmid, C.2
-
26
-
-
84884545084
-
-
Technical report, LEAR-INRIA and TVPA-XRCE 2, 3, 11
-
T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Large scale metric learning for distance-based image classification. Technical report, LEAR-INRIA and TVPA-XRCE, 2012. 2, 3, 11
-
(2012)
Large Scale Metric Learning for Distance-based Image Classification
-
-
Mensink, T.1
Verbeek, J.2
Perronnin, F.3
Csurka, G.4
-
27
-
-
84883488616
-
Metric learning for large scale image classification: Generalizing to new classes at near-zero cost
-
2, 3, 11
-
T. Mensink, J. Verbeek, F. Perronnin, and G. Csurka. Metric learning for large scale image classification: Generalizing to new classes at near-zero cost. In ECCV, 2012. 2, 3, 11
-
(2012)
ECCV
-
-
Mensink, T.1
Verbeek, J.2
Perronnin, F.3
Csurka, G.4
-
28
-
-
85034832841
-
Midge: Generating image descriptions from computer vision detections
-
1, 2, 3, 5
-
M. Mitchell, J. Dodge, A. Goyal, K. Yamaguchi, K. Stratos, X. Han, A. Mensch, A. Berg, T. Berg, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012. 1, 2, 3, 5
-
(2012)
EACL
-
-
Mitchell, M.1
Dodge, J.2
Goyal, A.3
Yamaguchi, K.4
Stratos, K.5
Han, X.6
Mensch, A.7
Berg, A.8
Berg, T.9
Daumé, H.10
-
29
-
-
85162522202
-
Im2text: Describing images using 1 million captioned photographs
-
2, 5, 7
-
V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011. 2, 5, 7
-
(2011)
NIPS
-
-
Ordonez, V.1
Kulkarni, G.2
Berg, T.L.3
-
30
-
-
85133336275
-
Bleu: A method for automatic evaluation of machine translation
-
6
-
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002. 6
-
(2002)
ACL
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.-J.4
-
31
-
-
79959771606
-
Improving the fisher kernel for large-scale image classification
-
5
-
F. Perronnin, J. Sánchez, and T. Mensink. Improving the fisher kernel for large-scale image classification. In ECCV, 2010. 5
-
(2010)
ECCV
-
-
Perronnin, F.1
Sánchez, J.2
Mensink, T.3
-
34
-
-
84947041871
-
Imagenet large scale visual recognition challenge
-
5
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. Imagenet large scale visual recognition challenge. IJCV, 2015. 5
-
(2015)
IJCV
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
35
-
-
80052889458
-
Recognition using visual phrases
-
1, 3
-
M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011. 1, 3
-
(2011)
CVPR
-
-
Sadeghi, M.A.1
Farhadi, A.2
-
36
-
-
80052905403
-
Learning to share visual appearance for multiclass object detection
-
2
-
R. Salakhutdinov, A. Torralba, and J. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, 2011. 2
-
(2011)
CVPR
-
-
Salakhutdinov, R.1
Torralba, A.2
Tenenbaum, J.3
-
37
-
-
80052885179
-
High-dimensional signature compression for large-scale image classification
-
1, 3
-
J. Sánchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In CVPR, 2011. 1, 3
-
(2011)
CVPR
-
-
Sánchez, J.1
Perronnin, F.2
-
38
-
-
0031268931
-
Bidirectional recurrent neural networks
-
7
-
M. Schuster and K. K. Paliwal. Bidirectional recurrent neural networks. TSP, 45 (11): 2673-2681, 1997. 7
-
(1997)
TSP
, vol.45
, Issue.11
, pp. 2673-2681
-
-
Schuster, M.1
Paliwal, K.K.2
-
39
-
-
84943761635
-
Very deep convolutional networks for large-scale image recognition
-
5, 7
-
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 5, 7
-
(2015)
ICLR
-
-
Simonyan, K.1
Zisserman, A.2
-
40
-
-
84937522268
-
Going deeper with convolutions
-
7
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015. 7
-
(2015)
CVPR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
41
-
-
84871392832
-
Efficient image annotation for automatic sentence generation
-
1, 3, 4, 5, 6, 7, 11, 12
-
Y. Ushiku, T. Harada, and Y. Kuniyoshi. Efficient image annotation for automatic sentence generation. In ACMMM, 2012. 1, 3, 4, 5, 6, 7, 11, 12
-
(2012)
ACMMM
-
-
Ushiku, Y.1
Harada, T.2
Kuniyoshi, Y.3
-
42
-
-
25844477556
-
Less: A model-based classifier for sparse subspaces
-
2, 3
-
C. J. Veenman and D. M. Tax. Less: A model-based classifier for sparse subspaces. PAMI, 27 (9): 1496-500, 2005. 2, 3
-
(2005)
PAMI
, vol.27
, Issue.9
, pp. 1496-1500
-
-
Veenman, C.J.1
Tax, D.M.2
-
43
-
-
84884963254
-
Generating image descriptions using semantic similarities in the output space
-
1, 3, 5, 7
-
Y. Verma, A. Gupta, P. Mannem, and C. Jawahar. Generating image descriptions using semantic similarities in the output space. In Proceedings of CVPR Workshop on Language for Vision, 2013. 1, 3, 5, 7
-
(2013)
Proceedings of CVPR Workshop on Language for Vision
-
-
Verma, Y.1
Gupta, A.2
Mannem, P.3
Jawahar, C.4
-
44
-
-
84946747440
-
Show and tell: A neural image caption generator
-
3, 7, 8
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015. 3, 7, 8
-
(2015)
CVPR
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
45
-
-
33749550361
-
Distance metric learning for large margin nearest neighbor classification
-
3, 7
-
K. Q. Weinberger, J. Blitzer, and L. K. Saul. Distance metric learning for large margin nearest neighbor classification. In NIPS, 2006. 3, 7
-
(2006)
NIPS
-
-
Weinberger, K.Q.1
Blitzer, J.2
Saul, L.K.3
-
46
-
-
77955654853
-
Large scale image annotation: Learning to rank with joint word-image embeddings
-
4, 11
-
J. Weston, S. Bengio, and N. Usunier. Large scale image annotation: Learning to rank with joint word-image embeddings. Machine Learning, 81: 21-35, 2010. 4, 11
-
(2010)
Machine Learning
, vol.81
, pp. 21-35
-
-
Weston, J.1
Bengio, S.2
Usunier, N.3
-
47
-
-
84867117593
-
Wsabie: Scaling up to large vocabulary image annotation
-
2, 3, 4, 11
-
J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In IJCAI, 2011. 2, 3, 4, 11
-
(2011)
IJCAI
-
-
Weston, J.1
Bengio, S.2
Usunier, N.3
-
48
-
-
80053258778
-
Corpus-guided sentence generation of natural images
-
1, 2, 5, 7, 8
-
Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, 2011. 1, 2, 5, 7, 8
-
(2011)
EMNLP
-
-
Yang, Y.1
Teo, C.L.2
Daumé, H.3
Aloimonos, Y.4