-
1
-
-
84995425614
-
-
Deep Learning and Unsupervised Feature Learning NIPS Workshop
-
F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS Workshop, 2012.
-
(2012)
Theano: New Features and Speed Improvements
-
-
Bastien, F.1
Lamblin, P.2
Pascanu, R.3
Bergstra, J.4
Goodfellow, I.J.5
Bergeron, A.6
Bouchard, N.7
Bengio, Y.8
-
2
-
-
85009859594
-
-
arXiv, abs/1601.03896
-
R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat, and B. Plank. Automatic description generation from images: A survey. arXiv, abs/1601.03896, 2016.
-
(2016)
Automatic Description Generation from Images: A Survey
-
-
Bernardi, R.1
Cakici, R.2
Elliott, D.3
Erdem, A.4
Erdem, E.5
Ikizler-Cinbis, N.6
Keller, F.7
Muscat, A.8
Plank, B.9
-
3
-
-
84952349295
-
-
arXiv, abs/1504.00325
-
X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv, abs/1504.00325, 2015.
-
(2015)
Microsoft COCO Captions: Data Collection and Evaluation Server
-
-
Chen, X.1
Fang, H.2
Lin, T.-Y.3
Vedantam, R.4
Gupta, S.5
Dollár, P.6
Zitnick, C.L.7
-
5
-
-
85198028989
-
ImageNet: A large-scale hierarchical image database
-
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.
-
(2009)
CVPR
-
-
Deng, J.1
Dong, W.2
Socher, R.3
Li, L.-J.4
Li, K.5
Fei-Fei, L.6
-
6
-
-
85107661995
-
Meteor universal: Language specific translation evaluation for any target language
-
M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL, 2014.
-
(2014)
EACL
-
-
Denkowski, M.1
Lavie, A.2
-
7
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
-
(2015)
CVPR
-
-
Donahue, J.1
Anne Hendricks, L.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
8
-
-
84919881041
-
DeCAF: A deep convolutional activation feature for generic visual recognition
-
J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.
-
(2014)
ICML
-
-
Donahue, J.1
Jia, Y.2
Vinyals, O.3
Hoffman, J.4
Zhang, N.5
Tzeng, E.6
Darrell, T.7
-
9
-
-
84943812736
-
Describing images using inferred visual dependency representations
-
D. Elliott and A. P. de Vries. Describing images using inferred visual dependency representations. In ACL, 2015.
-
(2015)
ACL
-
-
Elliott, D.1
De Vries, A.P.2
-
10
-
-
84906929591
-
Image description using visual dependency representations
-
D. Elliott and F. Keller. Image description using visual dependency representations. In EMNLP, 2013.
-
(2013)
EMNLP
-
-
Elliott, D.1
Keller, F.2
-
11
-
-
84921069139
-
The PASCAL visual object classes challenge: A retrospective
-
M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. IJCV, 2015.
-
(2015)
IJCV
-
-
Everingham, M.1
Eslami, S.M.A.2
Van Gool, L.3
Williams, C.K.I.4
Winn, J.5
Zisserman, A.6
-
12
-
-
84959250180
-
From captions to visual concepts and back
-
H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.10
Zitnick, L.11
Zweig, G.12
-
13
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
14
-
-
84938217896
-
-
arXiv, abs/1403.1840
-
Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale orderless pooling of deep convolutional activation features. arXiv, abs/1403.1840, 2014.
-
(2014)
Multi-scale Orderless Pooling of Deep Convolutional Activation Features
-
-
Gong, Y.1
Wang, L.2
Guo, R.3
Lazebnik, S.4
-
15
-
-
84986274465
-
Deep residual learning for image recognition
-
K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
-
(2016)
CVPR
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
17
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 2013.
-
(2013)
JAIR
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
18
-
-
84913555165
-
-
arXiv, abs/1408.5093
-
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv, abs/1408.5093, 2014.
-
(2014)
Caffe: Convolutional Architecture for Fast Feature Embedding
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
19
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
20
-
-
84937843643
-
Deep fragment embeddings for bidirectional image sentence mapping
-
A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.
-
(2014)
NIPS
-
-
Karpathy, A.1
Joulin, A.2
Fei-Fei, L.3
-
21
-
-
84961376850
-
Convolutional neural networks for sentence classification
-
Y. Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014.
-
(2014)
EMNLP
-
-
Kim, Y.1
-
24
-
-
84913582676
-
Convolutional network features for scene recognition
-
M. Koskela and J. Laaksonen. Convolutional network features for scene recognition. In ACMMM, 2014.
-
(2014)
ACMMM
-
-
Koskela, M.1
Laaksonen, J.2
-
25
-
-
84862279067
-
Composing simple image descriptions using web-scale n-grams
-
S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, 2011.
-
(2011)
CoNLL
-
-
Li, S.1
Kulkarni, G.2
Berg, T.L.3
Berg, A.C.4
Choi, Y.5
-
27
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
-
(2014)
ECCV
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
28
-
-
84898956512
-
Distributed representations of words and phrases and their compositionality
-
T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
-
(2013)
NIPS
-
-
Mikolov, T.1
Sutskever, I.2
Chen, K.3
Corrado, G.S.4
Dean, J.5
-
29
-
-
85162522202
-
Im2text: Describing images using 1 million captioned photographs
-
V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.
-
(2011)
NIPS
-
-
Ordonez, V.1
Kulkarni, G.2
Berg, T.L.3
-
30
-
-
85133336275
-
Bleu: A method for automatic evaluation of machine translation
-
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002.
-
(2002)
ACL
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.-J.4
-
33
-
-
84955283951
-
-
arXiv, abs/1506.01497
-
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv, abs/1506.01497, 2015.
-
(2015)
Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks
-
-
Ren, S.1
He, K.2
Girshick, R.3
Sun, J.4
-
35
-
-
84977650097
-
Video captioning with recurrent networks based on frame- and video-level features and visual content classification
-
abs/1512.02949
-
R. Shetty and J. Laaksonen. Video captioning with recurrent networks based on frame- and video-level features and visual content classification. ICCV Workshop on LSMDC, abs/1512.02949, 2015.
-
(2015)
ICCV Workshop on LSMDC
-
-
Shetty, R.1
Laaksonen, J.2
-
36
-
-
84994666053
-
Frame- and segment-level features and candidate pool evaluation for video caption generation
-
R. Shetty and J. Laaksonen. Frame- and segment-level features and candidate pool evaluation for video caption generation. In ACMMM Multimedia Grand Challenge Solutions, 2016.
-
(2016)
ACMMM Multimedia Grand Challenge Solutions
-
-
Shetty, R.1
Laaksonen, J.2
-
37
-
-
84964983441
-
-
arXiv, abs/1409.4842
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv, abs/1409.4842, 2014.
-
(2014)
Going Deeper with Convolutions
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
39
-
-
84956980995
-
CIDEr: Consensus-based image description evaluation
-
R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
-
(2015)
CVPR
-
-
Vedantam, R.1
Zitnick, C.L.2
Parikh, D.3
-
41
-
-
84924067462
-
SUN database: Exploring a large collection of scene categories
-
J. Xiao, K. A. Ehinger, J. Hays, A. Torralba, and A. Oliva. SUN database: Exploring a large collection of scene categories. IJCV, 2014.
-
(2014)
IJCV
-
-
Xiao, J.1
Ehinger, K.A.2
Hays, J.3
Torralba, A.4
Oliva, A.5
-
42
-
-
77955988947
-
SUN database: Large-scale scene recognition from abbey to zoo
-
J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.
-
(2010)
CVPR
-
-
Xiao, J.1
Hays, J.2
Ehinger, K.3
Oliva, A.4
Torralba, A.5
-
43
-
-
84939821074
-
-
arXiv, abs/1502.03044
-
K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv, abs/1502.03044, 2015.
-
(2015)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhutdinov, R.6
Zemel, R.7
Bengio, Y.8
-
44
-
-
84995439884
-
-
arXiv, abs/1603.03925
-
Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. arXiv, abs/1603.03925, 2016.
-
(2016)
Image Captioning with Semantic Attention
-
-
You, Q.1
Jin, H.2
Wang, Z.3
Fang, C.4
Luo, J.5
-
45
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014.
-
(2014)
TACL
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
47
-
-
84937964578
-
Learning deep features for scene recognition using Places database
-
B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In NIPS, 2014.
-
(2014)
NIPS
-
-
Zhou, B.1
Lapedriza, A.2
Xiao, J.3
Torralba, A.4
Oliva, A.5