-
1
-
-
0041876117
-
Matching words and pictures
-
K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003.
-
(2003)
JMLR
-
-
Barnard, K.1
Duygulu, P.2
Forsyth, D.3
De Freitas, N.4
Blei, D.M.5
Jordan, M.I.6
-
2
-
-
0142166851
-
A neural probabilistic language model
-
Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003.
-
(2003)
The Journal of Machine Learning Research
, vol.3
, pp. 1137-1155
-
-
Bengio, Y.1
Ducharme, R.2
Vincent, P.3
Janvin, C.4
-
3
-
-
84952349295
-
-
arXiv preprint arXiv:1504.00325
-
X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
-
(2015)
Microsoft COCO Captions: Data Collection and Evaluation Server
-
-
Chen, X.1
Fang, H.2
Lin, T.-Y.3
Vedantam, R.4
Gupta, S.5
Dollar, P.6
Zitnick, C.L.7
-
4
-
-
84957029470
-
Mind's eye: A recurrent visual representation for image caption generation
-
X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. CVPR, 2015.
-
(2015)
CVPR
-
-
Chen, X.1
Zitnick, C.L.2
-
5
-
-
85009929513
-
Describing multimedia content using attention-based encoder-decoder networks
-
abs/1507.01053
-
K. Cho, A. C. Courville, and Y. Bengio. Describing multimedia content using attention-based encoder-decoder networks. CoRR, abs/1507.01053, 2015.
-
(2015)
CoRR
-
-
Cho, K.1
Courville, A.C.2
Bengio, Y.3
-
6
-
-
84990044091
-
Torch7: A Matlab-like environment for machine learning
-
R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In BigLearn, NIPS Workshop, number EPFL-CONF-192376, 2011.
-
(2011)
BigLearn, NIPS Workshop, Number EPFL-CONF-192376
-
-
Collobert, R.1
Kavukcuoglu, K.2
Farabet, C.3
-
8
-
-
85009912425
-
-
arXiv preprint arXiv:1411.4389
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389, 2014.
-
(2014)
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
10
-
-
77951298115
-
The PASCAL visual object classes (VOC) challenge
-
M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International journal of computer vision, 88(2):303-338, 2010.
-
(2010)
International Journal of Computer Vision
, vol.88
, Issue.2
, pp. 303-338
-
-
Everingham, M.1
Van Gool, L.2
Williams, C.K.3
Winn, J.4
Zisserman, A.5
-
11
-
-
84959250180
-
From captions to visual concepts and back
-
H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. CVPR, 2015.
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.10
-
12
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. ECCV, 2010.
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
13
-
-
85029359197
-
Fast R-CNN
-
R. Girshick. Fast R-CNN. ICCV, 2015.
-
(2015)
ICCV
-
-
Girshick, R.1
-
14
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 2014.
-
(2014)
CVPR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
17
-
-
84939247735
-
Spatial pyramid pooling in deep convolutional networks for visual recognition
-
K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015.
-
(2015)
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
20
-
-
84856653718
-
Learning cross-modality similarity for multinomial data
-
Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. ICCV, 2011.
-
(2011)
ICCV
-
-
Jia, Y.1
Salzmann, M.2
Darrell, T.3
-
21
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CVPR, 2015.
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
23
-
-
85083951076
-
Adam: A method for stochastic optimization
-
D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.
-
(2015)
ICLR
-
-
Kingma, D.1
Ba, J.2
-
24
-
-
84952349298
-
Unifying visual-semantic embeddings with multimodal neural language models
-
R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
-
(2015)
TACL
-
-
Kiros, R.1
Salakhutdinov, R.2
Zemel, R.S.3
-
25
-
-
84978730111
-
-
R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, M. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. 2016.
-
(2016)
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
Bernstein, M.11
Fei-Fei, L.12
-
26
-
-
84876231242
-
ImageNet classification with deep convolutional neural networks
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
-
(2012)
NIPS
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
27
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. CVPR, 2011.
-
(2011)
CVPR
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
28
-
-
84907331257
-
Generalizing image captions for image-text parallel corpus
-
Citeseer
-
P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In ACL (2), pages 790-796. Citeseer, 2013.
-
(2013)
ACL
, vol.2
, pp. 790-796
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
29
-
-
0032203257
-
Gradient-based learning applied to document recognition
-
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
-
(1998)
Proceedings of the IEEE
, vol.86
, Issue.11
, pp. 2278-2324
-
-
LeCun, Y.1
Bottou, L.2
Bengio, Y.3
Haffner, P.4
-
30
-
-
85009931853
-
Microsoft COCO: Common objects in context
-
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. ECCV, 2014.
-
(2014)
ECCV
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
31
-
-
84959205572
-
Fully convolutional networks for semantic segmentation
-
J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. CVPR, 2015.
-
(2015)
CVPR
-
-
Long, J.1
Shelhamer, E.2
Darrell, T.3
-
32
-
-
84951072975
-
Explain images with multimodal recurrent neural networks
-
J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090, 2014.
-
(2014)
arXiv preprint arXiv:1410.1090
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
33
-
-
79959829092
-
Recurrent neural network based language model
-
T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur. Recurrent neural network based language model. In INTERSPEECH, 2010.
-
(2010)
INTERSPEECH
-
-
Mikolov, T.1
Karafiát, M.2
Burget, L.3
Černocký, J.4
Khudanpur, S.5
-
34
-
-
84936796885
-
Large scale retrieval and generation of image descriptions
-
V. Ordonez, X. Han, P. Kuznetsova, G. Kulkarni, M. Mitchell, K. Yamaguchi, K. Stratos, A. Goyal, J. Dodge, A. Mensch, et al. Large scale retrieval and generation of image descriptions. International Journal of Computer Vision (IJCV), 2015.
-
(2015)
International Journal of Computer Vision (IJCV)
-
-
Ordonez, V.1
Han, X.2
Kuznetsova, P.3
Kulkarni, G.4
Mitchell, M.5
Yamaguchi, K.6
Stratos, K.7
Goyal, A.8
Dodge, J.9
Mensch, A.10
-
35
-
-
84973856017
-
Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
-
B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. ICCV, 2015.
-
(2015)
ICCV
-
-
Plummer, B.A.1
Wang, L.2
Cervantes, C.M.3
Caicedo, J.C.4
Hockenmaier, J.5
Lazebnik, S.6
-
36
-
-
85009891462
-
-
qassemoquab. stnbhwd
-
qassemoquab. stnbhwd. https://github.com/qassemoquab/stnbhwd, 2015.
-
(2015)
-
-
-
37
-
-
84961917629
-
-
arXiv preprint arXiv:1506.02640
-
J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640, 2015.
-
(2015)
You only Look Once: Unified, Real-time Object Detection
-
-
Redmon, J.1
Divvala, S.2
Girshick, R.3
Farhadi, A.4
-
38
-
-
84960980241
-
Faster R-CNN: Towards real-time object detection with region proposal networks
-
S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS, 2015.
-
(2015)
NIPS
-
-
Ren, S.1
He, K.2
Girshick, R.3
Sun, J.4
-
39
-
-
84947041871
-
ImageNet large scale visual recognition challenge
-
April
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), pages 1-42, April 2015.
-
(2015)
International Journal of Computer Vision (IJCV)
, pp. 1-42
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
40
-
-
85083951635
-
OverFeat: Integrated recognition, localization and detection using convolutional networks
-
P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. ICLR, 2014.
-
(2014)
ICLR
-
-
Sermanet, P.1
Eigen, D.2
Zhang, X.3
Mathieu, M.4
Fergus, R.5
LeCun, Y.6
-
41
-
-
85083953063
-
Very deep convolutional networks for large-scale image recognition
-
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.
-
(2015)
ICLR
-
-
Simonyan, K.1
Zisserman, A.2
-
42
-
-
77955998009
-
Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
-
R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. CVPR, 2010.
-
(2010)
CVPR
-
-
Socher, R.1
Fei-Fei, L.2
-
43
-
-
84964474107
-
Grounded compositional semantics for finding and describing images with sentences
-
R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2014.
-
(2014)
TACL
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.D.4
Ng, A.Y.5
-
44
-
-
80053459857
-
Generating text with recurrent neural networks
-
I. Sutskever, J. Martens, and G. E. Hinton. Generating text with recurrent neural networks. ICML, 2011.
-
(2011)
ICML
-
-
Sutskever, I.1
Martens, J.2
Hinton, G.E.3
-
45
-
-
84937522268
-
Going deeper with convolutions
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR, 2015.
-
(2015)
CVPR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
46
-
-
84962336509
-
-
arXiv preprint arXiv:1412.1441
-
C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. arXiv preprint arXiv:1412.1441, 2014.
-
(2014)
Scalable, High-quality Object Detection
-
-
Szegedy, C.1
Reed, S.2
Erhan, D.3
Anguelov, D.4
-
47
-
-
84957922397
-
YFCC100M: The new data in multimedia research
-
B. Thomee, B. Elizalde, D. A. Shamma, K. Ni, G. Friedland, D. Poland, D. Borth, and L.-J. Li. YFCC100M: The new data in multimedia research. Communications of the ACM, 59(2):64-73, 2016.
-
(2016)
Communications of the ACM
, vol.59
, Issue.2
, pp. 64-73
-
-
Thomee, B.1
Elizalde, B.2
Shamma, D.A.3
Ni, K.4
Friedland, G.5
Poland, D.6
Borth, D.7
Li, L.-J.8
-
50
-
-
0000903748
-
Generalization of backpropagation with application to a recurrent gas market model
-
P. J. Werbos. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4):339-356, 1988.
-
(1988)
Neural Networks
, vol.1
, Issue.4
, pp. 339-356
-
-
Werbos, P.J.1
-
51
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015.
-
(2015)
ICML
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Courville, A.4
Salakhutdinov, R.5
Zemel, R.6
Bengio, Y.7
-
52
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014.
-
(2014)
TACL
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
53
-
-
85009899017
-
Visualizing and understanding convolutional networks
-
M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV, 2014.
-
(2014)
ECCV
-
-
Zeiler, M.D.1
Fergus, R.2
-
54
-
-
85009853104
-
Edge boxes: Locating object proposals from edges
-
C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. ECCV, 2014.
-
(2014)
ECCV
-
-
Zitnick, C.L.1
Dollár, P.2