[2] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[3] H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
[4] A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems, pages 2121-2129, 2013.
[5] C. Gulcehre, O. Firat, K. Xu, K. Cho, L. Barrault, H. Lin, F. Bougares, H. Schwenk, and Y. Bengio. On using monolingual corpora in neural machine translation. arXiv preprint arXiv:1503.03535, 2015.
[6] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[7] L. A. Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR, 2016.
[8] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[10] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
[11] P. Kuznetsova, V. Ordonez, T. L. Berg, and Y. Choi. TreeTalk: Composition and compression of trees for image descriptions. TACL, 2014.
[12] A. Lazaridou, E. Bruni, and M. Baroni. Is this a wampimuk? Cross-modal mapping between distributional semantics and the visual world. In ACL, 2014.
[13] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[14] C. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55-60, 2014.
[15] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[16] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. L. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, 2015.
[17] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
[18] M. Mitchell, J. Dodge, A. Goyal, K. Yamaguchi, K. Stratos, X. Han, A. Mensch, A. C. Berg, T. L. Berg, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012.
[19] M. Norouzi, T. Mikolov, S. Bengio, Y. Singer, J. Shlens, A. Frome, G. S. Corrado, and J. Dean. Zero-shot learning by convex combination of semantic embeddings. arXiv preprint arXiv:1312.5650, 2013.
[21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ILSVRC, 2014.
[22] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
[23] R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2014.
[25] S. Venugopalan, L. A. Hendricks, R. Mooney, and K. Saenko. Improving LSTM-based video description with linguistic knowledge mined from text. In EMNLP, 2016.
[27] Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, 2011.