-
1
-
-
84946747440
-
Show and tell: A neural image caption generator
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator", In CVPR, 2015.
-
(2015)
CVPR
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
2
-
-
84952349298
-
Unifying visual-semantic embeddings with multimodal neural language models
-
R. Kiros, R. Salakhutdinov, and R. S. Zemel, "Unifying visual-semantic embeddings with multimodal neural language models", TACL, 2015.
-
(2015)
TACL
-
-
Kiros, R.1
Salakhutdinov, R.2
Zemel, R.S.3
-
3
-
-
84959252592
-
Deep fragment embeddings for bidirectional image sentence mapping
-
A. Karpathy, A. Joulin, and L. Fei-Fei, "Deep fragment embeddings for bidirectional image sentence mapping", In NIPS, 2013.
-
(2013)
NIPS
-
-
Karpathy, A.1
Joulin, A.2
Fei-Fei, L.3
-
4
-
-
84951072975
-
Explain images with multimodal recurrent neural networks
-
J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille, "Explain images with multimodal recurrent neural networks", NIPS Deep Learning Workshop, 2014.
-
(2014)
NIPS Deep Learning Workshop
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
5
-
-
84944046597
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell, "Long-term recurrent convolutional networks for visual recognition and description", In CVPR, 2014.
-
(2014)
CVPR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
7
-
-
84959250180
-
From captions to visual concepts and back
-
H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig, "From captions to visual concepts and back", In CVPR, 2015.
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.N.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Zitnick, C.L.11
Zweig, G.12
-
8
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio, "Show, attend and tell: Neural image caption generation with visual attention", In ICML, 2015.
-
(2015)
ICML
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.C.5
Salakhutdinov, R.6
Zemel, R.S.7
Bengio, Y.8
-
10
-
-
84965125568
-
Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotations
-
B. Klein, G. Lev, G. Sadeh, and L. Wolf, "Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotations", In CVPR, 2015.
-
(2015)
CVPR
-
-
Klein, B.1
Lev, G.2
Sadeh, G.3
Wolf, L.4
-
12
-
-
84881536861
-
Indoor segmentation and support inference from RGBD images
-
N. Silberman, D. Hoiem, P. Kohli, and R. Fergus, "Indoor segmentation and support inference from RGBD images", In ECCV, 2012.
-
(2012)
ECCV
-
-
Silberman, N.1
Hoiem, D.2
Kohli, P.3
Fergus, R.4
-
13
-
-
84959502295
-
-
CoRR, vol. abs/1505.00468
-
S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh, "VQA: Visual Question Answering", CoRR, vol. abs/1505.00468, 2015.
-
(2015)
VQA: Visual Question Answering
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
14
-
-
84957035520
-
-
CoRR, vol. abs/1505.01121
-
M. Malinowski, M. Rohrbach, and M. Fritz, "Ask Your Neurons: A Neural-based Approach to Answering Questions about Images", CoRR, vol. abs/1505.01121, 2015.
-
(2015)
Ask Your Neurons: A Neural-based Approach to Answering Questions About Images
-
-
Malinowski, M.1
Rohrbach, M.2
Fritz, M.3
-
15
-
-
84957033954
-
-
CoRR, vol. abs/1505.05612
-
H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu, "Are you talking to a machine? dataset and methods for multilingual image question answering", CoRR, vol. abs/1505.05612, 2015.
-
(2015)
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering
-
-
Gao, H.1
Mao, J.2
Zhou, J.3
Huang, Z.4
Wang, L.5
Xu, W.6
-
16
-
-
84957021783
-
-
CoRR, vol. abs/1506.00333
-
L. Ma, Z. Lu, and H. Li, "Learning to answer questions from image using convolutional neural network", CoRR, vol. abs/1506.00333, 2015.
-
(2015)
Learning to Answer Questions from Image Using Convolutional Neural Network
-
-
Ma, L.1
Lu, Z.2
Li, H.3
-
17
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, "Microsoft COCO: Common Objects in Context", In ECCV, 2014.
-
(2014)
ECCV
-
-
Lin, T.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
18
-
-
84952349295
-
-
CoRR, vol. abs/1504.00325
-
X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick, "Microsoft COCO captions: Data collection and evaluation server", CoRR, vol. abs/1504.00325, 2015.
-
(2015)
Microsoft COCO Captions: Data Collection and Evaluation Server
-
-
Chen, X.1
Fang, H.2
Lin, T.-Y.3
Vedantam, R.4
Gupta, S.5
Dollár, P.6
Zitnick, C.L.7
-
19
-
-
0031573117
-
Long short-term memory
-
S. Hochreiter and J. Schmidhuber, "Long short-term memory", Neural Computation, vol. 9, no. 8, pp. 1735-1780, 1997.
-
(1997)
Neural Computation
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
20
-
-
85083953063
-
Very deep convolutional networks for large-scale image recognition
-
K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition", In ICLR, 2015.
-
(2015)
ICLR
-
-
Simonyan, K.1
Zisserman, A.2
-
21
-
-
84947041871
-
Imagenet large scale visual recognition challenge
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. S. Bernstein, A. C. Berg, and L. Fei-Fei, "Imagenet large scale visual recognition challenge", IJCV, 2015.
-
(2015)
IJCV
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.S.10
Berg, A.C.11
Fei-Fei, L.12
-
22
-
-
85083951332
-
Efficient estimation of word representations in vector space
-
T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space", In ICLR, 2013.
-
(2013)
ICLR
-
-
Mikolov, T.1
Chen, K.2
Corrado, G.3
Dean, J.4
-
23
-
-
84898958665
-
DeViSE: A deep visual-semantic embedding model
-
A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov, "DeViSE: A deep visual-semantic embedding model", In NIPS, 2013.
-
(2013)
NIPS
-
-
Frome, A.1
Corrado, G.S.2
Shlens, J.3
Bengio, S.4
Dean, J.5
Ranzato, M.6
Mikolov, T.7
-
24
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics", J. Artif. Intell. Res. (JAIR), vol. 47, pp. 853-899, 2013.
-
(2013)
J. Artif. Intell. Res. (JAIR)
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
25
-
-
85162522202
-
Im2text: Describing images using 1 million captioned photographs
-
V. Ordonez, G. Kulkarni, and T. L. Berg, "Im2text: Describing images using 1 million captioned photographs", In NIPS, 2011.
-
(2011)
NIPS
-
-
Ordonez, V.1
Kulkarni, G.2
Berg, T.L.3
-
26
-
-
85146417759
-
Accurate unlexicalized parsing
-
D. Klein and C. D. Manning, "Accurate unlexicalized parsing", In ACL, 2003.
-
(2003)
ACL
-
-
Klein, D.1
Manning, C.D.2
-
29
-
-
85107362379
-
NLTK: The natural language toolkit
-
S. Bird, "NLTK: the natural language toolkit", In ACL, 2006.
-
(2006)
ACL
-
-
Bird, S.1
-
30
-
-
84965102873
-
-
CoRR, vol. abs/1505.04467
-
J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick, "Exploring nearest neighbor approaches for image captioning", CoRR, vol. abs/1505.04467, 2015.
-
(2015)
Exploring Nearest Neighbor Approaches for Image Captioning
-
-
Devlin, J.1
Gupta, S.2
Girshick, R.3
Mitchell, M.4
Zitnick, C.L.5
-
31
-
-
85146676791
-
Verb semantics and lexical selection
-
Z. Wu and M. Palmer, "Verb semantics and lexical selection", In ACL, 1994.
-
(1994)
ACL
-
-
Wu, Z.1
Palmer, M.2
-
32
-
-
84937822746
-
A multi-world approach to question answering about real-world scenes based on uncertain input
-
M. Malinowski and M. Fritz, "A multi-world approach to question answering about real-world scenes based on uncertain input", In NIPS, 2014.
-
(2014)
NIPS
-
-
Malinowski, M.1
Fritz, M.2