-
1
-
-
84973890960
-
VQA: Visual question answering
-
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C. L., & Parikh, D. (2015). VQA: Visual question answering. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
4
-
-
85072028231
-
Return of the devil in the details: Delving deep into convolutional nets
-
Chatfield, K., Simonyan, K., Vedaldi, A., & Zisserman, A. (2014). Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference.
-
(2014)
British Machine Vision Conference
-
-
Chatfield, K.1
Simonyan, K.2
Vedaldi, A.3
Zisserman, A.4
-
5
-
-
84959908834
-
Déjà image-captions: A corpus of expressive descriptions in repetition
-
Chen, J., Kuznetsova, P., Warren, D., & Choi, Y. (2015). Déjà image-captions: A corpus of expressive descriptions in repetition. In North American Chapter of the Association for Computational Linguistics.
-
(2015)
North American Chapter of the Association for Computational Linguistics
-
-
Chen, J.1
Kuznetsova, P.2
Warren, D.3
Choi, Y.4
-
9
-
-
84944096380
-
Language models for image captioning: The quirks and what works
-
Devlin, J., Cheng, H., Fang, H., Gupta, S., Deng, L., He, X., Zweig, G., & Mitchell, M. (2015). Language Models for Image Captioning: The Quirks and What Works. In Annual Meeting of the Association for Computational Linguistics.
-
(2015)
Annual Meeting of the Association for Computational Linguistics
-
-
Devlin, J.1
Cheng, H.2
Fang, H.3
Gupta, S.4
Deng, L.5
He, X.6
Zweig, G.7
Mitchell, M.8
-
10
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Donahue, J., Hendricks, L. A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In IEEE Conference on Computer Vision and Pattern Recognition.
-
(2015)
IEEE Conference on Computer Vision and Pattern Recognition
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
15
-
-
77951298115
-
The PASCAL Visual Object Classes (VOC) Challenge
-
Everingham, M., Van Gool, L., Williams, C. K. I., Winn, J., & Zisserman, A. (2010). The PASCAL Visual Object Classes (VOC) Challenge. International Journal of Computer Vision, 88 (2), 303-338.
-
(2010)
International Journal of Computer Vision
, vol.88
, Issue.2
, pp. 303-338
-
-
Everingham, M.1
Van Gool, L.2
Williams, C.K.I.3
Winn, J.4
Zisserman, A.5
-
18
-
-
84959250180
-
From captions to visual concepts and back
-
Fang, H., Gupta, S., Iandola, F., Srivastava, R., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., Platt, J., Zitnick, C. L., & Zweig, G. (2015). From captions to visual concepts and back. In IEEE Conference on Computer Vision and Pattern Recognition.
-
(2015)
IEEE Conference on Computer Vision and Pattern Recognition
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.10
Zitnick, C.L.11
Zweig, G.12
-
19
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
Farhadi, A., Hejrati, M., Sadeghi, M. A., Young, P., Rashtchian, C., Hockenmaier, J., & Forsyth, D. (2010). Every picture tells a story: Generating sentences from images. In European Conference on Computer Vision.
-
(2010)
European Conference on Computer Vision
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
20
-
-
77955422240
-
Object detection with discriminatively trained part-based models
-
Felzenszwalb, P. F., Girshick, R. B., McAllester, D., & Ramanan, D. (2010). Object detection with discriminatively trained part-based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32 (9), 1627-1645.
-
(2010)
IEEE Transactions on Pattern Analysis and Machine Intelligence
, vol.32
, Issue.9
, pp. 1627-1645
-
-
Felzenszwalb, P.F.1
Girshick, R.B.2
McAllester, D.3
Ramanan, D.4
-
22
-
-
84874541449
-
Automatic caption generation for news images
-
Feng, Y., & Lapata, M. (2013). Automatic caption generation for news images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35 (4), 797-812.
-
(2013)
IEEE Transactions on Pattern Analysis and Machine Intelligence
, vol.35
, Issue.4
, pp. 797-812
-
-
Feng, Y.1
Lapata, M.2
-
23
-
-
84959904882
-
A survey of current datasets for vision and language research
-
Ferraro, F., Mostafazadeh, N., Huang, T., Vanderwende, L., Devlin, J., Galley, M., & Mitchell, M. (2015). A survey of current datasets for vision and language research. In Conference on Empirical Methods in Natural Language Processing.
-
(2015)
Conference on Empirical Methods in Natural Language Processing
-
-
Ferraro, F.1
Mostafazadeh, N.2
Huang, T.3
Vanderwende, L.4
Devlin, J.5
Galley, M.6
Mitchell, M.7
-
24
-
-
84965148420
-
Are you talking to a machine? Dataset and methods for multilingual image question answering
-
Gao, H., Mao, J., Zhou, J., Huang, Z., & Yuille, A. (2015). Are you talking to a machine? Dataset and methods for multilingual image question answering. In International Conference on Learning Representations.
-
(2015)
International Conference on Learning Representations
-
-
Gao, H.1
Mao, J.2
Zhou, J.3
Huang, Z.4
Yuille, A.5
-
25
-
-
84925422907
-
Visual Turing test for computer vision systems
-
Geman, D., Geman, S., Hallonquist, N., & Younes, L. (2015). Visual Turing test for computer vision systems. Proceedings of the National Academy of Sciences, 112 (12), 3618-3623.
-
(2015)
Proceedings of the National Academy of Sciences
, vol.112
, Issue.12
, pp. 3618-3623
-
-
Geman, D.1
Geman, S.2
Hallonquist, N.3
Younes, L.4
-
26
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In IEEE Conference on Computer Vision and Pattern Recognition.
-
(2014)
IEEE Conference on Computer Vision and Pattern Recognition
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
27
-
-
84959243872
-
Improving image-sentence embeddings using large weakly annotated photo collections
-
Gong, Y., Wang, L., Hodosh, M., Hockenmaier, J., & Lazebnik, S. (2014). Improving Image-Sentence Embeddings Using Large Weakly Annotated Photo Collections. In European Conference on Computer Vision.
-
(2014)
European Conference on Computer Vision
-
-
Gong, Y.1
Wang, L.2
Hodosh, M.3
Hockenmaier, J.4
Lazebnik, S.5
-
28
-
-
38049183286
-
The IAPR TC-12 benchmark: A new evaluation resource for visual information systems
-
Grubinger, M., Clough, P., Muller, H., & Deselaers, T. (2006). The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In International Conference on Language Resources and Evaluation.
-
(2006)
International Conference on Language Resources and Evaluation
-
-
Grubinger, M.1
Clough, P.2
Muller, H.3
Deselaers, T.4
-
29
-
-
84898773262
-
YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
-
Guadarrama, S., Krishnamoorthy, N., Malkarnenkar, G., Venugopalan, S., Mooney, R., Darrell, T., & Saenko, K. (2013). YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In International Conference on Computer Vision.
-
(2013)
International Conference on Computer Vision
-
-
Guadarrama, S.1
Krishnamoorthy, N.2
Malkarnenkar, G.3
Venugopalan, S.4
Mooney, R.5
Darrell, T.6
Saenko, K.7
-
31
-
-
10044285992
-
Canonical correlation analysis: An overview with application to learning methods
-
Hardoon, D. R., Szedmak, S., & Shawe-Taylor, J. (2004). Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16 (12), 2639-2664.
-
(2004)
Neural Computation
, vol.16
, Issue.12
, pp. 2639-2664
-
-
Hardoon, D.R.1
Szedmak, S.2
Shawe-Taylor, J.3
-
33
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
Hodosh, M., Young, P., & Hockenmaier, J. (2013). Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics. Journal of Artificial Intelligence Research, 47, 853-899.
-
(2013)
Journal of Artificial Intelligence Research
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
34
-
-
0000107975
-
Relations between two sets of variates
-
Hotelling, H. (1936). Relations between two sets of variates. Biometrika, 28, 321-377.
-
(1936)
Biometrika
, vol.28
, pp. 321-377
-
-
Hotelling, H.1
-
35
-
-
0033909136
-
A conceptual framework for indexing visual information at multiple levels
-
Jaimes, A., & Chang, S.-F. (2000). A conceptual framework for indexing visual information at multiple levels. In IS&T/SPIE Internet Imaging.
-
(2000)
IS&T/SPIE Internet Imaging
-
-
Jaimes, A.1
Chang, S.-F.2
-
37
-
-
84973917813
-
Guiding the long-short term memory model for image caption generation
-
Jia, X., Gavves, E., Fernando, B., & Tuytelaars, T. (2015). Guiding the long-short term memory model for image caption generation. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Jia, X.1
Gavves, E.2
Fernando, B.3
Tuytelaars, T.4
-
38
-
-
84959233256
-
Image retrieval using scene graphs
-
Johnson, J., Krishna, R., Stark, M., Li, L.-J., Shamma, D. A., Bernstein, M., & Fei-Fei, L. (2015). Image retrieval using scene graphs. In IEEE Conference on Computer Vision and Pattern Recognition.
-
(2015)
IEEE Conference on Computer Vision and Pattern Recognition
-
-
Johnson, J.1
Krishna, R.2
Stark, M.3
Li, L.-J.4
Shamma, D.A.5
Bernstein, M.6
Fei-Fei, L.7
-
43
-
-
84893398951
-
Generating natural-language video descriptions using text-mined knowledge
-
Krishnamoorthy, N., Malkarnenkar, G., Mooney, R., Saenko, K., & Guadarrama, S. (2013). Generating Natural-Language Video Descriptions Using Text-Mined Knowledge. In Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
-
(2013)
Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
-
-
Krishnamoorthy, N.1
Malkarnenkar, G.2
Mooney, R.3
Saenko, K.4
Guadarrama, S.5
-
44
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A. C., & Berg, T. L. (2011). Baby talk: Understanding and generating simple image descriptions. In IEEE Conference on Computer Vision and Pattern Recognition.
-
(2011)
IEEE Conference on Computer Vision and Pattern Recognition
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
45
-
-
84878189119
-
Collective generation of natural image descriptions
-
Kuznetsova, P., Ordonez, V., Berg, A. C., Berg, T. L., & Choi, Y. (2012). Collective Generation of Natural Image Descriptions. In Annual Meeting of the Association for Computational Linguistics.
-
(2012)
Annual Meeting of the Association for Computational Linguistics
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
46
-
-
84934873221
-
TREETALK: Composition and compression of trees for image descriptions
-
Kuznetsova, P., Ordonez, V., Berg, T. L., & Choi, Y. (2014). TREETALK: Composition and compression of trees for image descriptions. In Conference on Empirical Methods in Natural Language Processing.
-
(2014)
Conference on Empirical Methods in Natural Language Processing
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, T.L.3
Choi, Y.4
-
50
-
-
84862279067
-
Composing simple image descriptions using web-scale n-grams
-
Li, S., Kulkarni, G., Berg, T. L., Berg, A. C., & Choi, Y. (2011). Composing simple image descriptions using web-scale n-grams. In The SIGNLL Conference on Computational Natural Language Learning.
-
(2011)
The SIGNLL Conference on Computational Natural Language Learning
-
-
Li, S.1
Kulkarni, G.2
Berg, T.L.3
Berg, A.C.4
Choi, Y.5
-
51
-
-
84877085938
-
Learning dependency-based compositional semantics
-
Liang, P., Jordan, M. I., & Klein, D. (2012). Learning dependency-based compositional semantics. Computational Linguistics, 39 (2), 389-446.
-
(2012)
Computational Linguistics
, vol.39
, Issue.2
, pp. 389-446
-
-
Liang, P.1
Jordan, M.I.2
Klein, D.3
-
53
-
-
84960173401
-
Generating multi-sentence natural language descriptions of indoor scenes
-
Lin, D., Fidler, S., Kong, C., & Urtasun, R. (2015). Generating multi-sentence natural language descriptions of indoor scenes. In British Machine Vision Conference.
-
(2015)
British Machine Vision Conference
-
-
Lin, D.1
Fidler, S.2
Kong, C.3
Urtasun, R.4
-
54
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., & Zitnick, C. L. (2014). Microsoft COCO: Common objects in context. In European Conference on Computer Vision.
-
(2014)
European Conference on Computer Vision
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
55
-
-
3042535216
-
Distinctive image features from scale-invariant keypoints
-
Lowe, D. (2004). Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60 (2), 91-110.
-
(2004)
International Journal of Computer Vision
, vol.60
, Issue.2
, pp. 91-110
-
-
Lowe, D.1
-
56
-
-
85007153677
-
Learning to answer questions from image using convolutional neural network
-
Ma, L., Lu, Z., & Li, H. (2016). Learning to answer questions from image using convolutional neural network. In AAAI Conference on Artificial Intelligence.
-
(2016)
AAAI Conference on Artificial Intelligence
-
-
Ma, L.1
Lu, Z.2
Li, H.3
-
57
-
-
84937822746
-
A multi-world approach to question answering about real-world scenes based on uncertain input
-
Malinowski, M., & Fritz, M. (2014a). A multi-world approach to question answering about real-world scenes based on uncertain input. In Advances in Neural Information Processing Systems.
-
(2014)
Advances in Neural Information Processing Systems
-
-
Malinowski, M.1
Fritz, M.2
-
60
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-RNN)
-
Mao, J., Xu, W., Yang, Y., Wang, J., & Yuille, A. L. (2015a). Deep captioning with multimodal recurrent neural networks (m-RNN). In International Conference on Learning Representations.
-
(2015)
International Conference on Learning Representations
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
61
-
-
84973863256
-
Learning like a child: Fast novel visual concept learning from sentence descriptions of images
-
Mao, J., Wei, X., Yang, Y., Wang, J., Huang, Z., & Yuille, A. L. (2015b). Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Mao, J.1
Wei, X.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.L.6
-
64
-
-
85034832841
-
Midge: Generating image descriptions from computer vision detections
-
Mitchell, M., Han, X., Dodge, J., Mensch, A., Goyal, A., Berg, A. C., Yamaguchi, K., Berg, T. L., Stratos, K., & Daume, III, H. (2012). Midge: Generating image descriptions from computer vision detections. In Conference of the European Chapter of the Association for Computational Linguistics.
-
(2012)
Conference of the European Chapter of the Association for Computational Linguistics
-
-
Mitchell, M.1
Han, X.2
Dodge, J.3
Mensch, A.4
Goyal, A.5
Berg, A.C.6
Yamaguchi, K.7
Berg, T.L.8
Stratos, K.9
Daume, H.10
-
66
-
-
0035328421
-
Modeling the shape of the scene: A holistic representation of the spatial envelope
-
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42 (3), 145-175.
-
(2001)
International Journal of Computer Vision
, vol.42
, Issue.3
, pp. 145-175
-
-
Oliva, A.1
Torralba, A.2
-
70
-
-
85133336275
-
BLEU: A method for automatic evaluation of machine translation
-
Papineni, K., Roukos, S., Ward, T., & Zhu, W.-J. (2002). BLEU: A method for automatic evaluation of machine translation. In Annual Meeting of the Association for Computational Linguistics.
-
(2002)
Annual Meeting of the Association for Computational Linguistics
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.-J.4
-
73
-
-
84900870389
-
The SUN attribute database: Beyond categories for deeper scene understanding
-
Patterson, G., Xu, C., Su, H., & Hays, J. (2014). The SUN Attribute Database: Beyond Categories for Deeper Scene Understanding. International Journal of Computer Vision, 108 (1-2), 59-81.
-
(2014)
International Journal of Computer Vision
, vol.108
, Issue.1-2
, pp. 59-81
-
-
Patterson, G.1
Xu, C.2
Su, H.3
Hays, J.4
-
75
-
-
84856142160
-
Weakly supervised learning of interactions between humans and objects
-
Prest, A., Schmid, C., & Ferrari, V. (2012). Weakly supervised learning of interactions between humans and objects. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34 (3), 601-614.
-
(2012)
IEEE Transactions on Pattern Analysis and Machine Intelligence
, vol.34
, Issue.3
, pp. 601-614
-
-
Prest, A.1
Schmid, C.2
Ferrari, V.3
-
76
-
-
85090348677
-
Collecting image annotations using Amazon's Mechanical Turk
-
Rashtchian, C., Young, P., Hodosh, M., & Hockenmaier, J. (2010). Collecting image annotations using Amazon's Mechanical Turk. In North American Chapter of the Association for Computational Linguistics: Human Language Technologies Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk.
-
(2010)
North American Chapter of the Association for Computational Linguistics: Human Language Technologies Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk
-
-
Rashtchian, C.1
Young, P.2
Hodosh, M.3
Hockenmaier, J.4
-
77
-
-
71749094730
-
An investigation into the validity of some metrics for automatically evaluating natural language generation systems
-
Reiter, E., & Belz, A. (2009). An investigation into the validity of some metrics for automatically evaluating natural language generation systems. Computational Linguistics, 35 (4), 529-558.
-
(2009)
Computational Linguistics
, vol.35
, Issue.4
, pp. 529-558
-
-
Reiter, E.1
Belz, A.2
-
81
-
-
84959211977
-
A dataset for movie description
-
Rohrbach, A., Rohrbach, M., Tandon, N., & Schiele, B. (2015). A dataset for movie description. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Rohrbach, A.1
Rohrbach, M.2
Tandon, N.3
Schiele, B.4
-
82
-
-
84898775239
-
Translating video content to natural language descriptions
-
Rohrbach, M., Qiu, W., Titov, I., Thater, S., Pinkal, M., & Schiele, B. (2013). Translating Video Content to Natural Language Descriptions. In International Conference on Computer Vision.
-
(2013)
International Conference on Computer Vision
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
83
-
-
85123605149
-
Generating semantically precise scene graphs from textual descriptions for improved image retrieval
-
Schuster, S., Krishna, R., Chang, A., Fei-Fei, L., & Manning, C. D. (2015). Generating semantically precise scene graphs from textual descriptions for improved image retrieval. In Conference on Empirical Methods in Natural Language Processing Vision and Language Workshop.
-
(2015)
Conference on Empirical Methods in Natural Language Processing Vision and Language Workshop
-
-
Schuster, S.1
Krishna, R.2
Chang, A.3
Fei-Fei, L.4
Manning, C.D.5
-
84
-
-
84952235015
-
Analyzing the subject of a picture: A theoretical approach
-
Shatford, S. (1986). Analyzing the subject of a picture: A theoretical approach. Cataloging & Classification Quarterly, 6, 39-62.
-
(1986)
Cataloging & Classification Quarterly
, vol.6
, pp. 39-62
-
-
Shatford, S.1
-
85
-
-
84881536861
-
Indoor segmentation and support inference from RGBD images
-
Silberman, N., Kohli, P., Hoiem, D., & Fergus, R. (2012). Indoor segmentation and support inference from RGBD images. In European Conference on Computer Vision.
-
(2012)
European Conference on Computer Vision
-
-
Silberman, N.1
Kohli, P.2
Hoiem, D.3
Fergus, R.4
-
86
-
-
77955998009
-
Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
-
Socher, R., & Fei-Fei, L. (2010). Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In IEEE Conference on Computer Vision and Pattern Recognition.
-
(2010)
IEEE Conference on Computer Vision and Pattern Recognition
-
-
Socher, R.1
Fei-Fei, L.2
-
87
-
-
84906925854
-
Grounded compositional semantics for finding and describing images with sentences
-
Socher, R., Karpathy, A., Le, Q. V., Manning, C. D., & Ng, A. (2014). Grounded Compositional Semantics for Finding and Describing Images with Sentences. Transactions of the Association for Computational Linguistics, 2, 207-218.
-
(2014)
Transactions of the Association for Computational Linguistics
, vol.2
, pp. 207-218
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.D.4
Ng, A.5
-
89
-
-
84959932469
-
Integrating language and vision to generate natural language descriptions of videos in the wild
-
Thomason, J., Venugopalan, S., Guadarrama, S., Saenko, K., & Mooney, R. (2014). Integrating Language and Vision to Generate Natural Language Descriptions of Videos in the Wild. In International Conference on Computational Linguistics.
-
(2014)
International Conference on Computational Linguistics
-
-
Thomason, J.1
Venugopalan, S.2
Guadarrama, S.3
Saenko, K.4
Mooney, R.5
-
90
-
-
54749092170
-
80 million tiny images: A large data set for nonparametric object and scene recognition
-
Torralba, A., Fergus, R., & Freeman, W. T. (2008). 80 million tiny images: A large data set for nonparametric object and scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30 (11), 1958-1970.
-
(2008)
IEEE Transactions on Pattern Analysis and Machine Intelligence
, vol.30
, Issue.11
, pp. 1958-1970
-
-
Torralba, A.1
Fergus, R.2
Freeman, W.T.3
-
91
-
-
84973861187
-
Common subspace for model and similarity: Phrase learning for caption generation from images
-
Ushiku, Y., Yamaguchi, M., Mukuta, Y., & Harada, T. (2015). Common subspace for model and similarity: Phrase learning for caption generation from images. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Ushiku, Y.1
Yamaguchi, M.2
Mukuta, Y.3
Harada, T.4
-
93
-
-
85088059797
-
Im2Text and Text2Im: Associating images and texts for cross-modal retrieval
-
Verma, Y., & Jawahar, C. V. (2014). Im2Text and Text2Im: Associating Images and Texts for Cross-Modal Retrieval. In British Machine Vision Conference.
-
(2014)
British Machine Vision Conference
-
-
Verma, Y.1
Jawahar, C.V.2
-
94
-
-
84946747440
-
Show and tell: A neural image caption generator
-
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In IEEE Conference on Computer Vision and Pattern Recognition.
-
(2015)
IEEE Conference on Computer Vision and Pattern Recognition
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
95
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhutdinov, R., Zemel, R., & Bengio, Y. (2015). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning.
-
(2015)
International Conference on Machine Learning
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhutdinov, R.6
Zemel, R.7
Bengio, Y.8
-
96
-
-
84944068309
-
A distributed representation based query expansion approach for image captioning
-
Yagcioglu, S., Erdem, E., Erdem, A., & Cakici, R. (2015). A Distributed Representation Based Query Expansion Approach for Image Captioning. In Annual Meeting of the Association for Computational Linguistics.
-
(2015)
Annual Meeting of the Association for Computational Linguistics
-
-
Yagcioglu, S.1
Erdem, E.2
Erdem, A.3
Cakici, R.4
-
97
-
-
80053258778
-
Corpus-guided sentence generation of natural images
-
Yang, Y., Teo, C. L., Daume, III, H., & Aloimonos, Y. (2011). Corpus-guided sentence generation of natural images. In Conference on Empirical Methods in Natural Language Processing.
-
(2011)
Conference on Empirical Methods in Natural Language Processing
-
-
Yang, Y.1
Teo, C.L.2
Daume, H.3
Aloimonos, Y.4
-
99
-
-
84973884896
-
Describing videos by exploiting temporal structure
-
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015). Describing videos by exploiting temporal structure. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Yao, L.1
Torabi, A.2
Cho, K.3
Ballas, N.4
Pal, C.5
Larochelle, H.6
Courville, A.7
-
100
-
-
85026937926
-
See no evil, say no evil: Description generation from densely labeled images
-
Yatskar, M., Galley, M., Vanderwende, L., & Zettlemoyer, L. (2014). See No Evil, Say No Evil: Description Generation from Densely Labeled Images. In Joint Conference on Lexical and Computational Semantics.
-
(2014)
Joint Conference on Lexical and Computational Semantics
-
-
Yatskar, M.1
Galley, M.2
Vanderwende, L.3
Zettlemoyer, L.4
-
101
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
Young, P., Lai, A., Hodosh, M., & Hockenmaier, J. (2014). From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2, 67-78.
-
(2014)
Transactions of the Association for Computational Linguistics
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
102
-
-
84973892583
-
Visual Madlibs: Fill in the blank description generation and question answering
-
Yu, L., Park, E., Berg, A. C., & Berg, T. L. (2015). Visual Madlibs: Fill in the blank description generation and question answering. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Yu, L.1
Park, E.2
Berg, A.C.3
Berg, T.L.4
-
103
-
-
84973911532
-
Aligning books and movies: Towards story-like visual explanations by watching movies and reading books
-
Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In International Conference on Computer Vision.
-
(2015)
International Conference on Computer Vision
-
-
Zhu, Y.1
Kiros, R.2
Zemel, R.3
Salakhutdinov, R.4
Urtasun, R.5
Torralba, A.6
Fidler, S.7