-
1
-
-
84973890960
-
VQA: Visual Question Answering
-
S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual Question Answering. In ICCV, 2015.
-
(2015)
ICCV
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
2
-
-
84887366672
-
Semisupervised Learning with Constraints for Person Identification in Multimedia Data
-
M. Baeuml, M. Tapaswi, and R. Stiefelhagen. Semisupervised Learning with Constraints for Person Identification in Multimedia Data. In CVPR, 2013.
-
(2013)
CVPR
-
-
Baeuml, M.1
Tapaswi, M.2
Stiefelhagen, R.3
-
3
-
-
84885996388
-
Video-in-sentences out
-
A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, L. Schmidt, J. Shangguan, J. Siskind, J. Waggoner, S. Wang, J. Wei, Y. Yin, and Z. Zhang. Video-In-sentences Out. In UAI, 2012.
-
(2012)
UAI
-
-
Barbu, A.1
Bridge, A.2
Burchill, Z.3
Coroian, D.4
Dickinson, S.5
Fidler, S.6
Michaux, A.7
Mussman, S.8
Narayanaswamy, S.9
Salvi, D.10
Schmidt, L.11
Shangguan, J.12
Siskind, J.13
Waggoner, J.14
Wang, S.15
Wei, J.16
Yin, Y.17
Zhang, Z.18
-
4
-
-
84898792367
-
Finding actors and actions in movies
-
P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Finding Actors and Actions in Movies. ICCV, pages 2280-2287, 2013.
-
(2013)
ICCV
, pp. 2280-2287
-
-
Bojanowski, P.1
Bach, F.2
Laptev, I.3
Ponce, J.4
Schmid, C.5
Sivic, J.6
-
5
-
-
84859089502
-
Collecting highly parallel data for paraphrase evaluation
-
D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
-
(2011)
ACL
-
-
Chen, D.L.1
Dolan, W.B.2
-
7
-
-
70450145539
-
Movie/script: Alignment and parsing of video and text transcription
-
T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar. Movie/Script: Alignment and Parsing of Video and Text Transcription. In ECCV, 2008.
-
(2008)
ECCV
-
-
Cour, T.1
Jordan, C.2
Miltsakaki, E.3
Taskar, B.4
-
8
-
-
84887345951
-
A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching
-
P. Das, C. Xu, R. F. Doell, and J. J. Corso. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching. CVPR, 2013.
-
(2013)
CVPR
-
-
Das, P.1
Xu, C.2
Doell, R.F.3
Corso, J.J.4
-
9
-
-
84887345951
-
A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
-
P. Das, C. Xu, R. F. Doell, and J. J. Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR, 2013.
-
(2013)
CVPR
-
-
Das, P.1
Xu, C.2
Doell, R.F.3
Corso, J.J.4
-
10
-
-
85009912425
-
-
arXiv:1411.4389
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term Recurrent Convolutional Networks for Visual Recognition and Description. In arXiv:1411.4389, 2014.
-
(2014)
Long-term Recurrent Convolutional Networks for Visual Recognition and Description
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
11
-
-
80051961229
-
Every picture tells a story: Generating sentences for images
-
A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every Picture Tells a Story: Generating Sentences for Images. In ECCV, 2010.
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
12
-
-
84959928474
-
-
arXiv:1506.03340
-
K. M. Hermann, T. Kočiský, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, and P. Blunsom. Teaching Machines to Read and Comprehend. In arXiv:1506.03340, 2015.
-
(2015)
Teaching Machines to Read and Comprehend
-
-
Hermann, K.M.1
Kočiský, T.2
Grefenstette, E.3
Espeholt, L.4
Kay, W.5
Suleyman, M.6
Blunsom, P.7
-
13
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. In CVPR, 2015.
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
15
-
-
84952349298
-
Unifying visual-semantic embeddings with multimodal neural language models
-
R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. TACL, 2015.
-
(2015)
TACL
-
-
Kiros, R.1
Salakhutdinov, R.2
Zemel, R.S.3
-
16
-
-
84965153327
-
Skip-Thought Vectors
-
R. Kiros, Y. Zhu, R. Salakhutdinov, R. Zemel, A. Torralba, R. Urtasun, and S. Fidler. Skip-Thought Vectors. NIPS, 2015.
-
(2015)
NIPS
-
-
Kiros, R.1
Zhu, Y.2
Salakhutdinov, R.3
Zemel, R.4
Torralba, A.5
Urtasun, R.6
Fidler, S.7
-
17
-
-
84911370987
-
What are you talking about? Text-to-image coreference
-
C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? Text-to-Image Coreference. In CVPR, 2014.
-
(2014)
CVPR
-
-
Kong, C.1
Lin, D.2
Bansal, M.3
Urtasun, R.4
Fidler, S.5
-
18
-
-
84893398951
-
Generating natural-language video descriptions using text-mined knowledge
-
July
-
N. Krishnamoorthy, G. Malkarnenkar, R. J. Mooney, K. Saenko, and S. Guadarrama. Generating Natural-Language Video Descriptions Using Text-Mined Knowledge. In AAAI, July 2013.
-
(2013)
AAAI
-
-
Krishnamoorthy, N.1
Malkarnenkar, G.2
Mooney, R.J.3
Saenko, K.4
Guadarrama, S.5
-
19
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. Berg, and T. Berg. Baby Talk: Understanding and Generating Simple Image Descriptions. In CVPR, 2011.
-
(2011)
CVPR
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.6
Berg, T.7
-
21
-
-
84911442106
-
Visual semantic search: Retrieving videos via complex textual queries
-
D. Lin, S. Fidler, C. Kong, and R. Urtasun. Visual Semantic Search: Retrieving Videos via Complex Textual Queries. CVPR, 2014.
-
(2014)
CVPR
-
-
Lin, D.1
Fidler, S.2
Kong, C.3
Urtasun, R.4
-
22
-
-
85009931853
-
Microsoft COCO: Common Objects in Context
-
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
-
(2014)
ECCV
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
23
-
-
84937822746
-
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
-
M. Malinowski and M. Fritz. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In NIPS, 2014.
-
(2014)
NIPS
-
-
Malinowski, M.1
Fritz, M.2
-
24
-
-
84973896625
-
Ask your neurons: A neural-based approach to answering questions about images
-
M. Malinowski, M. Rohrbach, and M. Fritz. Ask Your Neurons: A Neural-based Approach to Answering Questions about Images. In ICCV, 2015.
-
(2015)
ICCV
-
-
Malinowski, M.1
Rohrbach, M.2
Fritz, M.3
-
26
-
-
85162522202
-
Im2Text: Describing images using 1 million captioned photographs
-
V. Ordonez, G. Kulkarni, and T. Berg. Im2Text: Describing Images Using 1 Million Captioned Photographs. In NIPS, 2011.
-
(2011)
NIPS
-
-
Ordonez, V.1
Kulkarni, G.2
Berg, T.3
-
28
-
-
84943782750
-
Linking People in Videos with "Their" Names Using Coreference Resolution
-
V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking People in Videos with "Their" Names Using Coreference Resolution. In ECCV, 2014.
-
(2014)
ECCV
-
-
Ramanathan, V.1
Joulin, A.2
Liang, P.3
Fei-Fei, L.4
-
29
-
-
84898775557
-
Video Event Understanding using Natural Language Descriptions
-
V. Ramanathan, P. Liang, and L. Fei-Fei. Video Event Understanding using Natural Language Descriptions. In ICCV, 2013.
-
(2013)
ICCV
-
-
Ramanathan, V.1
Liang, P.2
Fei-Fei, L.3
-
31
-
-
84926345282
-
MCTest: A challenge dataset for the open-domain machine comprehension of text
-
M. Richardson, C. J. Burges, and E. Renshaw. MCTest: A challenge dataset for the open-domain machine comprehension of text. In EMNLP, 2013.
-
(2013)
EMNLP
-
-
Richardson, M.1
Burges, C.J.2
Renshaw, E.3
-
33
-
-
84898775239
-
Translating video content to natural language descriptions
-
M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating Video Content to Natural Language Descriptions. In ICCV, 2013.
-
(2013)
ICCV
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
35
-
-
70450202706
-
"Who are you?"-Learning person specific classifiers from video
-
J. Sivic, M. Everingham, and A. Zisserman. "Who are you?"-Learning person specific classifiers from video. CVPR, pages 1145-1152, 2009.
-
(2009)
CVPR
, pp. 1145-1152
-
-
Sivic, J.1
Everingham, M.2
Zisserman, A.3
-
37
-
-
85009879494
-
-
arXiv:1409.4842
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv:1409.4842, 2014.
-
(2014)
Going Deeper with Convolutions
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
38
-
-
84959255361
-
Book2Movie: Aligning video scenes with book chapters
-
M. Tapaswi, M. Bäuml, and R. Stiefelhagen. Book2Movie: Aligning Video scenes with Book chapters. In CVPR, 2015.
-
(2015)
CVPR
-
-
Tapaswi, M.1
Bäuml, M.2
Stiefelhagen, R.3
-
39
-
-
84977834021
-
Aligning plot synopses to videos for story-based retrieval
-
M. Tapaswi, M. Bäuml, and R. Stiefelhagen. Aligning Plot Synopses to Videos for Story-based Retrieval. IJMIR, 4:3-16, 2015.
-
(2015)
IJMIR
, vol.4
, pp. 3-16
-
-
Tapaswi, M.1
Bäuml, M.2
Stiefelhagen, R.3
-
40
-
-
84973926486
-
Learning common sense through visual abstraction
-
R. Vedantam, X. Lin, T. Batra, C. L. Zitnick, and D. Parikh. Learning Common Sense Through Visual Abstraction. In ICCV, 2015.
-
(2015)
ICCV
-
-
Vedantam, R.1
Lin, X.2
Batra, T.3
Zitnick, C.L.4
Parikh, D.5
-
41
-
-
84944069490
-
Translating videos to natural language using deep recurrent neural networks
-
abs/1312.6229, cs.CV
-
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating Videos to Natural Language Using Deep Recurrent Neural Networks. CoRR abs/1312.6229, cs.CV, 2014.
-
(2014)
CoRR
-
-
Venugopalan, S.1
Xu, H.2
Donahue, J.3
Rohrbach, M.4
Mooney, R.J.5
Saenko, K.6
-
43
-
-
84944062514
-
Machine comprehension with syntax, frames, and semantics
-
H. Wang, M. Bansal, K. Gimpel, and D. McAllester. Machine Comprehension with Syntax, Frames, and Semantics. In ACL, 2015.
-
(2015)
ACL
-
-
Wang, H.1
Bansal, M.2
Gimpel, K.3
McAllester, D.4
-
45
-
-
80053258778
-
Corpus-guided sentence generation of natural images
-
Y. Yang, C. L. Teo, H. Daumé, III, and Y. Aloimonos. Corpus-guided Sentence Generation of Natural Images. In EMNLP, pages 444-454, 2011.
-
(2011)
EMNLP
, pp. 444-454
-
-
Yang, Y.1
Teo, C.L.2
Daumé, H.3
Aloimonos, Y.4
-
46
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In TACL, 2014.
-
(2014)
TACL
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
47
-
-
84959862697
-
Visual madlibs: Fill in the blank image generation and question answering
-
L. Yu, E. Park, A. C. Berg, and T. L. Berg. Visual Madlibs: Fill in the blank Image Generation and Question Answering. In ICCV, 2015.
-
(2015)
ICCV
-
-
Yu, L.1
Park, E.2
Berg, A.C.3
Berg, T.L.4
-
48
-
-
84937964578
-
Learning deep features for scene recognition using places database
-
B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning Deep Features for Scene Recognition using Places Database. In NIPS, 2014.
-
(2014)
NIPS
-
-
Zhou, B.1
Lapedriza, A.2
Xiao, J.3
Torralba, A.4
Oliva, A.5
-
49
-
-
84973911532
-
Aligning books and movies: Towards story-like visual explanations by watching movies and reading books
-
Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. In ICCV, 2015.
-
(2015)
ICCV
-
-
Zhu, Y.1
Kiros, R.2
Zemel, R.3
Salakhutdinov, R.4
Urtasun, R.5
Torralba, A.6
Fidler, S.7
-
50
-
-
84959182108
-
Adopting abstract images for semantic scene understanding
-
C. Zitnick, R. Vedantam, and D. Parikh. Adopting abstract images for semantic scene understanding. PAMI, PP, 2014.
-
(2014)
PAMI
-
-
Zitnick, C.1
Vedantam, R.2
Parikh, D.3