[1] Torch. http://torch.ch/.
[2] H. Agrawal, A. Chandrasekaran, D. Batra, D. Parikh, and M. Bansal. Sort story: Sorting jumbled images and captions into stories. In EMNLP, 2016.
[3] Amazon. Alexa. http://alexa.amazon.com/.
[4] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual Question Answering. In ICCV, 2015.
[5] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh. VizWiz: Nearly Real-time Answers to Visual Questions. In UIST, 2010.
[8] G. Christie, A. Laddha, A. Agrawal, S. Antol, Y. Goyal, K. Kochersberger, and D. Batra. Resolving language and vision ambiguities together: Joint segmentation and prepositional attachment resolution in captioned scenes. In EMNLP, 2016.
[9] A. Das, H. Agrawal, C. L. Zitnick, D. Parikh, and D. Batra. Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? In EMNLP, 2016.
[10] H. de Vries, F. Strub, S. Chandar, O. Pietquin, H. Larochelle, and A. C. Courville. GuessWhat?! Visual object discovery through multi-modal dialogue. In CVPR, 2017.
[11] J. Dodge, A. Gane, X. Zhang, A. Bordes, S. Chopra, A. Miller, A. Szlam, and J. Weston. Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems. In ICLR, 2016.
[12] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term Recurrent Convolutional Networks for Visual Recognition and Description. In CVPR, 2015.
[13] H. Fang, S. Gupta, F. N. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From Captions to Visual Concepts and Back. In CVPR, 2015.
[14] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. In NIPS, 2015.
[16] Y. Goyal, T. Khot, D. Summers-Stay, D. Batra, and D. Parikh. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In CVPR, 2017.
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep Residual Learning for Image Recognition. In CVPR, 2016.
[18] K. M. Hermann, T. Kocisky, E. Grefenstette, L. Espeholt, W. Kay, M. Suleyman, and P. Blunsom. Teaching machines to read and comprehend. In NIPS, 2015.
[19] R. Hu, M. Rohrbach, and T. Darrell. Segmentation from natural language expressions. In ECCV, 2016.
[20] T.-H. Huang, F. Ferraro, N. Mostafazadeh, I. Misra, A. Agrawal, J. Devlin, R. Girshick, X. He, P. Kohli, D. Batra, L. Zitnick, D. Parikh, L. Vanderwende, M. Galley, and M. Mitchell. Visual storytelling. In NAACL HLT, 2016.
[21] A. Jabri, A. Joulin, and L. van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
[22] A. Kannan, K. Kurach, S. Ravi, T. Kaufmann, A. Tomkins, B. Miklos, G. Corrado, L. Lukács, M. Ganea, P. Young, et al. Smart Reply: Automated Response Suggestion for Email. In KDD, 2016.
[23] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[24] C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? Text-to-image coreference. In CVPR, 2014.
[25] O. Lemon, K. Georgila, J. Henderson, and M. Stuttle. An ISU dialogue system exhibiting reinforcement learning of dialogue policies: Generic slot-filling in the TALK in-car system. In EACL, 2006.
[26] J. Li, W. Monroe, A. Ritter, M. Galley, J. Gao, and D. Jurafsky. Deep Reinforcement Learning for Dialogue Generation. In EMNLP, 2016.
[27] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
[28] C.-W. Liu, R. Lowe, I. V. Serban, M. Noseworthy, L. Charlin, and J. Pineau. How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation. In EMNLP, 2016.
[29] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg. SSD: Single Shot MultiBox Detector. In ECCV, 2016.
[30] R. Lowe, N. Pow, I. Serban, and J. Pineau. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In SIGDIAL, 2015.
[32] J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical Question-Image Co-Attention for Visual Question Answering. In NIPS, 2016.
[33] M. Malinowski and M. Fritz. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In NIPS, 2014.
[34] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
[35] H. Mei, M. Bansal, and M. R. Walter. Listen, attend, and walk: Neural mapping of navigational instructions to action sequences. In AAAI, 2016.
[36] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529-533, 2015.
[37] N. Mostafazadeh, C. Brockett, B. Dolan, M. Galley, J. Gao, G. P. Spithourakis, and L. Vanderwende. Image-Grounded Conversations: Multimodal Context for Natural Question and Response Generation. arXiv preprint arXiv:1701.08251, 2017.
[39] B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2015.
[40] P. Rajpurkar, J. Zhang, K. Lopyrev, and P. Liang. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In EMNLP, 2016.
[41] V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking people with "their" names using coreference resolution. In ECCV, 2014.
[42] A. Ray, G. Christie, M. Bansal, D. Batra, and D. Parikh. Question Relevance in VQA: Identifying Non-Visual and False-Premise Questions. In EMNLP, 2016.
[43] M. Ren, R. Kiros, and R. Zemel. Exploring Models and Data for Image Question Answering. In NIPS, 2015.
[44] A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of textual phrases in images by reconstruction. In ECCV, 2016.
[46] I. V. Serban, A. García-Durán, Ç. Gülçehre, S. Ahn, S. Chandar, A. C. Courville, and Y. Bengio. Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus. In ACL, 2016.
[47] I. V. Serban, A. Sordoni, Y. Bengio, A. Courville, and J. Pineau. Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models. In AAAI, 2016.
[48] I. V. Serban, A. Sordoni, R. Lowe, L. Charlin, J. Pineau, A. Courville, and Y. Bengio. A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues. arXiv preprint arXiv:1605.06069, 2016.
[49] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484-489, 2016.
[50] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[51] M. Tapaswi, Y. Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, and S. Fidler. MovieQA: Understanding Stories in Movies through Question-Answering. In CVPR, 2016.
[52] K. Tu, M. Meng, M. W. Lee, T. E. Choe, and S. C. Zhu. Joint Video and Text Parsing for Understanding Events and Answering Queries. IEEE MultiMedia, 2014.
[53] S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko. Sequence to Sequence - Video to Text. In ICCV, 2015.
[54] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating Videos to Natural Language Using Deep Recurrent Neural Networks. In NAACL HLT, 2015.
[57] L. Wang, S. Guo, W. Huang, Y. Xiong, and Y. Qiao. Knowledge Guided Disambiguation for Large-Scale Scene Classification with Multi-Resolution CNNs. arXiv preprint arXiv:1610.01119, 2016.
[58] J. Weizenbaum. ELIZA. http://psych.fullerton.edu/mbirnbaum/psych101/Eliza.htm.
[59] J. Weston, A. Bordes, S. Chopra, and T. Mikolov. Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks. In ICLR, 2016.
[61] Z. Yang, X. He, J. Gao, L. Deng, and A. J. Smola. Stacked Attention Networks for Image Question Answering. In CVPR, 2016.
[62] P. Zhang, Y. Goyal, D. Summers-Stay, D. Batra, and D. Parikh. Yin and Yang: Balancing and Answering Binary Visual Questions. In CVPR, 2016.
[64] C. L. Zitnick, A. Agrawal, S. Antol, M. Mitchell, D. Batra, and D. Parikh. Measuring machine intelligence through visual question answering. AI Magazine, 2016.