-
1
-
-
84959502295
-
-
arXiv preprint arXiv: 1505.00468
-
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: visual question answering. arXiv preprint arXiv: 1505.00468.
-
(2015)
VQA: Visual Question Answering
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Lawrence Zitnick, C.6
Parikh, D.7
-
2
-
-
77951560518
-
Robust spoken instruction understanding for HRI
-
Pamela J. Hinds, Hiroshi Ishiguro, Takayuki Kanda, and Peter H. Kahn Jr., editors, ACM
-
Rehj Cantrell, Matthias Scheutz, Paul W. Schermerhorn, and Xuan Wu. 2010. Robust spoken instruction understanding for HRI. In Pamela J. Hinds, Hiroshi Ishiguro, Takayuki Kanda, and Peter H. Kahn Jr., editors, HRI, pages 275-282. ACM.
-
(2010)
HRI
, pp. 275-282
-
-
Cantrell, R.1
Scheutz, M.2
Schermerhorn, P.W.3
Wu, X.4
-
3
-
-
84859089502
-
Collecting highly parallel data for paraphrase evaluation
-
Stroudsburg, PA, USA. Association for Computational Linguistics
-
David L. Chen and William B. Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT'11, pages 190-200, Stroudsburg, PA, USA. Association for Computational Linguistics.
-
(2011)
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT'11
, pp. 190-200
-
-
Chen, D.L.1
Dolan, W.B.2
-
4
-
-
77952710493
-
Training a multilingual sportscaster: Using perceptual context to learn language
-
January
-
David L. Chen, Joohyun Kim, and Raymond J. Mooney. 2010. Training a multilingual sportscaster: Using perceptual context to learn language. J. Artif. Int. Res., 37(1):397-436, January.
-
(2010)
J. Artif. Int. Res.
, vol.37
, Issue.1
, pp. 397-436
-
-
Chen, D.L.1
Kim, J.2
Mooney, R.J.3
-
5
-
-
84959908834
-
Deja image-captions: A corpus of expressive descriptions in repetition
-
Denver, Colorado, May-June. Association for Computational Linguistics
-
Jianfu Chen, Polina Kuznetsova, David Warren, and Yejin Choi. 2015. Deja image-captions: A corpus of expressive descriptions in repetition. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 504-514, Denver, Colorado, May-June. Association for Computational Linguistics.
-
(2015)
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
, pp. 504-514
-
-
Chen, J.1
Kuznetsova, P.2
Warren, D.3
Choi, Y.4
-
6
-
-
84946802546
-
Long-term recurrent convolutional networks for visual recognition and description
-
1411.4389
-
Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2014. Long-term recurrent convolutional networks for visual recognition and description. CoRR, abs/1411.4389.
-
(2014)
CoRR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
7
-
-
84946802531
-
From captions to visual concepts and back
-
1411.4952
-
Hao Fang, Saurabh Gupta, Forrest N. Iandola, Rupesh Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, and Geoffrey Zweig. 2014. From captions to visual concepts and back. CoRR, abs/1411.4952.
-
(2014)
CoRR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.N.3
Srivastava, R.4
Deng, L.5
Dollar, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Lawrence Zitnick, C.11
Zweig, G.12
-
8
-
-
78149311145
-
Every picture tells a story: Generating sentences from images
-
Berlin, Heidelberg. Springer-Verlag
-
Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10, pages 15-29, Berlin, Heidelberg. Springer-Verlag.
-
(2010)
Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10
, pp. 15-29
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
10
-
-
0010991761
-
Syntactic complexity
-
D. R. Dowty, L. Karttunen, and A. M. Zwicky, editors, Cambridge University Press, Cambridge
-
L. Frazier. 1985. Syntactic complexity. In D. R. Dowty, L. Karttunen, and A. M. Zwicky, editors, Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives, pages 129-189. Cambridge University Press, Cambridge.
-
(1985)
Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives
, pp. 129-189
-
-
Frazier, L.1
-
11
-
-
84959903294
-
Are you talking to a machine? Dataset and methods for multilingual image question answering
-
1505.05612
-
Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, and Wei Xu. 2015. Are you talking to a machine? dataset and methods for multilingual image question answering. CoRR, abs/1505.05612.
-
(2015)
CoRR
-
-
Gao, H.1
Mao, J.2
Zhou, J.3
Huang, Z.4
Wang, L.5
Xu, W.6
-
13
-
-
84943540775
-
ReferItGame: Referring to objects in photographs of natural scenes
-
Doha, Qatar, October. Association for Computational Linguistics
-
Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, and Tamara Berg. 2014. ReferItGame: Referring to Objects in Photographs of Natural Scenes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 787-798, Doha, Qatar, October. Association for Computational Linguistics.
-
(2014)
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)
, pp. 787-798
-
-
Kazemzadeh, S.1
Ordonez, V.2
Matten, M.3
Berg, T.4
-
16
-
-
34147141861
-
Situated dialogue and spatial organization: What, where... and why?
-
March
-
Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I. Christensen. 2007. Situated dialogue and spatial organization: What, where... and why? International Journal of Advanced Robotic Systems, Special Issue on Human and Robot Interactive Communication, 4(2), March.
-
(2007)
International Journal of Advanced Robotic Systems, Special Issue on Human and Robot Interactive Communication
, vol.4
, Issue.2
-
-
Kruijff, G.-J.M.1
Zender, H.2
Jensfelt, P.3
Christensen, H.I.4
-
17
-
-
84956640115
-
Microsoft COCO: Common objects in context
-
1405.0312
-
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. 2014. Microsoft COCO: common objects in context. CoRR, abs/1405.0312.
-
(2014)
CoRR
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollar, P.7
Zitnick, C.L.8
-
18
-
-
84937822746
-
A multi-world approach to question answering about real-world scenes based on uncertain input
-
Mateusz Malinowski and Mario Fritz. 2014. A multi-world approach to question answering about real-world scenes based on uncertain input. In Advances in Neural Information Processing Systems 27, pages 1682-1690.
-
(2014)
Advances in Neural Information Processing Systems
, vol.27
, pp. 1682-1690
-
-
Malinowski, M.1
Fritz, M.2
-
19
-
-
84959916685
-
What's cookin'? Interpreting cooking videos using text, speech and vision
-
May 31 - June 5, 2015, Denver, Colorado USA
-
Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nicholas Johnston, Andrew Rabinovich, and Kevin Murphy. 2015. What's cookin'? Interpreting cooking videos using text, speech and vision. In North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015), May 31 - June 5, 2015, Denver, Colorado USA.
-
(2015)
North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015)
-
-
Malmaud, J.1
Huang, J.2
Rathod, V.3
Johnston, N.4
Rabinovich, A.5
Murphy, K.6
-
20
-
-
85117622017
-
The Stanford CoreNLP natural language processing toolkit
-
Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55-60.
-
(2014)
Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations
, pp. 55-60
-
-
Manning, C.D.1
Surdeanu, M.2
Bauer, J.3
Finkel, J.4
Bethard, S.J.5
McClosky, D.6
-
22
-
-
84867118595
-
A joint model of language and perception for grounded attribute learning
-
Edinburgh, Scotland, June
-
Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox. 2012. A Joint Model of Language and Perception for Grounded Attribute Learning. In Proc. of the 2012 International Conference on Machine Learning, Edinburgh, Scotland, June.
-
(2012)
Proc. of the 2012 International Conference on Machine Learning
-
-
Matuszek, C.1
FitzGerald, N.2
Zettlemoyer, L.3
Bo, L.4
Fox, D.5
-
23
-
-
84960146118
-
Discriminative unsupervised alignment of natural language instructions with corresponding video segments
-
May 31 - June 5, 2015, Denver, Colorado USA
-
Iftekhar Naim, Young C. Song, Qiguang Liu, Liang Huang, Henry Kautz, Jiebo Luo, and Daniel Gildea. 2015. Discriminative unsupervised alignment of natural language instructions with corresponding video segments. In North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015), May 31 - June 5, 2015, Denver, Colorado USA.
-
(2015)
North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015)
-
-
Naim, I.1
Song, Y.C.2
Liu, Q.3
Huang, L.4
Kautz, H.5
Luo, J.6
Gildea, D.7
-
25
-
-
85090348677
-
Collecting image annotations using Amazon's Mechanical Turk
-
Stroudsburg, PA, USA. Association for Computational Linguistics
-
Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting image annotations using Amazon's Mechanical Turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT'10, pages 139-147, Stroudsburg, PA, USA. Association for Computational Linguistics.
-
(2010)
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT'10
, pp. 139-147
-
-
Rashtchian, C.1
Young, P.2
Hodosh, M.3
Hockenmaier, J.4
-
26
-
-
84898785648
-
Grounding action descriptions in videos
-
Michaela Regneri, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, and Manfred Pinkal. 2013. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics (TACL), 1:25-36.
-
(2013)
Transactions of the Association for Computational Linguistics (TACL)
, vol.1
, pp. 25-36
-
-
Regneri, M.1
Rohrbach, M.2
Wetzel, D.3
Thater, S.4
Schiele, B.5
Pinkal, M.6
-
28
-
-
84866710901
-
A database for fine grained activity detection of cooking activities
-
IEEE, June
-
Marcus Rohrbach, Sikandar Amin, Mykhaylo Andriluka, and Bernt Schiele. 2012. A database for fine grained activity detection of cooking activities. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June.
-
(2012)
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Rohrbach, M.1
Amin, S.2
Andriluka, M.3
Schiele, B.4
-
29
-
-
33748863029
-
Conversational robots: Building blocks for grounding word meaning
-
Stroudsburg, PA, USA. Association for Computational Linguistics
-
Deb Roy, Kai-Yuh Hsiao, and Nikolaos Mavridis. 2003. Conversational robots: Building blocks for grounding word meaning. In Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-linguistic Data - Volume 6, HLT-NAACL-LWM'04, pages 70-77, Stroudsburg, PA, USA. Association for Computational Linguistics.
-
(2003)
Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-linguistic Data - Volume 6, HLT-NAACL-LWM'04
, pp. 70-77
-
-
Roy, D.1
Hsiao, K.-Y.2
Mavridis, N.3
-
30
-
-
84949572890
-
-
arXiv preprint arXiv:1503.01817
-
Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2015. The new data and new challenges in multimedia research. arXiv preprint arXiv:1503.01817.
-
(2015)
The New Data and New Challenges in Multimedia Research
-
-
Thomee, B.1
Shamma, D.A.2
Friedland, G.3
Elizalde, B.4
Ni, K.5
Poland, D.6
Borth, D.7
Li, L.-J.8
-
31
-
-
80052908300
-
Unbiased look at dataset bias
-
Washington, DC, USA. IEEE Computer Society
-
A. Torralba and A. A. Efros. 2011. Unbiased look at dataset bias. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR'11, pages 1521-1528, Washington, DC, USA. IEEE Computer Society.
-
(2011)
Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR'11
, pp. 1521-1528
-
-
Torralba, A.1
Efros, A.A.2
-
32
-
-
85006151700
-
An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus
-
June
-
Lucy Vanderwende, Arul Menezes, and Chris Quirk. 2015. An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus. Proceedings of NAACL 2015, June.
-
(2015)
Proceedings of NAACL 2015
-
-
Vanderwende, L.1
Menezes, A.2
Quirk, C.3
-
33
-
-
84959876769
-
Translating videos to natural language using deep recurrent neural networks
-
Denver, Colorado, June
-
Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, and Kate Saenko. 2015. Translating videos to natural language using deep recurrent neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015), pages 1494-1504, Denver, Colorado, June.
-
(2015)
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015)
, pp. 1494-1504
-
-
Venugopalan, S.1
Xu, H.2
Donahue, J.3
Rohrbach, M.4
Mooney, R.5
Saenko, K.6
-
35
-
-
85026937926
-
See no evil, say no evil: Description generation from densely labeled images
-
Dublin, Ireland, August. Association for Computational Linguistics and Dublin City University
-
∗SEM 2014), pages 110-120, Dublin, Ireland, August. Association for Computational Linguistics and Dublin City University.
-
(2014)
∗SEM 2014)
, pp. 110-120
-
-
Yatskar, M.1
Galley, M.2
Vanderwende, L.3
Zettlemoyer, L.4
-
37
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67-78.
-
(2014)
Transactions of the Association for Computational Linguistics
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
40
-
-
84898772194
-
Learning the visual interpretation of sentences
-
Sydney, Australia, December 1-8, 2013
-
C. Lawrence Zitnick, Devi Parikh, and Lucy Vanderwende. 2013. Learning the visual interpretation of sentences. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, pages 1681-1688.
-
(2013)
IEEE International Conference on Computer Vision, ICCV 2013
, pp. 1681-1688
-
-
Lawrence Zitnick, C.1
Parikh, D.2
Vanderwende, L.3