SCOPUS Information Retrieval Platform

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing

Volume , Issue , 2015, Pages 207-213

A survey of current datasets for vision and language research

(7) Ferraro, Francis a Mostafazadeh, Nasrin b Huang, Ting Hao c Vanderwende, Lucy d Devlin, Jacob d Galley, Michel d Mitchell, Margaret d

a JOHNS HOPKINS UNIVERSITY (United States)

b UNIVERSITY OF ROCHESTER (United States)

c CARNEGIE MELLON UNIVERSITY (United States)

d MICROSOFT RESEARCH (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE;

ABSTRACT CONCEPT; QUALITY METRICS;

NATURAL LANGUAGE PROCESSING SYSTEMS;

EID: 84959904882 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.18653/v1/d15-1021 Document Type: Conference Paper

Times cited : (56)

References (40)

1
- 84959502295
- arXiv preprint arXiv:1505.00468
- Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C. Lawrence Zitnick, and Devi Parikh. 2015. VQA: visual question answering. arXiv preprint arXiv:1505.00468.
- (2015) VQA: Visual Question Answering
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Lawrence Zitnick, C.⁶ Parikh, D.⁷

2
- 77951560518
- Robust spoken instruction understanding for hri
- Pamela J. Hinds, Hiroshi Ishiguro, Takayuki Kanda, and Peter H. Kahn Jr., editors, ACM
- Rehj Cantrell, Matthias Scheutz, Paul W. Schermerhorn, and Xuan Wu. 2010. Robust spoken instruction understanding for hri. In Pamela J. Hinds, Hiroshi Ishiguro, Takayuki Kanda, and Peter H. Kahn Jr., editors, HRI, pages 275-282. ACM.
- (2010) HRI , pp. 275-282
- Cantrell, R.¹ Scheutz, M.² Schermerhorn, P.W.³ Wu, X.⁴

3
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- Stroudsburg, PA, USA. Association for Computational Linguistics
- David L. Chen and William B. Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT'11, pages 190-200, Stroudsburg, PA, USA. Association for Computational Linguistics.
- (2011) Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, HLT'11 , pp. 190-200
- Chen, D.L.¹ Dolan, W.B.²

4
- 77952710493
- Training a multilingual sportscaster: Using perceptual context to learn language
- January
- David L. Chen, Joohyun Kim, and Raymond J. Mooney. 2010. Training a multilingual sportscaster: Using perceptual context to learn language. J. Artif. Int. Res., 37(1):397-436, January.
- (2010) J. Artif. Int. Res. , vol.37 , Issue.1 , pp. 397-436
- Chen, D.L.¹ Kim, J.² Mooney, R.J.³

5
- 84959908834
- Deja image-captions: A corpus of expressive descriptions in repetition
- Denver, Colorado, May-June. Association for Computational Linguistics
- Jianfu Chen, Polina Kuznetsova, David Warren, and Yejin Choi. 2015. Deja image-captions: A corpus of expressive descriptions in repetition. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 504-514, Denver, Colorado, May-June. Association for Computational Linguistics.
- (2015) Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , pp. 504-514
- Chen, J.¹ Kuznetsova, P.² Warren, D.³ Choi, Y.⁴

6
- 84946802546
- Long-term recurrent convolutional networks for visual recognition and description
- 1411.4389
- Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2014. Long-term recurrent convolutional networks for visual recognition and description. CoRR, abs/1411.4389.
- (2014) CoRR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

7
- 84946802531
- From captions to visual concepts and back
- 1411.4952
- Hao Fang, Saurabh Gupta, Forrest N. Iandola, Rupesh Srivastava, Li Deng, Piotr Dollar, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, and Geoffrey Zweig. 2014. From captions to visual concepts and back. CoRR, abs/1411.4952.
- (2014) CoRR
- Fang, H.¹ Gupta, S.² Iandola, F.N.³ Srivastava, R.⁴ Deng, L.⁵ Dollar, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰ Lawrence Zitnick, C.¹¹ Zweig, G.¹²

8
- 78149311145
- Every picture tells a story: Generating sentences from images
- Berlin, Heidelberg. Springer-Verlag
- Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10, pages 15-29, Berlin, Heidelberg. Springer-Verlag.
- (2010) Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10 , pp. 15-29
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

9
- 0004281803
- Brown University, Providence, Rhode Island, USA
- W Nelson Francis and Henry Kucera. 1979. Brown Corpus manual: Manual of information to accompany a standard corpus of present-day edited American English for use with digital computers. Brown University, Providence, Rhode Island, USA.
- (1979) Brown Corpus Manual: Manual of Information to Accompany a Standard Corpus of Present-day Edited American English for use with Digital Computers
- Nelson Francis, W.¹ Kucera, H.²

10
- 0010991761
- Syntactic complexity
- D. R. Dowty, L. Karttunen, and A. M. Zwicky, editors, Cambridge University Press, Cambridge
- L. Frazier. 1985. Syntactic complexity. In D. R. Dowty, L. Karttunen, and A. M. Zwicky, editors, Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives, pages 129-189. Cambridge University Press, Cambridge.
- (1985) Natural Language Parsing: Psychological, Computational, and Theoretical Perspectives , pp. 129-189
- Frazier, L.¹

11
- 84959903294
- Are you talking to a machine? Dataset and methods for multilingual image question answering
- 1505.05612
- Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, and Wei Xu. 2015. Are you talking to a machine? Dataset and methods for multilingual image question answering. CoRR, abs/1505.05612.
- (2015) CoRR
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

12
- 84888145576
- Reporting bias and knowledge extraction
- AKBC 13
- Jonathan Gordon and Benjamin Van Durme. 2013. Reporting bias and knowledge extraction. In Automated Knowledge Base Construction (AKBC) 2013: The 3rd Workshop on Knowledge Extraction, at CIKM 2013, AKBC 13.
- (2013) Automated Knowledge Base Construction (AKBC) 2013: The 3rd Workshop on Knowledge Extraction, at CIKM 2013
- Gordon, J.¹ Van Durme, B.²

13
- 84943540775
- ReferItGame: Referring to objects in photographs of natural scenes
- Doha, Qatar, October. Association for Computational Linguistics
- Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, and Tamara Berg. 2014. ReferItGame: Referring to Objects in Photographs of Natural Scenes. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 787-798, Doha, Qatar, October. Association for Computational Linguistics.
- (2014) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 787-798
- Kazemzadeh, S.¹ Ordonez, V.² Matten, M.³ Berg, T.⁴

14
- 84959191227
- Joint photo stream and blog post summarization and exploration
- Gunhee Kim, Seungwhan Moon, and Leonid Sigal. 2015. Joint Photo Stream and Blog Post Summarization and Exploration. In 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015).
- (2015) 28th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2015)
- Kim, G.¹ Moon, S.² Sigal, L.³

15
- 84959892243
- Toward interactive grounded language acquisition
- Thomas Kollar, Jayant Krishnamurthy, and Grant Strimel. 2013. Toward interactive grounded language acquisition. In Robotics: Science and Systems.
- (2013) Robotics: Science and Systems
- Kollar, T.¹ Krishnamurthy, J.² Strimel, G.³

16
- 34147141861
- Situated dialogue and spatial organization: What, where... and why?
- March
- Geert-Jan M. Kruijff, Hendrik Zender, Patric Jensfelt, and Henrik I. Christensen. 2007. Situated dialogue and spatial organization: What, where... and why? International Journal of Advanced Robotic Systems, Special Issue on Human and Robot Interactive Communication, 4(2), March.
- (2007) International Journal of Advanced Robotic Systems, Special Issue on Human and Robot Interactive Communication , vol.4 , Issue.2
- Kruijff, G.-J.M.¹ Zender, H.² Jensfelt, P.³ Christensen, H.I.⁴

17
- 84956640115
- Microsoft COCO: Common objects in context
- 1405.0312
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar, and C. Lawrence Zitnick. 2014. Microsoft COCO: common objects in context. CoRR, abs/1405.0312.
- (2014) CoRR
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollar, P.⁷ Zitnick, C.L.⁸

18
- 84937822746
- A multi-world approach to question answering about real-world scenes based on uncertain input
- Mateusz Malinowski and Mario Fritz. 2014. A multi-world approach to question answering about real-world scenes based on uncertain input. In Advances in Neural Information Processing Systems 27, pages 1682-1690.
- (2014) Advances in Neural Information Processing Systems , vol.27 , pp. 1682-1690
- Malinowski, M.¹ Fritz, M.²

19
- 84959916685
- What's cookin'? Interpreting cooking videos using text, speech and vision
- May 31 - June 5, 2015, Denver, Colorado USA
- Jonathan Malmaud, Jonathan Huang, Vivek Rathod, Nicholas Johnston, Andrew Rabinovich, and Kevin Murphy. 2015. What's cookin'? Interpreting cooking videos using text, speech and vision. In North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015), May 31 - June 5, 2015, Denver, Colorado USA.
- (2015) North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015)
- Malmaud, J.¹ Huang, J.² Rathod, V.³ Johnston, N.⁴ Rabinovich, A.⁵ Murphy, K.⁶

20
- 85117622017
- The Stanford CoreNLP natural language processing toolkit
- Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 55-60.
- (2014) Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations , pp. 55-60
- Manning, C.D.¹ Surdeanu, M.² Bauer, J.³ Finkel, J.⁴ Bethard, S.J.⁵ McClosky, D.⁶

21
- 85026948352
- Mitchell Marcus, Beatrice Santorini, Mary Ann Marcinkiewicz, and Ann Taylor. 1999. Brown corpus, treebank-3.
- (1999) Brown Corpus, treebank-3
- Marcus, M.¹ Santorini, B.² Marcinkiewicz, M.A.³ Taylor, A.⁴

22
- 84867118595
- A joint model of language and perception for grounded attribute learning
- Edinburgh, Scotland, June
- Cynthia Matuszek, Nicholas FitzGerald, Luke Zettlemoyer, Liefeng Bo, and Dieter Fox. 2012. A Joint Model of Language and Perception for Grounded Attribute Learning. In Proc. of the 2012 International Conference on Machine Learning, Edinburgh, Scotland, June.
- (2012) Proc. of the 2012 International Conference on Machine Learning
- Matuszek, C.¹ FitzGerald, N.² Zettlemoyer, L.³ Bo, L.⁴ Fox, D.⁵

23
- 84960146118
- Discriminative unsupervised alignment of natural language instructions with corresponding video segments
- May 31 - June 5, 2015, Denver, Colorado USA
- Iftekhar Naim, Young C. Song, Qiguang Liu, Liang Huang, Henry Kautz, Jiebo Luo, and Daniel Gildea. 2015. Discriminative unsupervised alignment of natural language instructions with corresponding video segments. In North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015), May 31 - June 5, 2015, Denver, Colorado USA.
- (2015) North American Chapter of the Association for Computational Linguistics Human Language Technologies (NAACL HLT 2015)
- Naim, I.¹ Song, Y.C.² Liu, Q.³ Huang, L.⁴ Kautz, H.⁵ Luo, J.⁶ Gildea, D.⁷

24
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- Vicente Ordonez, Girish Kulkarni, and Tamara L. Berg. 2011. Im2text: Describing images using 1 million captioned photographs. In Neural Information Processing Systems (NIPS).
- (2011) Neural Information Processing Systems (NIPS)
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

25
- 85090348677
- Collecting image annotations using amazon's mechanical turk
- Stroudsburg, PA, USA. Association for Computational Linguistics
- Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting image annotations using amazon's mechanical turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT'10, pages 139-147, Stroudsburg, PA, USA. Association for Computational Linguistics.
- (2010) Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT'10 , pp. 139-147
- Rashtchian, C.¹ Young, P.² Hodosh, M.³ Hockenmaier, J.⁴

26
- 84898785648
- Grounding action descriptions in videos
- Michaela Regneri, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, and Manfred Pinkal. 2013. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics (TACL), 1:25-36.
- (2013) Transactions of the Association for Computational Linguistics (TACL) , vol.1 , pp. 25-36
- Regneri, M.¹ Rohrbach, M.² Wetzel, D.³ Thater, S.⁴ Schiele, B.⁵ Pinkal, M.⁶

27
- 84959934876
- Question answering about images using visual semantic embeddings
- Mengye Ren, Ryan Kiros, and Richard Zemel. 2015. Question answering about images using visual semantic embeddings. In Deep Learning Workshop, ICML 2015.
- (2015) Deep Learning Workshop, ICML 2015
- Ren, M.¹ Kiros, R.² Zemel, R.³

28
- 84866710901
- A database for fine grained activity detection of cooking activities
- IEEE, June
- Marcus Rohrbach, Sikandar Amin, Mykhaylo Andriluka, and Bernt Schiele. 2012. A database for fine grained activity detection of cooking activities. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, June.
- (2012) IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Rohrbach, M.¹ Amin, S.² Andriluka, M.³ Schiele, B.⁴

29
- 33748863029
- Conversational robots: Building blocks for grounding word meaning
- Stroudsburg, PA, USA. Association for Computational Linguistics
- Deb Roy, Kai-Yuh Hsiao, and Nikolaos Mavridis. 2003. Conversational robots: Building blocks for grounding word meaning. In Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-linguistic Data - Volume 6, HLT-NAACL-LWM'04, pages 70-77, Stroudsburg, PA, USA. Association for Computational Linguistics.
- (2003) Proceedings of the HLT-NAACL 2003 Workshop on Learning Word Meaning from Non-linguistic Data - Volume 6, HLT-NAACL-LWM'04 , pp. 70-77
- Roy, D.¹ Hsiao, K.-Y.² Mavridis, N.³

30
- 84949572890
- arXiv preprint arXiv:1503.01817
- Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2015. The new data and new challenges in multimedia research. arXiv preprint arXiv:1503.01817.
- (2015) The New Data and New Challenges in Multimedia Research
- Thomee, B.¹ Shamma, D.A.² Friedland, G.³ Elizalde, B.⁴ Ni, K.⁵ Poland, D.⁶ Borth, D.⁷ Li, L.-J.⁸

31
- 80052908300
- Unbiased look at dataset bias
- Washington, DC, USA. IEEE Computer Society
- A. Torralba and A. A. Efros. 2011. Unbiased look at dataset bias. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR'11, pages 1521-1528, Washington, DC, USA. IEEE Computer Society.
- (2011) Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, CVPR'11 , pp. 1521-1528
- Torralba, A.¹ Efros, A.A.²

32
- 85006151700
- An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus
- June
- Lucy Vanderwende, Arul Menezes, and Chris Quirk. 2015. An AMR parser for English, French, German, Spanish and Japanese and a new AMR-annotated corpus. Proceedings of NAACL 2015, June.
- (2015) Proceedings of NAACL 2015
- Vanderwende, L.¹ Menezes, A.² Quirk, C.³

33
- 84959876769
- Translating videos to natural language using deep recurrent neural networks
- Denver, Colorado, June
- Subhashini Venugopalan, Huijuan Xu, Jeff Donahue, Marcus Rohrbach, Raymond Mooney, and Kate Saenko. 2015. Translating videos to natural language using deep recurrent neural networks. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015), pages 1494-1504, Denver, Colorado, June.
- (2015) Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015) , pp. 1494-1504
- Venugopalan, S.¹ Xu, H.² Donahue, J.³ Rohrbach, M.⁴ Mooney, R.⁵ Saenko, K.⁶

34
- 0004057837
- Academic Press, New York
- Terry Winograd. 1972. Understanding Natural Language. Academic Press, New York.
- (1972) Understanding Natural Language
- Winograd, T.¹

35
- 85026937926
- See no evil, say no evil: Description generation from densely labeled images
- Dublin, Ireland, August. Association for Computational Linguistics and Dublin City University
- ∗SEM 2014), pages 110-120, Dublin, Ireland, August. Association for Computational Linguistics and Dublin City University.
- (2014) ∗SEM 2014) , pp. 110-120
- Yatskar, M.¹ Galley, M.² Vanderwende, L.³ Zettlemoyer, L.⁴

36
- 0000754012
- A model and an hypothesis for language structure
- Victor H. Yngve. 1960. A model and an hypothesis for language structure. Proceedings of the American Philosophical Society, 104:444-466.
- (1960) Proceedings of the American Philosophical Society , vol.104 , pp. 444-466
- Yngve, V.H.¹

37
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics, 2:67-78.
- (2014) Transactions of the Association for Computational Linguistics , vol.2 , pp. 67-78
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

38
- 84897743886
- Grounded language learning from video described with sentences
- Sofia, Bulgaria. Association for Computational Linguistics. Best Paper Award
- Haonan Yu and Jeffrey Mark Siskind. 2013. Grounded language learning from video described with sentences. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 53-63, Sofia, Bulgaria. Association for Computational Linguistics. Best Paper Award.
- (2013) Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , vol.1 , pp. 53-63
- Yu, H.¹ Siskind, J.M.²

39
- 84959862697
- arXiv preprint arXiv:1506.00278
- Licheng Yu, Eunbyung Park, Alexander C. Berg, and Tamara L. Berg. 2015. Visual Madlibs: Fill in the blank Image Generation and Question Answering. arXiv preprint arXiv:1506.00278.
- (2015) Visual Madlibs: Fill in the Blank Image Generation and Question Answering
- Yu, L.¹ Park, E.² Berg, A.C.³ Berg, T.L.⁴

40
- 84898772194
- Learning the visual interpretation of sentences
- Sydney, Australia, December 1-8, 2013
- C. Lawrence Zitnick, Devi Parikh, and Lucy Vanderwende. 2013. Learning the visual interpretation of sentences. In IEEE International Conference on Computer Vision, ICCV 2013, Sydney, Australia, December 1-8, 2013, pages 1681-1688.
- (2013) IEEE International Conference on Computer Vision, ICCV 2013 , pp. 1681-1688
- Lawrence Zitnick, C.¹ Parikh, D.² Vanderwende, L.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.