SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 5534-5542

Situation recognition: Visual semantic role labeling for image understanding

(3) Yatskar, Mark (a); Zettlemoyer, Luke (a); Farhadi, Ali (a, b)

a University of Washington (United States)

b Allen Institute for Artificial Intelligence ^* (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; SEMANTICS; WOOL; YARN;

ACTIVITY RECOGNITION; FRAMENET; INDEPENDENT OBJECTS; LARGE SPACES; LARGE-SCALE DATASET; SITUATION RECOGNITION; STRUCTURED PREDICTION; VISUAL SEMANTICS;

PATTERN RECOGNITION;

EID: 84986247420 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.597 Document Type: Conference Paper

Times cited : (300)

References (52)

1
- 84959502295
- arXiv preprint arXiv:1505.00468, 3
- S. Antol et al. VQA: Visual question answering. arXiv preprint arXiv:1505.00468, 2015.
- (2015) VQA: Visual Question Answering
- Antol, S.¹

2
- 84944115859
- arXiv:1411.5654, 3
- X. Chen et al. Learning a recurrent visual representation for image caption generation. arXiv:1411.5654, 2014.
- (2014) Learning A Recurrent Visual Representation for Image Caption Generation
- Chen, X.¹

3
- 84930656106
- PhD thesis, CMU, 2, 3
- D. Das. Semi-Supervised and Latent-Variable Models of Natural Language Semantics. PhD thesis, CMU, 2012.
- (2012) Semi-Supervised and Latent-Variable Models of Natural Language Semantics
- Das, D.¹

4
- 84898427335
- Recognizing human actions in still images: A study of bag-of-features and part-based representations
- 2
- V. Delaitre et al. Recognizing human actions in still images: A study of bag-of-features and part-based representations. In BMVC, 2010.
- (2010) BMVC
- Delaitre, V.¹

5
- 84946590544
- Construction and Analysis of a Large Scale Image Ontology
- 3
- J. Deng et al. Construction and Analysis of a Large Scale Image Ontology. Vision Sciences Society, 2009.
- (2009) Vision Sciences Society
- Deng, J.¹

6
- 70450161428
- An empirical study of context in object detection
- 3
- S. Divvala et al. An empirical study of context in object detection. In CVPR, 2009.
- (2009) CVPR
- Divvala, S.¹

7
- 84906928552
- Comparing automatic evaluation measures for image description
- 3
- D. Elliott et al. Comparing automatic evaluation measures for image description. In ACL, 2014.
- (2014) ACL
- Elliott, D.¹

8
- 85009936768
- Linking people with their names using coreference resolution
- V. R. et al.
- V. R. et al. Linking people with "their" names using coreference resolution. In ECCV, 2014.
- (2014) ECCV , vol.3

9
- 84986248327
- arXiv preprint arXiv:1507.05670. Z. et al.
- Z. et al. Building a large-scale multimodal knowledge base for visual question answering. arXiv preprint arXiv:1507.05670, 2015.
- (2015) Building A Large-scale Multimodal Knowledge Base for Visual Question Answering

10
- 84921069139
- The pascal visual object classes challenge 2009
- 2
- M. Everingham et al. The pascal visual object classes challenge 2009. In 2nd PASCAL Challenge Workshop, 2009.
- (2009) 2nd PASCAL Challenge Workshop
- Everingham, M.¹

11
- 84944115860
- arXiv:1411.4952, 3
- H. Fang et al. From captions to visual concepts and back. arXiv:1411.4952, 2014.
- (2014) From Captions to Visual Concepts and Back
- Fang, H.¹

12
- 78149311145
- Every picture tells a story: Generating sentences from images
- 3
- A. Farhadi et al. Every picture tells a story: Generating sentences from images. In ECCV 2010, pages 15-29. 2010.
- (2010) ECCV 2010 , pp. 15-29
- Farhadi, A.¹

13
- 0012686456
- Wiley Online Library, 2, 3
- C. Fellbaum. WordNet. Wiley Online Library, 1998.
- (1998) WordNet
- Fellbaum, C.¹

14
- 23844488601
- Background to framenet
- 2, 3
- C. J. Fillmore et al. Background to framenet. International Journal of lexicography, 2003.
- (2003) International Journal of Lexicography
- Fillmore, C.J.¹

15
- 84959925712
- Semantic role labelling with neural network factors
- 6
- N. FitzGerald et al. Semantic role labelling with neural network factors. In EMNLP, 2015.
- (2015) EMNLP
- FitzGerald, N.¹

16
- 84898958665
- Devise: A deep visual-semantic embedding model
- 3
- A. Frome et al. Devise: A deep visual-semantic embedding model. In NIPS, 2013.
- (2013) NIPS
- Frome, A.¹

17
- 78651403274
- Context based object categorization: A critical survey
- 3
- C. Galleguillos et al. Context based object categorization: A critical survey. CVIU, 2010.
- (2010) CVIU
- Galleguillos, C.¹

18
- 84957033954
- arXiv preprint arXiv:1505.05612, 3
- H. Gao et al. Are you talking to a machine? Dataset and methods for multilingual image question answering. arXiv preprint arXiv:1505.05612, 2015.
- (2015) Are You Talking to A Machine? Dataset and Methods for Multilingual Image Question Answering
- Gao, H.¹

19
- 84943742382
- A dataset of syntactic-ngrams over time from a very large corpus of english books
- 4
- Y. Goldberg et al. A dataset of syntactic-ngrams over time from a very large corpus of english books. In SEM, 2013.
- (2013) SEM
- Goldberg, Y.¹

20
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- 3
- S. Guadarrama et al. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
- (2013) ICCV
- Guadarrama, S.¹

21
- 84902318725
- A survey on still image based human action recognition
- 2
- G. Guo et al. A survey on still image based human action recognition. Pattern Recognition, 2014.
- (2014) Pattern Recognition
- Guo, G.¹

22
- 70450155469
- Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers
- 2
- A. Gupta et al. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In ECCV, 2008.
- (2008) ECCV
- Gupta, A.¹

23
- 84986248703
- arXiv preprint arXiv:1505.04474, 3
- S. Gupta et al. Visual semantic role labeling. arXiv preprint arXiv:1505.04474, 2015.
- (2015) Visual Semantic Role Labeling
- Gupta, S.¹

24
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- 3
- M. Hodosh et al. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 2013.
- (2013) JAIR
- Hodosh, M.¹

25
- 85009867858
- arXiv:1408.5093, 6
- Y. Jia et al. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
- (2014) Caffe: Convolutional Architecture for Fast Feature Embedding
- Jia, Y.¹

26
- 84942676733
- arXiv:1412.2306, 3
- A. Karpathy et al. Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306, 2014.
- (2014) Deep Visual-semantic Alignments for Generating Image Descriptions
- Karpathy, A.¹

27
- 84964379003
- From treebank to propbank
- Citeseer, 3
- P. Kingsbury and M. Palmer. From treebank to propbank. In LREC. Citeseer, 2002.
- (2002) LREC
- Kingsbury, P.¹ Palmer, M.²

28
- 84911370987
- What are you talking about? Text-to-image coreference
- 3
- C. Kong et al. What are you talking about? Text-to-image coreference. In CVPR, 2014.
- (2014) CVPR
- Kong, C.¹

29
- 85009863830
- Is this a wampimuk?
- 3
- A. Lazaridou et al. Is this a wampimuk? In ACL, 2014.
- (2014) ACL
- Lazaridou, A.¹

30
- 85062874978
- Tuhoi: Trento universal human object interaction dataset
- 2
- D.-T. Le et al. Tuhoi: Trento universal human object interaction dataset. V&L Net 2014, 2014.
- (2014) V&L Net 2014
- Le, D.-T.¹

31
- 78149310629
- What, where and who? Classifying events by scene and object recognition
- 2
- L.-J. Li et al. What, where and who? Classifying events by scene and object recognition. In CVPR, 2007.
- (2007) CVPR
- Li, L.-J.¹

32
- 85009931853
- Microsoft coco: Common objects in context
- 3
- T.-Y. Lin et al. Microsoft coco: Common objects in context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹

33
- 80052880806
- Action recognition from a distributed representation of pose and appearance
- 3
- S. Maji et al. Action recognition from a distributed representation of pose and appearance. In CVPR, 2011.
- (2011) CVPR
- Maji, S.¹

34
- 84951072975
- arXiv:1410.1090, 3
- J. Mao et al. Explain images with multimodal recurrent neural networks. arXiv:1410.1090, 2014.
- (2014) Explain Images with Multimodal Recurrent Neural Networks
- Mao, J.¹

35
- 70450177757
- Actions in context
- 1, 3
- M. Marszalek et al. Actions in context. In CVPR, 2009.
- (2009) CVPR
- Marszalek, M.¹

36
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- 3
- V. Ordonez et al. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.
- (2011) NIPS
- Ordonez, V.¹

37
- 81255158440
- Semlink: Linking propbank, verbnet and framenet
- 3
- M. Palmer. Semlink: Linking propbank, verbnet and framenet. In GLC, pages 9-15, 2009.
- (2009) GLC , pp. 9-15
- Palmer, M.¹

38
- 50649096757
- Objects in context
- 3
- A. Rabinovich et al. Objects in context. In ICCV, 2007.
- (2007) ICCV
- Rabinovich, A.¹

39
- 84962816362
- arXiv preprint arXiv:1505.02074, 3
- M. Ren et al. Image question answering: A visual semantic embedding model and a new dataset. arXiv preprint arXiv:1505.02074, 2015.
- (2015) Image Question Answering: A Visual Semantic Embedding Model and A New Dataset
- Ren, M.¹

40
- 84994124048
- Describing common human visual actions in images
- 3
- M. Ronchi et al. Describing common human visual actions in images. In BMVC, 2015.
- (2015) BMVC
- Ronchi, M.¹

41
- 84921954402
- ImageNet large scale visual recognition challenge
- 2, 6
- O. Russakovsky et al. ImageNet Large Scale Visual Recognition Challenge. CoRR, 2014.
- (2014) CoRR
- Russakovsky, O.¹

42
- 84883376937
- Grounded models of semantic representation
- 3
- C. Silberer et al. Grounded models of semantic representation. In EMNLP, 2012.
- (2012) EMNLP
- Silberer, C.¹

43
- 84925410541
- CoRR, abs/1409.1556, 2, 6
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

44
- 84884955228
- 2
- K. Soomro et al. Ucf101: A dataset of 101 human actions classes from videos in the wild. 2012.
- (2012) Ucf101: A Dataset of 101 Human Actions Classes from Videos in the Wild
- Soomro, K.¹

45
- 84959197551
- arXiv:1411.5726, 3
- R. Vedantam et al. Cider: Consensus-based image description evaluation. arXiv:1411.5726, 2014.
- (2014) Cider: Consensus-based Image Description Evaluation
- Vedantam, R.¹

46
- 84939821075
- arXiv:1411.4555, 3
- O. Vinyals et al. Show and tell: A neural image caption generator. arXiv:1411.4555, 2014.
- (2014) Show and Tell: A Neural Image Caption Generator
- Vinyals, O.¹

47
- 85009851491
- Modeling mutual context of object and human pose in human-object interaction activities
- 3
- B. Yao et al. Modeling mutual context of object and human pose in human-object interaction activities. In CVPR.
- CVPR
- Yao, B.¹

48
- 77955987964
- Grouplet: A structured image representation for recognizing human and object interactions
- 2
- B. Yao et al. Grouplet: A structured image representation for recognizing human and object interactions. In CVPR, 2010.
- (2010) CVPR
- Yao, B.¹

49
- 84856672971
- Human action recognition by learning bases of action attributes and parts
- 2
- B. Yao et al. Human action recognition by learning bases of action attributes and parts. In ICCV, 2011.
- (2011) ICCV
- Yao, B.¹

50
- 85026937926
- See no evil, say no evil: Description generation from densely labeled images
- 3
- M. Yatskar et al. See no evil, say no evil: Description generation from densely labeled images. SEM, 2014.
- (2014) SEM
- Yatskar, M.¹

51
- 84959862697
- arXiv preprint arXiv:1506.00278, 3
- L. Yu et al. Visual madlibs: Fill in the blank image generation and question answering. arXiv preprint arXiv:1506.00278, 2015.
- (2015) Visual Madlibs: Fill in the Blank Image Generation and Question Answering
- Yu, L.¹

52
- 84952058866
- Reasoning about object affordances in a knowledge base representation
- 3
- Y. Zhu et al. Reasoning about object affordances in a knowledge base representation. In ECCV, 2014.
- (2014) ECCV
- Zhu, Y.¹

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.