[2] J. Devlin, H. Cheng, H. Fang, S. Gupta, L. Deng, X. He, G. Zweig, and M. Mitchell. Language models for image captioning: The quirks and what works. arXiv:1505.01809, 2015. 1
[3] J. Dodge, A. Goyal, X. Han, A. Mensch, M. Mitchell, K. Stratos, K. Yamaguchi, Y. Choi, H. Daumé III, A. C. Berg, and T. L. Berg. Detecting visual text. In NAACL, 2012. 8
[4] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv:1411.4389, 2014. 1
[5] M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. IJCV, 88(2):303-338, 2010. 7
[6] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. arXiv:1411.4952, 2014. 1
[7] A. Farhadi, S. Hejrati, A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. A. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010. 1, 8
[8] S. Fidler, A. Sharma, and R. Urtasun. A sentence is worth a thousand pixels. In CVPR, 2013. 8
[9] Y. Gong, Q. Ke, M. Isard, and S. Lazebnik. A multi-view embedding space for modeling internet images, tags, and their semantics. IJCV, 106(2):210-233, 2014. 6
[10] Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In ECCV, 2014. 1, 5, 6, 7
[11] M. Grubinger, P. Clough, H. Müller, and T. Deselaers. The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, pages 13-23, 2006. 1, 2
[12] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 2013. 1
[13] M. Hodosh, P. Young, C. Rashtchian, and J. Hockenmaier. Cross-caption coreference resolution for automatic image understanding. In CoNLL, pages 162-171. ACL, 2010. 3, 8
[14] H. Hotelling. Relations between two sets of variates. Biometrika, pages 321-377, 1936. 5
[15] J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. A. Shamma, M. Bernstein, and L. Fei-Fei. Image retrieval using scene graphs. In CVPR, 2015. 2
[16] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306, 2014. 1, 5, 6, 7, 8
[17] A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014. 1
[18] S. Kazemzadeh, V. Ordonez, M. Matten, and T. Berg. ReferItGame: Referring to objects in photographs of natural scenes. In EMNLP, 2014. 2
[19] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv:1411.2539, 2014. 1, 5, 7, 8
[20] B. Klein, G. Lev, G. Sadeh, and L. Wolf. Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotation. In CVPR, 2015. 1, 5, 6, 7, 8
[21] C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? Text-to-image coreference. In CVPR, 2014. 5
[22] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating image descriptions. In CVPR, 2011. 1, 8
[24] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014. 1
[25] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv:1412.6632, 2014. 1, 5, 6, 7, 8
[27] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013. 5
[28] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2Text: Describing images using 1 million captioned photographs. In NIPS, 2011. 1
[29] F. Perronnin, J. Sánchez, and T. Mensink. Improving the Fisher kernel for large-scale image classification. In ECCV, 2010. 5
[30] V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking people in videos with "their" names using coreference resolution. In ECCV, 2014. 3
[31] C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using Amazon's Mechanical Turk. In NAACL HLT Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 139-147. ACL, 2010. 1, 4
[34] W. M. Soon, H. T. Ng, and D. C. Y. Lim. A machine learning approach to coreference resolution of noun phrases. Computational Linguistics, 27(4):521-544, 2001. 3
[38] K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv:1502.03044, 2015. 1
[39] B. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu. I2T: Image parsing to text description. Proc. IEEE, 98(8):1485-1508, 2010. 1
[40] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2:67-78, 2014. 1, 3
[41] C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In ECCV, 2014. 6