SCOPUS Information Search Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 07-12-June-2015, 2015, Pages 4566-4575

CIDEr: Consensus-based image description evaluation

Authors (3): Vedantam, Ramakrishna (a); Zitnick, C. Lawrence (b); Parikh, Devi (a)

a VIRGINIA POLYTECHNIC INSTITUTE AND STATE UNIVERSITY (United States)

b MICROSOFT RESEARCH (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; NATURAL LANGUAGE PROCESSING SYSTEMS;

ACTION RECOGNITION; AUTOMATED METRIC; HUMAN ANNOTATIONS; HUMAN JUDGMENTS; IMAGE DESCRIPTIONS; NATURAL LANGUAGE PROCESSING; RECENT PROGRESS; SYSTEMATIC EVALUATION;

PATTERN RECOGNITION;

EID: 84956980995 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2015.7299087 Document Type: Conference Paper

Times cited : (4972)

References (47)

1
- 85116156579
- S. Banerjee and A. Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. pages 65-72, 2005.
- (2005) Meteor: An automatic metric for mt evaluation with improved correlation with human judgments , pp. 65-72
- Banerjee, S.¹ Lavie, A.²

2
- 85026926619
- Understanding and predicting importance in images
- A. C. Berg, T. L. Berg, H. Daumé III, J. Dodge, A. Goyal, X. Han, A. Mensch, M. Mitchell, A. Sood, K. Stratos, and K. Yamaguchi. Understanding and predicting importance in images. In CVPR. IEEE, 2012.
- (2012) CVPR. IEEE
- Berg, A.C.¹ Berg, T.L.² Dodge, J.³ Goyal, A.⁴ Han, X.⁵ Mensch, A.⁶ Mitchell, M.⁷ Sood, A.⁸ Stratos, K.⁹ Yamaguchi, K.¹⁰

3
- 33750347385
- The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks
- R. Bogacz, E. Brown, J. Moehlis, P. Holmes, and J. D. Cohen. The physics of optimal decision making: A formal analysis of models of performance in two-alternative forced-choice tasks. Psychol Rev, 113(4):700-765, Oct. 2006.
- (2006) Psychol Rev , vol.113 , Issue.4 , pp. 700-765
- Bogacz, R.¹ Brown, E.² Moehlis, J.³ Holmes, P.⁴ Cohen, J.D.⁵

4
- 84893361786
- Re-evaluating the role of bleu in machine translation research
- C. Callison-Burch and M. Osborne. Re-evaluating the role of bleu in machine translation research. In EACL, pages 249-256, 2006.
- (2006) EACL , pp. 249-256
- Callison-Burch, C.¹ Osborne, M.²

5
- 84952349295
- ArXiv e-prints, Apr.
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO Captions: Data Collection and Evaluation Server. ArXiv e-prints, Apr. 2015.
- (2015) Microsoft COCO Captions: Data Collection and Evaluation Server
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollar, P.⁶ Zitnick, C.L.⁷

6
- 84944115859
- Learning a recurrent visual representation for image caption generation
- X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. CoRR, abs/1411.5654, 2014.
- (2014) CoRR, abs/1411.5654
- Chen, X.¹ Zitnick, C.L.²

7
- 72249100259
- ImageNet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009.
- (2009) CVPR09
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

8
- 85107661995
- Meteor universal: Language specific translation evaluation for any target language
- M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation, 2014.
- (2014) Proceedings of the EACL 2014 Workshop on Statistical Machine Translation
- Denkowski, M.¹ Lavie, A.²

9
- 84937961492
- Learning to rank using high-order information
- P. K. Dokania, A. Behl, C. V. Jawahar, and P. M. Kumar. Learning to rank using high-order information. ECCV, 2014.
- (2014) ECCV
- Dokania, P.K.¹ Behl, A.² Jawahar, C.V.³ Kumar, P.M.⁴

10
- 84946802546
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. CoRR, abs/1411.4389, 2014.
- (2014) CoRR, abs/1411.4389
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

11
- 84906929591
- Image description using visual dependency representations
- D. Elliott and F. Keller. Image description using visual dependency representations. In EMNLP, pages 1292-1302. ACL, 2013.
- (2013) EMNLP , pp. 1292-1302. ACL
- Elliott, D.¹ Keller, F.²

12
- 84906928552
- Comparing automatic evaluation measures for image description
- Baltimore, Maryland, June . Association for Computational Linguistics
- D. Elliott and F. Keller. Comparing automatic evaluation measures for image description. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 452-457, Baltimore, Maryland, June 2014. Association for Computational Linguistics.
- (2014) Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) , pp. 452-457
- Elliott, D.¹ Keller, F.²

13
- 51849167307
- M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results. http://www.pascal-network.org/challenges/VOC/voc2010/workshop/index.html.
- The PASCAL Visual Object Classes Challenge 2010 (VOC2010) Results
- Everingham, M.¹ Van Gool, L.² Williams, C.K.I.³ Winn, J.⁴ Zisserman, A.⁵

14
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10, 2010.
- (2010) Proceedings of the 11th European Conference on Computer Vision: Part IV, ECCV'10
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

15
- 77955422240
- Object detection with discriminatively trained part based models
- P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9):1627-1645, 2010.
- (2010) IEEE Transactions on Pattern Analysis and Machine Intelligence , vol.32 , Issue.9 , pp. 1627-1645
- Felzenszwalb, P.F.¹ Girshick, R.B.² McAllester, D.³ Ramanan, D.⁴

16
- 57149125139
- Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers
- D. A. Forsyth, P. H. S. Torr, and A. Zisserman, editors, Lecture Notes in Computer Science. Springer
- A. Gupta and L. S. Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In D. A. Forsyth, P. H. S. Torr, and A. Zisserman, editors, ECCV (1), volume 5302 of Lecture Notes in Computer Science, pages 16-29. Springer, 2008.
- (2008) ECCV , vol.5302 , Issue.1 , pp. 16-29
- Gupta, A.¹ Davis, L.S.²

17
- 84959255956
- A. Gupta, Y. Verma, and C. Jawahar. Choosing linguistics over vision to describe images. 2012.
- (2012) Choosing Linguistics over Vision to Describe Images.
- Gupta, A.¹ Verma, Y.² Jawahar, C.³

18
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. J. Artif. Intell. Res. (JAIR), 47:853-899, 2013.
- (2013) J. Artif. Intell. Res. (JAIR) , vol.47 , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

19
- 84959099868
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CoRR, abs/1412.2306, 2014.
- (2014) CoRR, abs/1412.2306
- Karpathy, A.¹ Fei-Fei, L.²

20
- 84959252592
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. CoRR, 2014.
- (2014) CoRR
- Karpathy, A.¹ Joulin, A.² Fei-Fei, L.³

21
- 84946802533
- Unifying visual-semantic embeddings with multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. CoRR, abs/1411.2539, 2014.
- (2014) CoRR, abs/1411.2539
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

22
- 80052901011
- Baby talk: Understanding and generating image descriptions
- G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating image descriptions. In Proceedings of the 24th CVPR, 2011.
- (2011) Proceedings of the 24th CVPR
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.C.⁶ Berg, T.L.⁷

23
- 70450172710
- Learning to detect unseen object classes by between-class attribute transfer
- C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In CVPR, 2009.
- (2009) CVPR
- Lampert, C.H.¹ Nickisch, H.² Harmeling, S.³

24
- 84862279067
- Composing simple image descriptions using web-scale n-grams
- Stroudsburg, PA, USA, Association for Computational Linguistics
- S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL '11, pages 220-228, Stroudsburg, PA, USA, 2011. Association for Computational Linguistics.
- (2011) Proceedings of the Fifteenth Conference on Computational Natural Language Learning, CoNLL '11 , pp. 220-228
- Li, S.¹ Kulkarni, G.² Berg, T.L.³ Berg, A.C.⁴ Choi, Y.⁵

25
- 84937834115
- Microsoft COCO: Common objects in context
- T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

26
- 80052880806
- Action recognition from a distributed representation of pose and appearance
- S. Maji, L. Bourdev, and J. Malik. Action recognition from a distributed representation of pose and appearance. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
- (2011) IEEE International Conference on Computer Vision and Pattern Recognition (CVPR)
- Maji, S.¹ Bourdev, L.² Malik, J.³

27
- 84951072975
- Explain images with multimodal recurrent neural networks
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. CoRR, abs/1410.1090, 2014.
- (2014) CoRR, abs/1410.1090
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.L.⁵

28
- 0034850577
- A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics
- D. Martin, C. Fowlkes, D. Tal, and J. Malik. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proc. 8th Int'l Conf. Computer Vision, volume 2, pages 416-423, July 2001.
- (2001) Proc. 8th Int'l Conf. Computer Vision , vol.2 , pp. 416-423
- Martin, D.¹ Fowlkes, C.² Tal, D.³ Malik, J.⁴

29
- 84959231566
- Midge: Generating descriptions of images
- Stroudsburg, PA, USA. Association for Computational Linguistics
- M. Mitchell, X. Han, and J. Hayes. Midge: Generating descriptions of images. In Proceedings of the Seventh International Natural Language Generation Conference, INLG '12, pages 131-133, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics.
- (2012) Proceedings of the Seventh International Natural Language Generation Conference, INLG '12 , pp. 131-133
- Mitchell, M.¹ Han, X.² Hayes, J.³

30
- 79951843401
- Springer Publishing Company, Incorporated, 1st edition
- H. Müller, P. Clough, T. Deselaers, and B. Caputo. ImageCLEF: Experimental Evaluation in Visual Information Retrieval. Springer Publishing Company, Incorporated, 1st edition, 2010.
- (2010) ImageCLEF: Experimental Evaluation in Visual Information Retrieval
- Müller, H.¹ Clough, P.² Deselaers, T.³ Caputo, B.⁴

31
- 85013202438
- Evaluating content selection in summarization: The pyramid method
- A. Nenkova and R. J. Passonneau. Evaluating content selection in summarization: The pyramid method. In HLT-NAACL, pages 145-152, 2004.
- (2004) HLT-NAACL , pp. 145-152
- Nenkova, A.¹ Passonneau, R.J.²

32
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In Neural Information Processing Systems (NIPS), 2011.
- (2011) Neural Information Processing Systems (NIPS)
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

33
- 85133336275
- Bleu: A method for automatic evaluation of machine translation
- Stroudsburg, PA, USA. Association for Computational Linguistics
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02, pages 311-318, Stroudsburg, PA, USA, 2002. Association for Computational Linguistics.
- (2002) Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL '02 , pp. 311-318
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

34
- 84856670612
- Relative attributes
- D. Parikh and K. Grauman. Relative Attributes. In ICCV, 2011.
- (2011) ICCV
- Parikh, D.¹ Grauman, K.²

35
- 85090348677
- Collecting image annotations using amazon's mechanical turk
- Stroudsburg, PA, USA. Association for Computational Linguistics
- C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using amazon's mechanical turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT '10, Stroudsburg, PA, USA, 2010. Association for Computational Linguistics.
- (2010) Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, CSLDAMT '10
- Rashtchian, C.¹ Young, P.² Hodosh, M.³ Hockenmaier, J.⁴

36
- 8844253324
- Understanding inverse document frequency: On theoretical arguments for idf
- S. Robertson. Understanding inverse document frequency: On theoretical arguments for idf. Journal of Documentation, 60:2004, 2004.
- (2004) Journal of Documentation , vol.60 , pp. 2004
- Robertson, S.¹

37
- 84898775239
- Translating video content to natural language descriptions
- M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In IEEE International Conference on Computer Vision (ICCV), December 2013.
- (2013) IEEE International Conference on Computer Vision (ICCV)
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

38
- 80052889458
- M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. 2011.
- (2011) Recognition Using Visual Phrases.
- Sadeghi, M.A.¹ Farhadi, A.²

39
- 0036537472
- A taxonomy and evaluation of dense two-frame stereo correspondence algorithms
- D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comput. Vision, 2002.
- (2002) Int. J. Comput. Vision
- Scharstein, D.¹ Szeliski, R.²

40
- 52049123532
- Utility data annotation with amazon mechanical turk
- A. Sorokin and D. Forsyth. Utility data annotation with amazon mechanical turk. In Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference on, pages 1-8, June 2008.
- (2008) Computer Vision and Pattern Recognition Workshops, 2008. CVPRW '08. IEEE Computer Society Conference on , pp. 1-8
- Sorokin, A.¹ Forsyth, D.²

41
- 80053456767
- Adaptively learning the crowd kernel
- O. Tamuz, C. Liu, S. Belongie, O. Shamir, and A. T. Kalai. Adaptively learning the crowd kernel. In ICML11, 2011.
- (2011) ICML11
- Tamuz, O.¹ Liu, C.² Belongie, S.³ Shamir, O.⁴ Kalai, A.T.⁵

42
- 84959197551
- Cider: Consensus-based image description evaluation
- R. Vedantam, C. L. Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. CoRR, abs/1411.5726, 2014.
- (2014) CoRR, abs/1411.5726
- Vedantam, R.¹ Zitnick, C.L.² Parikh, D.³

43
- 84951910303
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CoRR, abs/1411.4555, 2014.
- (2014) CoRR, abs/1411.4555
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

44
- 85026931000
- Corpus-guided sentence generation of natural images
- Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP. ACL, 2011.
- (2011) EMNLP. ACL
- Yang, Y.¹ Teo, C.L.² Aloimonos, Y.³

45
- 85026937926
- See no evil, say no evil: Description generation from densely labeled images
- Association for Computational Linguistics and Dublin City University, Dublin, Ireland, August
- M. Yatskar, M. Galley, L. Vanderwende, and L. Zettlemoyer. See no evil, say no evil: Description generation from densely labeled images. In Proceedings of the Third Joint Conference on Lexical and Computational Semantics (SEM 2014), pages 110-120, Dublin, Ireland, August 2014. Association for Computational Linguistics and Dublin City University.
- (2014) Proceedings of the Third Joint Conference on Lexical and Computational Semantics (SEM 2014) , pp. 110-120
- Yatskar, M.¹ Galley, M.² Vanderwende, L.³ Zettlemoyer, L.⁴

46
- 78650200194
- C.-Y. Lin. Rouge: A package for automatic evaluation of summaries. pages 25-26, 2004.
- (2004) Rouge: A Package for Automatic Evaluation of Summaries , pp. 25-26
- Lin, C.-Y.¹

47
- 84887338442
- Bringing semantics into focus using visual abstraction
- C. L. Zitnick and D. Parikh. Bringing semantics into focus using visual abstraction. In CVPR, 2013.
- (2013) CVPR
- Zitnick, C.L.¹ Parikh, D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.