SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 07-12-June-2015, 2015, Pages 3202-3212

A dataset for Movie Description

(4) Rohrbach, Anna (a); Rohrbach, Marcus (b); Tandon, Niket (a); Schiele, Bernt (a)

a MAX PLANCK INSTITUTE FOR INFORMATICS (Germany)

b INTERNATIONAL COMPUTER SCIENCE INSTITUTE (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTATIONAL LINGUISTICS; COMPUTER VISION; LINGUISTICS; PATTERN RECOGNITION;

AUDIO DESCRIPTION; DATA-SOURCE; LINGUISTIC DESCRIPTIONS; MOVIE PRODUCTION; PARALLEL CORPORA; VISUALLY IMPAIRED PEOPLE;

MOTION PICTURES;

EID: 84959211977 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2015.7298940 Document Type: Conference Paper

Times cited : (468)

References (76)

1
- 84959210562
- British amazon
- British amazon. http://www.amazon.co.uk/, 2014
- (2014)

2
- 85026927645
- Castingwords transcription service. http://castingwords.com/, 2014
- (2014) Castingwords Transcription Service

3
- 85026927356
- Makemkv. http://www.makemkv.com/, 2014
- (2014)

4
- 85026933692
- Subtitle edit. http://www.nikse.dk/SubtitleEdit/, 2014
- (2014)

5
- 85026924402
- Xmedia recode. http://www.xmedia-recode.de/, 2014
- (2014)

6
- 85119347781
- Semantic parsing with combinatory categorial grammars
- Y. Artzi, N. FitzGerald, and L. S. Zettlemoyer. Semantic parsing with combinatory categorial grammars. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2013
- (2013) Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
- Artzi, Y.¹ FitzGerald, N.² Zettlemoyer, L.S.³

7
- 85143187999
- The Berkeley framenet project
- C. F. Baker, C. J. Fillmore, and J. B. Lowe. The berkeley framenet project. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 1998
- (1998) Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
- Baker, C.F.¹ Fillmore, C.J.² Lowe, J.B.³

8
- 84885996388
- Video in sentences out
- A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, L. Schmidt, J. Shangguan, J. M. Siskind, J. Waggoner, S. Wang, J. Wei, Y. Yin, and Z. Zhang. Video in sentences out. In Proceedings of the conference on Uncertainty in Artificial Intelligence (UAI), 2012
- (2012) Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI)
- Barbu, A.¹ Bridge, A.² Burchill, Z.³ Coroian, D.⁴ Dickinson, S.⁵ Fidler, S.⁶ Michaux, A.⁷ Mussman, S.⁸ Narayanaswamy, S.⁹ Salvi, D.¹⁰ Schmidt, L.¹¹ Shangguan, J.¹² Siskind, J.M.¹³ Waggoner, J.¹⁴ Wang, S.¹⁵ Wei, J.¹⁶ Yin, Y.¹⁷ Zhang, Z.¹⁸

9
- 84904308637
- Semantic parsing on freebase from question-answer pairs
- J. Berant, A. Chou, R. Frostig, and P. Liang. Semantic parsing on freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), 2013
- (2013) Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Berant, J.¹ Chou, A.² Frostig, R.³ Liang, P.⁴

10
- 84898792367
- Finding actors and actions in movies
- P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Finding actors and actions in movies. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013
- (2013) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Bojanowski, P.¹ Bach, F.² Laptev, I.³ Ponce, J.⁴ Schmid, C.⁵ Sivic, J.⁶

11
- 84943800045
- Weakly supervised action labeling in videos under ordering constraints
- P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Weakly supervised action labeling in videos under ordering constraints. In Proceedings of the European Conference on Computer Vision (ECCV), 2014
- (2014) Proceedings of the European Conference on Computer Vision (ECCV)
- Bojanowski, P.¹ Lajugie, R.² Bach, F.³ Laptev, I.⁴ Ponce, J.⁵ Schmid, C.⁶ Sivic, J.⁷

12
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- D. Chen and W. Dolan. Collecting highly parallel data for paraphrase evaluation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2011
- (2011) Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
- Chen, D.¹ Dolan, W.²

13
- 84944115859
- arXiv:1411.5654
- X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. arXiv:1411.5654, 2014
- (2014) Learning A Recurrent Visual Representation for Image Caption Generation
- Chen, X.¹ Zitnick, C.L.²

14
- 70350260037
- Movie/script: Alignment and parsing of video and text transcription
- T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar. Movie/script: Alignment and parsing of video and text transcription. In Proceedings of the European Conference on Computer Vision (ECCV), 2008
- (2008) Proceedings of the European Conference on Computer Vision (ECCV)
- Cour, T.¹ Jordan, C.² Miltsakaki, E.³ Taskar, B.⁴

15
- 70450175499
- Learning from ambiguously labeled images
- T. Cour, B. Sapp, C. Jordan, and B. Taskar. Learning from ambiguously labeled images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
- (2009) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Cour, T.¹ Sapp, B.² Jordan, C.³ Taskar, B.⁴

16
- 85026928731
- An exact dual decomposition algorithm for shallow semantic parsing with constraints
- D. Das, A. F. Martins, and N. A. Smith. An exact dual decomposition algorithm for shallow semantic parsing with constraints. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2012
- (2012) Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
- Das, D.¹ Martins, A.F.² Smith, N.A.³

17
- 84887345951
- Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
- P. Das, C. Xu, R. Doell, and J. Corso. Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2013
- (2013) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Das, P.¹ Xu, C.² Doell, R.³ Corso, J.⁴

18
- 84893049875
- Clausie: Clause-based open information extraction
- L. Del Corro and R. Gemulla. Clausie: Clause-based open information extraction. In Proceedings of the International World Wide Web Conference (WWW), 2013
- (2013) Proceedings of the International World Wide Web Conference (WWW)
- Del Corro, L.¹ Gemulla, R.²

19
- 72249100259
- Imagenet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
- (2009) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

20
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

21
- 85081863350
- Automatic annotation of human actions in video
- O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce. Automatic annotation of human actions in video. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2009
- (2009) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Duchenne, O.¹ Laptev, I.² Sivic, J.³ Bach, F.⁴ Ponce, J.⁵

22
- 84898027861
- "Hello! My name is... Buffy" - automatic naming of characters in TV video
- M. Everingham, J. Sivic, and A. Zisserman. "Hello! My name is... Buffy" - automatic naming of characters in TV video. In Proceedings of the British Machine Vision Conference (BMVC), 2006
- (2006) Proceedings of the British Machine Vision Conference (BMVC)
- Everingham, M.¹ Sivic, J.² Zisserman, A.³

23
- 84907031424
- Open question answering over curated and extracted knowledge bases
- A. Fader, L. Zettlemoyer, and O. Etzioni. Open question answering over curated and extracted knowledge bases. In Proceedings of the ACM SIGKDD international conference on Knowledge discovery and data mining, 2014
- (2014) Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
- Fader, A.¹ Zettlemoyer, L.² Etzioni, O.³

24
- 84944115860
- arXiv:1411.4952
- H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. arXiv:1411.4952, 2014
- (2014) From Captions to Visual Concepts and Back
- Fang, H.¹ Gupta, S.² Iandola, F.N.³ Srivastava, R.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰ Zitnick, C.L.¹¹ Zweig, G.¹²

25
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In Proceedings of the European Conference on Computer Vision (ECCV), 2010
- (2010) Proceedings of the European Conference on Computer Vision (ECCV)
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

26
- 0004289791
- The MIT Press
- C. Fellbaum, editor. WordNet: An Electronic Lexical Database. The MIT Press, 1998
- (1998) WordNet: An Electronic Lexical Database
- Fellbaum, C.¹

27
- 77956527163
- A computer-vision-assisted system for videodescription scripting
- L. Gagnon, C. Chapdelaine, D. Byrns, S. Foucher, M. Heritier, and V. Gupta. A computer-vision-assisted system for videodescription scripting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops), 2010
- (2010) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops)
- Gagnon, L.¹ Chapdelaine, C.² Byrns, D.³ Foucher, S.⁴ Heritier, M.⁵ Gupta, V.⁶

28
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013
- (2013) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Guadarrama, S.¹ Krishnamoorthy, N.² Malkarnenkar, G.³ Venugopalan, S.⁴ Mooney, R.⁵ Darrell, T.⁶ Saenko, K.⁷

29
- 70450202741
- Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
- A. Gupta, P. Srinivasan, J. Shi, and L. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
- (2009) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Gupta, A.¹ Srinivasan, P.² Shi, J.³ Davis, L.⁴

30
- 84877964523
- Automated textual descriptions for a wide range of video events with 48 human actions
- P. Hanckmann, K. Schutte, and G. J. Burghouts. Automated textual descriptions for a wide range of video events with 48 human actions. In Proceedings of the European Conference on Computer Vision Workshops (ECCV Workshops), 2012
- (2012) Proceedings of the European Conference on Computer Vision Workshops (ECCV Workshops)
- Hanckmann, P.¹ Schutte, K.² Burghouts, G.J.³

31
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In Transactions of the Association for Computational Linguistics (TACL), 2014
- (2014) Transactions of the Association for Computational Linguistics (TACL)
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

32
- 84924803045
- LSDA: Large scale detection through adaptation
- J. Hoffman, S. Guadarrama, E. Tzeng, J. Donahue, R. Girshick, T. Darrell, and K. Saenko. LSDA: Large scale detection through adaptation. In Advances in Neural Information Processing Systems (NIPS), 2014
- (2014) Advances in Neural Information Processing Systems (NIPS)
- Hoffman, J.¹ Guadarrama, S.² Tzeng, E.³ Donahue, J.⁴ Girshick, R.⁵ Darrell, T.⁶ Saenko, K.⁷

33
- 84942676733
- arXiv:1412.2306
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. arXiv:1412.2306, 2014
- (2014) Deep Visual-semantic Alignments for Generating Image Descriptions
- Karpathy, A.¹ Fei-Fei, L.²

34
- 84863029475
- Human focused video description
- M. U. G. Khan, L. Zhang, and Y. Gotoh. Human focused video description. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops), 2011
- (2011) Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV Workshops)
- Khan, M.U.G.¹ Zhang, L.² Gotoh, Y.³

35
- 85034840044
- Extending verbnet with novel verb classes
- K. Kipper, A. Korhonen, N. Ryant, and M. Palmer. Extending verbnet with novel verb classes. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), 2006
- (2006) Proceedings of the International Conference on Language Resources and Evaluation (LREC)
- Kipper, K.¹ Korhonen, A.² Ryant, N.³ Palmer, M.⁴

36
- 84919921461
- Multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. Zemel. Multimodal neural language models. In Proceedings of the International Conference on Machine Learning (ICML), 2014
- (2014) Proceedings of the International Conference on Machine Learning (ICML)
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.³

37
- 84944113729
- arXiv:1411.2539
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv:1411.2539, 2014
- (2014) Unifying Visual-semantic Embeddings with Multimodal Neural Language Models
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

38
- 85110867932
- Moses: Open source toolkit for statistical machine translation
- P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. Moses: Open source toolkit for statistical machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2007
- (2007) Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
- Koehn, P.¹ Hoang, H.² Birch, A.³ Callison-Burch, C.⁴ Federico, M.⁵ Bertoldi, N.⁶ Cowan, B.⁷ Shen, W.⁸ Moran, C.⁹ Zens, R.¹⁰ Dyer, C.¹¹ Bojar, O.¹² Constantin, A.¹³ Herbst, E.¹⁴

39
- 0036843382
- Natural language description of human activities from video images based on concept hierarchy of actions
- A. Kojima, T. Tamura, and K. Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. International Journal of Computer Vision (IJCV), 2002
- (2002) International Journal of Computer Vision (IJCV)
- Kojima, A.¹ Tamura, T.² Fukunaga, K.³

40
- 84876231242
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (NIPS), 2012
- (2012) Advances in Neural Information Processing Systems (NIPS)
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

41
- 80052901011
- Baby talk: Understanding and generating simple image descriptions
- G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011
- (2011) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.C.⁶ Berg, T.L.⁷

42
- 84878189119
- Collective generation of natural image descriptions
- P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2012
- (2012) Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
- Kuznetsova, P.¹ Ordonez, V.² Berg, A.C.³ Berg, T.L.⁴ Choi, Y.⁵

43
- 84934873221
- Treetalk: Composition and compression of trees for image descriptions
- P. Kuznetsova, V. Ordonez, T. L. Berg, U. C. Hill, and Y. Choi. Treetalk: Composition and compression of trees for image descriptions. In Transactions of the Association for Computational Linguistics (TACL), 2014
- (2014) Transactions of the Association for Computational Linguistics (TACL)
- Kuznetsova, P.¹ Ordonez, V.² Berg, T.L.³ Hill, U.C.⁴ Choi, Y.⁵

44
- 77956512046
- Technical report, Dept. of Computing Technical Report, University of Surrey
- Lakritz and Salway. The semi-automatic generation of audio description from screenplays. Technical report, Dept. of Computing Technical Report, University of Surrey, 2006
- (2006) The Semi-automatic Generation of Audio Description from Screenplays
- Lakritz¹ Salway²

45
- 51949083365
- Learning realistic human actions from movies
- I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008
- (2008) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Laptev, I.¹ Marszalek, M.² Schmid, C.³ Rozenfeld, B.⁴

46
- 84906925645
- Context-dependent semantic parsing for time expressions
- K. Lee, Y. Artzi, J. Dodge, and L. Zettlemoyer. Context-dependent semantic parsing for time expressions. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), 2014
- (2014) Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)
- Lee, K.¹ Artzi, Y.² Dodge, J.³ Zettlemoyer, L.⁴

47
- 84862279067
- Composing simple image descriptions using web-scale N-grams
- S. Li, G. Kulkarni, T. Berg, A. Berg, and Y. Choi. Composing simple image descriptions using web-scale N-grams. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics, 2011
- (2011) Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL). Association for Computational Linguistics
- Li, S.¹ Kulkarni, G.² Berg, T.³ Berg, A.⁴ Choi, Y.⁵

48
- 80052870735
- Tvparser: An automatic tv video parsing method
- C. Liang, C. Xu, J. Cheng, and H. Lu. Tvparser: An automatic tv video parsing method. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011
- (2011) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Liang, C.¹ Xu, C.² Cheng, J.³ Lu, H.⁴

49
- 84937834115
- Microsoft coco: Common objects in context
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV), 2014
- (2014) Proceedings of the European Conference on Computer Vision (ECCV)
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

50
- 84939821073
- arXiv:1412.6632
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv:1412.6632, 2014
- (2014) Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.L.⁵

51
- 70450177757
- Actions in context
- M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
- (2009) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Marszalek, M.¹ Laptev, I.² Schmid, C.³

52
- 85034832841
- Midge: Generating image descriptions from computer vision detections
- M. Mitchell, J. Dodge, A. Goyal, K. Yamaguchi, K. Stratos, X. Han, A. Mensch, A. C. Berg, T. L. Berg, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2012
- (2012) Proceedings of the Conference of the European Chapter of the Association for Computational Linguistics (EACL)
- Mitchell, M.¹ Dodge, J.² Goyal, A.³ Yamaguchi, K.⁴ Stratos, K.⁵ Han, X.⁶ Mensch, A.⁷ Berg, A.C.⁸ Berg, T.L.⁹ Daumé III, H.¹⁰

53
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In Advances in Neural Information Processing Systems (NIPS), 2011
- (2011) Advances in Neural Information Processing Systems (NIPS)
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

54
- 84905274625
- Trecvid 2012-an overview of the goals, tasks, data, evaluation mechanisms and metrics
- P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders, B. Shaw, A. F. Smeaton, and G. Quénot. Trecvid 2012-an overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of TRECVID 2012. NIST, USA, 2012
- (2012) Proceedings of TRECVID 2012. NIST
- Over, P.¹ Awad, G.² Michel, M.³ Fiscus, J.⁴ Sanders, G.⁵ Shaw, B.⁶ Smeaton, A.F.⁷ Quénot, G.⁸

55
- 85081941118
- Wordnet::Similarity: Measuring the relatedness of concepts
- T. Pedersen, S. Patwardhan, and J. Michelizzi. Wordnet::Similarity: Measuring the relatedness of concepts. In Demonstration Papers at HLT-NAACL 2004, 2004
- (2004) Demonstration Papers at HLT-NAACL 2004
- Pedersen, T.¹ Patwardhan, S.² Michelizzi, J.³

56
- 84943782750
- Linking people in videos with "their" names using coreference resolution
- V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking people in videos with "their" names using coreference resolution. In Proceedings of the European Conference on Computer Vision (ECCV), 2014
- (2014) Proceedings of the European Conference on Computer Vision (ECCV)
- Ramanathan, V.¹ Joulin, A.² Liang, P.³ Fei-Fei, L.⁴

57
- 84898785648
- Grounding Action Descriptions in Videos
- M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele, and M. Pinkal. Grounding Action Descriptions in Videos. Transactions of the Association for Computational Linguistics (TACL), 1, 2013
- (2013) Transactions of the Association for Computational Linguistics (TACL) , vol.1
- Regneri, M.¹ Rohrbach, M.² Wetzel, D.³ Thater, S.⁴ Schiele, B.⁵ Pinkal, M.⁶

58
- 84960170289
- Coherent multi-sentence video description with variable level of detail
- A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal, and B. Schiele. Coherent multi-sentence video description with variable level of detail. In Proceedings of the German Conference on Pattern Recognition (GCPR), September 2014
- (2014) Proceedings of the German Conference on Pattern Recognition (GCPR)
- Rohrbach, A.¹ Rohrbach, M.² Qiu, W.³ Friedrich, A.⁴ Pinkal, M.⁵ Schiele, B.⁶

59
- 84898775239
- Translating video content to natural language descriptions
- M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013
- (2013) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

60
- 84909978410
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014
- (2014) ImageNet Large Scale Visual Recognition Challenge
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰ Berg, A.C.¹¹ Fei-Fei, L.¹²

61
- 85139629908
- A corpus-based analysis of audio description
- A. Salway. A corpus-based analysis of audio description. Media for all: Subtitling for the deaf, audio description and sign language, 2007
- (2007) Media for All: Subtitling for the Deaf, Audio Description and Sign Language
- Salway, A.¹

62
- 36849060633
- Associating characters with events in films
- A. Salway, B. Lehane, and N. E. O'Connor. Associating characters with events in films. In Proceedings of the ACM international conference on Image and video retrieval (CIVR), 2007
- (2007) Proceedings of the ACM International Conference on Image and Video Retrieval (CIVR)
- Salway, A.¹ Lehane, B.² O'Connor, N.E.³

63
- 79960117324
- Verbnet overview, extensions, mappings and applications
- K. K. Schuler, A. Korhonen, and S. W. Brown. Verbnet overview, extensions, mappings and applications. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2009
- (2009) Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
- Schuler, K.K.¹ Korhonen, A.² Brown, S.W.³

64
- 70450202706
- "Who are you?" - learning person specific classifiers from video
- J. Sivic, M. Everingham, and A. Zisserman. "Who are you?" - learning person specific classifiers from video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009
- (2009) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Sivic, J.¹ Everingham, M.² Zisserman, A.³

65
- 84906925854
- Grounded compositional semantics for finding and describing images with sentences
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics (TACL)
- Transactions of the Association for Computational Linguistics (TACL)
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

66
- 84455192418
- Towards textually describing complex video contents with audio-visual concept classifiers
- C. C. Tan, Y.-G. Jiang, and C.-W. Ngo. Towards textually describing complex video contents with audio-visual concept classifiers. In Proceedings of the ACM international conference on Multimedia (MM), 2011
- (2011) Proceedings of the ACM International Conference on Multimedia (MM)
- Tan, C.C.¹ Jiang, Y.-G.² Ngo, C.-W.³

67
- 84866659479
- "Knock! Knock! Who is it?" Probabilistic person identification in TV-series
- M. Tapaswi, M. Baeuml, and R. Stiefelhagen. "Knock! Knock! Who is it?" Probabilistic person identification in TV-series. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2012
- (2012) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Tapaswi, M.¹ Baeuml, M.² Stiefelhagen, R.³

68
- 84959932469
- Integrating language and vision to generate natural language descriptions of videos in the wild
- J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. J. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In Proceedings of the International Conference on Computational Linguistics (COLING), 2014
- (2014) Proceedings of the International Conference on Computational Linguistics (COLING)
- Thomason, J.¹ Venugopalan, S.² Guadarrama, S.³ Saenko, K.⁴ Mooney, R.J.⁵

69
- 84959246420
- arXiv:1503.01070v1
- A. Torabi, C. Pal, H. Larochelle, and A. Courville. Using descriptive video services to create a large data source for video annotation research. arXiv:1503.01070v1, 2015
- (2015) Using Descriptive Video Services to Create A Large Data Source for Video Annotation Research
- Torabi, A.¹ Pal, C.² Larochelle, H.³ Courville, A.⁴

70
- 84959876769
- Translating videos to natural language using deep recurrent neural networks
- S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL), 2015
- (2015) Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
- Venugopalan, S.¹ Xu, H.² Donahue, J.³ Rohrbach, M.⁴ Mooney, R.⁵ Saenko, K.⁶

71
- 84939821075
- arXiv:1411.4555
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. arXiv:1411.4555, 2014
- (2014) Show and Tell: A Neural Image Caption Generator
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

72
- 84898805910
- Action recognition with improved trajectories
- H. Wang and C. Schmid. Action recognition with improved trajectories. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013
- (2013) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Wang, H.¹ Schmid, C.²

73
- 77955988947
- Sun database: Large-scale scene recognition from abbey to zoo
- J. Xiao, J. Hays, K. A. Ehinger, A. Oliva, and A. Torralba. Sun database: Large-scale scene recognition from abbey to zoo. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010
- (2010) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Xiao, J.¹ Hays, J.² Ehinger, K.A.³ Oliva, A.⁴ Torralba, A.⁵

74
- 84959223725
- arXiv:1502.08029v3
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Video description generation incorporating spatio-temporal features and a soft-attention mechanism. arXiv:1502.08029v3, 2015
- (2015) Video Description Generation Incorporating Spatio-temporal Features and A Soft-attention Mechanism
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

75
- 83055186332
- It makes sense: A wide-coverage word sense disambiguation system for free text
- Z. Zhong and H. T. Ng. It makes sense: A wide-coverage word sense disambiguation system for free text. In Proceedings of the ACL 2010 System Demonstrations, 2010
- (2010) Proceedings of the ACL 2010 System Demonstrations
- Zhong, Z.¹ Ng, H.T.²

76
- 84937964578
- Learning deep features for scene recognition using places database
- B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning Deep Features for Scene Recognition using Places Database. Advances in Neural Information Processing Systems (NIPS), 2014.
- (2014) Advances in Neural Information Processing Systems (NIPS)
- Zhou, B.¹ Lapedriza, A.² Xiao, J.³ Torralba, A.⁴ Oliva, A.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.