SCOPUS Information Search Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 1-10

Deep compositional captioning: Describing novel object categories without paired training data

(6) Hendricks, Lisa Anne (a); Venugopalan, Subhashini (c); Rohrbach, Marcus (a, b); Mooney, Raymond (c); Saenko, Kate (d); Darrell, Trevor (a, b)

a UNIVERSITY OF CALIFORNIA (United States)

b ICSI (United States)

c UNIVERSITY OF TEXAS AT AUSTIN (United States)

d UNIVERSITY OF MASSACHUSETTS LOWELL (United States)

Author keywords

[No Author keywords available]

Indexed keywords

CHARACTER RECOGNITION; COMPUTER VISION; KNOWLEDGE MANAGEMENT; OBJECT RECOGNITION;

DEEP NEURAL NETWORKS; IMAGE CAPTIONING; IN CONTEXTS; NOVEL CONCEPT; OBJECT CATEGORIES; TEXT CORPORA; TRAINING DATA; VIDEO CLIPS;

PATTERN RECOGNITION;

EID: 84986274522 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.8 Document Type: Conference Paper

Times cited : (305)

References (37)

1
- 85116156579
- Meteor: An automatic metric for mt evaluation with improved correlation with human judgments
- S. Banerjee and A. Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, volume 29, pages 65-72, 2005.
- (2005) Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation And/or Summarization , vol.29 , pp. 65-72
- Banerjee, S.¹ Lavie, A.²

2
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
- (2011) ACL
- Chen, D.L.¹ Dolan, W.B.²

3
- 80052876786
- What does classifying more than 10,000 image categories tell us?
- J. Deng, A. Berg, K. Li, and L. Fei-Fei. What does classifying more than 10,000 image categories tell us? In ECCV, 2010.
- (2010) ECCV
- Deng, J.¹ Berg, A.² Li, K.³ Fei-Fei, L.⁴

4
- 84944096380
- Language models for image captioning: The quirks and what works
- J. Devlin, H. Cheng, H. Fang, S. Gupta, L. Deng, X. He, G. Zweig, and M. Mitchell. Language models for image captioning: The quirks and what works. ACL, 2015.
- (2015) ACL
- Devlin, J.¹ Cheng, H.² Fang, H.³ Gupta, S.⁴ Deng, L.⁵ He, X.⁶ Zweig, G.⁷ Mitchell, M.⁸

5
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
- (2015) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

6
- 84959250180
- From captions to visual concepts and back
- H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
- (2015) CVPR
- Fang, H.¹ Gupta, S.² Iandola, F.N.³ Srivastava, R.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰ Zitnick, C.L.¹¹ Zweig, G.¹²

7
- 84898958665
- Devise: A deep visual-semantic embedding model
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, T. Mikolov, et al. Devise: A deep visual-semantic embedding model. In NIPS, 2013.
- (2013) NIPS
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Mikolov, T.⁶

8
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
- (2013) ICCV
- Guadarrama, S.¹ Krishnamoorthy, N.² Malkarnenkar, G.³ Venugopalan, S.⁴ Mooney, R.⁵ Darrell, T.⁶ Saenko, K.⁷

9
- 70450202741
- Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
- A. Gupta, P. Srinivasan, J. Shi, and L. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In CVPR, 2009.
- (2009) CVPR
- Gupta, A.¹ Srinivasan, P.² Shi, J.³ Davis, L.⁴

10
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9 (8): 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

11
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In TACL, 2014.
- (2014) TACL
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

12
- 84924803045
- LSDA: Large scale detection through adaptation
- J. Hoffman, S. Guadarrama, E. Tzeng, J. Donahue, R. Girshick, T. Darrell, and K. Saenko. LSDA: Large scale detection through adaptation. In NIPS, 2014.
- (2014) NIPS
- Hoffman, J.¹ Guadarrama, S.² Tzeng, E.³ Donahue, J.⁴ Girshick, R.⁵ Darrell, T.⁶ Saenko, K.⁷

13
- 70449621223
- The mir flickr retrieval evaluation
- New York, NY, USA. ACM
- M. J. Huiskes and M. S. Lew. The mir flickr retrieval evaluation. In MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval, New York, NY, USA, 2008. ACM.
- (2008) MIR '08: Proceedings of the 2008 ACM International Conference on Multimedia Information Retrieval
- Huiskes, M.J.¹ Lew, M.S.²

14
- 84913580146
- Caffe: Convolutional architecture for fast feature embedding
- ACM
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia, pages 675-678. ACM, 2014.
- (2014) Proceedings of the ACM International Conference on Multimedia , pp. 675-678
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

15
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CVPR, 2015.
- (2015) CVPR
- Karpathy, A.¹ Fei-Fei, L.²

16
- 84952349298
- Unifying visual-semantic embeddings with multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
- (2015) TACL
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

17
- 84893398951
- Generating natural-language video descriptions using text-mined knowledge
- N. Krishnamoorthy, G. Malkarnenkar, R. J. Mooney, K. Saenko, and S. Guadarrama. Generating natural-language video descriptions using text-mined knowledge. In AAAI, 2013.
- (2013) AAAI
- Krishnamoorthy, N.¹ Malkarnenkar, G.² Mooney, R.J.³ Saenko, K.⁴ Guadarrama, S.⁵

18
- 84887601544
- Babytalk: Understanding and generating simple image descriptions
- G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. Berg. Babytalk: Understanding and generating simple image descriptions. TPAMI, 2013.
- (2013) TPAMI
- Kulkarni, G.¹ Premraj, V.² Ordonez, V.³ Dhar, S.⁴ Li, S.⁵ Choi, Y.⁶ Berg, A.C.⁷ Berg, T.⁸

19
- 84894522762
- Attribute-based classification for zero-shot visual object categorization
- C. Lampert, H. Nickisch, and S. Harmeling. Attribute-based classification for zero-shot visual object categorization. TPAMI, 2014.
- (2014) TPAMI
- Lampert, C.¹ Nickisch, H.² Harmeling, S.³

20
- 85009931853
- Microsoft coco: Common objects in context
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

21
- 84973863256
- Learning like a child: Fast novel visual concept learning from sentence descriptions of images
- J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. L. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, 2015.
- (2015) ICCV
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.L.⁶

22
- 85083951332
- Efficient estimation of word representations in vector space
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. ICLR Workshop, 2013.
- (2013) ICLR Workshop
- Mikolov, T.¹ Chen, K.² Corrado, G.³ Dean, J.⁴

23
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
- (2002) ACL
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

24
- 84856670612
- Relative attributes
- D. Parikh and K. Grauman. Relative attributes. In ICCV, 2011.
- (2011) ICCV
- Parikh, D.¹ Grauman, K.²

25
- 85090348677
- Collecting image annotations using amazon's mechanical turk
- Association for Computational Linguistics
- C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using amazon's mechanical turk. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 139-147. Association for Computational Linguistics, 2010.
- (2010) Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk , pp. 139-147
- Rashtchian, C.¹ Young, P.² Hodosh, M.³ Hockenmaier, J.⁴

26
- 84973887740
- The long-short story of movie description
- A. Rohrbach, M. Rohrbach, and B. Schiele. The long-short story of movie description. GCPR, 2015.
- (2015) GCPR
- Rohrbach, A.¹ Rohrbach, M.² Schiele, B.³

27
- 77955989949
- What helps where - and why? Semantic relatedness for knowledge transfer
- M. Rohrbach, M. Stark, G. Szarvas, I. Gurevych, and B. Schiele. What helps where - and why? Semantic relatedness for knowledge transfer. In CVPR, 2010.
- (2010) CVPR
- Rohrbach, M.¹ Stark, M.² Szarvas, G.³ Gurevych, I.⁴ Schiele, B.⁵

28
- 84947041871
- Imagenet large scale visual recognition challenge
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. Imagenet large scale visual recognition challenge. IJCV, 2014.
- (2014) IJCV
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰

29
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.
- (2015) ICLR
- Simonyan, K.¹ Zisserman, A.²

30
- 84898938559
- Zero-shot learning through cross-modal transfer
- R. Socher, M. Ganjoo, C. D. Manning, and A. Ng. Zero-shot learning through cross-modal transfer. In NIPS, 2013.
- (2013) NIPS
- Socher, R.¹ Ganjoo, M.² Manning, C.D.³ Ng, A.⁴

31
- 84959932469
- Integrating language and vision to generate natural language descriptions of videos in the wild
- J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. J. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
- (2014) COLING
- Thomason, J.¹ Venugopalan, S.² Guadarrama, S.³ Saenko, K.⁴ Mooney, R.J.⁵

32
- 84905470734
- Overview of the imageclef 2012 flickr photo annotation and retrieval task
- B. Thomee and A. Popescu. Overview of the imageclef 2012 flickr photo annotation and retrieval task. In CLEF (Online Working Notes/Labs/Workshop), volume 12, 2012.
- (2012) CLEF (Online Working Notes/Labs/Workshop) , vol.12
- Thomee, B.¹ Popescu, A.²

33
- 84983470508
- Feature-rich part-of-speech tagging with a cyclic dependency network
- K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In NAACL, 2003.
- (2003) NAACL
- Toutanova, K.¹ Klein, D.² Manning, C.D.³ Singer, Y.⁴

34
- 84973882730
- Sequence to sequence - video to text
- S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. ICCV, 2015.
- (2015) ICCV
- Venugopalan, S.¹ Rohrbach, M.² Donahue, J.³ Mooney, R.J.⁴ Darrell, T.⁵ Saenko, K.⁶

35
- 84959876769
- Translating videos to natural language using deep recurrent neural networks
- S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL, 2015.
- (2015) NAACL
- Venugopalan, S.¹ Xu, H.² Donahue, J.³ Rohrbach, M.⁴ Mooney, R.⁵ Saenko, K.⁶

36
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR, 2015.
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

37
- 85087664395
- Image captioning with an intermediate attributes layer
- Q. Wu, C. Shen, A. van den Hengel, L. Liu, and A. Dick. Image captioning with an intermediate attributes layer. arXiv preprint arXiv:1506.01144, 2015.
- (2015) arXiv preprint arXiv:1506.01144
- Wu, Q.¹ Shen, C.² van den Hengel, A.³ Liu, L.⁴ Dick, A.⁵

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS DB.