[3] X. Chen and C. Lawrence Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015.
[4] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[6] J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
[7] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[8] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
[9] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[10] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[11] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[13] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. In CVPR, 2011.
[14] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, 2012.
[15] C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In ACL 2004 Workshop, 2004.
[16] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[17] J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention for visual question answering. In NIPS, 2016.
[18] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[20] M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. Berg, K. Yamaguchi, T. Berg, K. Stratos, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012.
[21] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
[22] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra. Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization. arXiv preprint arXiv:1610.02391, 2016.
[24] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[25] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567, 2015.
[28] Q. Wu, C. Shen, L. Liu, A. Dick, and A. van den Hengel. What value do explicit high level concepts have in vision to language problems? arXiv preprint arXiv:1506.01144, 2015.
[29] C. Xiong, S. Merity, and R. Socher. Dynamic memory networks for visual and textual question answering. In ICML, 2016.
[30] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[31] Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
[32] Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdinov, and W. W. Cohen. Encode, review, and decode: Reviewer module for caption generation. In NIPS, 2016.
[33] T. Yao, Y. Pan, Y. Li, Z. Qiu, and T. Mei. Boosting image captioning with attributes. arXiv preprint arXiv:1611.01646, 2016.
[34] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
[35] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In ACL, 2014.
[36] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba. Learning deep features for discriminative localization. arXiv preprint arXiv:1512.04150, 2015.