1. D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
2. S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS, 2015.
3. H. Bilen and A. Vedaldi. Weakly supervised deep detection networks. In CVPR, 2016.
4. J. Chorowski, D. Bahdanau, D. Serdyuk, K. Cho, and Y. Bengio. Attention-based models for speech recognition. In NIPS, 2015.
6. R. Cinbis, J. Verbeek, and C. Schmid. Multi-fold MIL training for weakly supervised object localization. In CVPR, 2014.
7. J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, 2009.
8. J. Donahue, L. Hendricks, M. Rohrbach, S. Venugopalan, S. Guadarrama, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
9. H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, C. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
10. R. Girshick. Fast R-CNN. In ICCV, 2015.
12. K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In ECCV, 2014.
15. J. Jin, K. Fu, R. Cui, F. Sha, and C. Zhang. Aligning where to see and what to tell: Image caption with region-based attention and scene factorization. arXiv:1506.06272, 2015.
16. J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional localization networks for dense captioning. In CVPR, 2016.
17. A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
18. D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
20. T. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
21. C. Liu, J. Mao, F. Sha, and A. Yuille. Attention correctness in neural image captioning. In AAAI, 2017.
22. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. Berg. SSD: Single shot multibox detector. In ECCV, 2016.
23. J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
24. B. Plummer, L. Wang, C. Cervantes, J. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2015.
25. M. Ranzato, S. Chopra, M. Auli, and W. Zaremba. Sequence level training with recurrent neural networks. In ICLR, 2016.
26. S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015.
27. A. Rohrbach, M. Rohrbach, R. Hu, T. Darrell, and B. Schiele. Grounding of textual phrases in images by reconstruction. In ECCV, 2016.
28. O. Russakovsky, Y. Lin, K. Yu, and L. Fei-Fei. Object-centric spatial pooling for image classification. In ECCV, 2012.
29. K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
30. I. Sutskever, O. Vinyals, and Q. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
31. J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders. Selective search for object recognition. IJCV, 104(2):154-171, 2013.
33. Q. Wu, C. Shen, L. Liu, A. Dick, and A. van den Hengel. What value do explicit high level concepts have in vision to language problems? In CVPR, 2016.
34. K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
35. Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdinov, and W. Cohen. Encode, review, and decode: Reviewer module for caption generation. In NIPS, 2016.
36. L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015.
37. S. Yeung, O. Russakovsky, N. Jin, M. Andriluka, G. Mori, and L. Fei-Fei. Every moment counts: Dense detailed labeling of actions in complex videos.
38. Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
39. C. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. In ECCV, 2014.