SCOPUS Information Search Platform

Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Volume 2017-January, 2017, Pages 6298-6306

SCA-CNN: Spatial and channel-wise attention in convolutional networks for image captioning

Authors (7): Chen, Long (a); Zhang, Hanwang (b); Xiao, Jun (a); Nie, Liqiang (c); Shao, Jian (a); Liu, Wei (d); Chua, Tat Seng (e)

a ZHEJIANG UNIVERSITY (China)

b Columbia University ^* (United States)

c SHANDONG UNIVERSITY (China)

d TENCENT AI LAB (China)

e NATIONAL UNIVERSITY OF SINGAPORE (Singapore)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; CONVOLUTION; ENCODING (SYMBOLS); NEURAL NETWORKS; SIGNAL ENCODING;

ATTENTION MECHANISMS; CONVOLUTIONAL NETWORKS; CONVOLUTIONAL NEURAL NETWORK; DYNAMIC FEATURES; QUESTION ANSWERING; SPATIAL ATTENTION; STATE OF THE ART; VISUAL ATTENTION MODEL;

BEHAVIORAL RESEARCH;

EID: 85029348551 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2017.667 Document Type: Conference Paper

Times cited: 1744

References (43)

1
- 84973890960
- Vqa: Visual question answering
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. Vqa: Visual question answering. In ICCV, 2015. 2
- (2015) ICCV
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Lawrence Zitnick, C.⁶ Parikh, D.⁷

2
- 84959933549
- Neural machine translation by jointly learning to align and translate
- D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2014. 2
- (2014) ICLR
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

3
- 85116156579
- Meteor: An automatic metric for mt evaluation with improved correlation with human judgments
- S. Banerjee and A. Lavie. Meteor: An automatic metric for mt evaluation with improved correlation with human judgments. In ACL, 2005. 5
- (2005) ACL
- Banerjee, S.¹ Lavie, A.²

4
- 85044532646
- Abc-cnn: An attention based convolutional neural network for visual question answering
- K. Chen, J. Wang, L.-C. Chen, H. Gao, W. Xu, and R. Nevatia. Abc-cnn: An attention based convolutional neural network for visual question answering. In CVPR, 2016. 1
- (2016) CVPR
- Chen, K.¹ Wang, J.² Chen, L.-C.³ Gao, H.⁴ Xu, W.⁵ Nevatia, R.⁶

5
- 0036517313
- Control of goal-directed and stimulus-driven attention in the brain
- M. Corbetta and G. L. Shulman. Control of goal-directed and stimulus-driven attention in the brain. Nature reviews neuroscience, 2002. 1
- (2002) Nature Reviews Neuroscience
- Corbetta, M.¹ Shulman, G.L.²

6
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015. 2
- (2015) CVPR
- Donahue, J.¹ Anne Hendricks, L.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

7
- 84965148420
- Are you talking to a machine? Dataset and methods for multilingual image question
- H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? dataset and methods for multilingual image question. In NIPS, 2015. 2
- (2015) NIPS
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

8
- 84978717864
- Deep residual learning for image recognition
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. 2016. 1, 2, 3, 5
- (2016) Deep Residual Learning for Image Recognition
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

9
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 1997. 2, 5
- (1997) Neural Computation
- Hochreiter, S.¹ Schmidhuber, J.²

10
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 2013. 4
- (2013) JAIR
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

11
- 84973917813
- Guiding the long-short term memory model for image caption generation
- X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars. Guiding the long-short term memory model for image caption generation. In ICCV, 2015. 2, 5, 6
- (2015) ICCV
- Jia, X.¹ Gavves, E.² Fernando, B.³ Tuytelaars, T.⁴

12
- 84962850780
- Deep compositional cross-modal learning to rank via local-global alignment
- X. Jiang, F. Wu, X. Li, Z. Zhao, W. Lu, S. Tang, and Y. Zhuang. Deep compositional cross-modal learning to rank via local-global alignment. In ACM MM, pages 69-78, 2015. 2
- (2015) ACM MM , pp. 69-78
- Jiang, X.¹ Wu, F.² Li, X.³ Zhao, Z.⁴ Lu, W.⁵ Tang, S.⁶ Zhuang, Y.⁷

13
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015. 2, 5, 6
- (2015) CVPR
- Karpathy, A.¹ Fei-Fei, L.²

14
- 84990070438
- Visual genome: Connecting language and vision using crowdsourced dense image annotations
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 2016. 2
- (2016) IJCV
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰

15
- 79960403098
- Rouge: A package for automatic evaluation of summaries
- C.-Y. Lin. Rouge: A package for automatic evaluation of summaries. In ACL, 2004. 5
- (2004) ACL
- Lin, C.-Y.¹

16
- 84937834115
- Microsoft coco: Common objects in context
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft coco: Common objects in context. In ECCV, 2014. 5
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

17
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015. 2
- (2015) ICCV
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

18
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-rnn)
- J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-rnn). In ICLR, 2015. 6
- (2015) ICLR
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

19
- 84937959846
- Recurrent models of visual attention
- V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In NIPS, 2014. 1
- (2014) NIPS
- Mnih, V.¹ Heess, N.² Graves, A.³

20
- 85133336275
- Bleu: A method for automatic evaluation of machine translation
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: a method for automatic evaluation of machine translation. In ACL, 2002. 5
- (2002) ACL
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

21
- 84965170394
- Exploring models and data for image question answering
- M. Ren, R. Kiros, and R. Zemel. Exploring models and data for image question answering. In NIPS, 2015. 2
- (2015) NIPS
- Ren, M.¹ Kiros, R.² Zemel, R.³

22
- 85030460823
- Hierarchical attention networks
- P. H. Seo, Z. Lin, S. Cohen, X. Shen, and B. Han. Hierarchical attention networks. arXiv preprint arXiv:1606.02393, 2016. 2
- (2016) Hierarchical Attention Networks
- Seo, P.H.¹ Lin, Z.² Cohen, S.³ Shen, X.⁴ Han, B.⁵

23
- 84951869843
- Supervised discrete hashing
- F. Shen, C. Shen, W. Liu, and H. Tao Shen. Supervised discrete hashing. In CVPR, pages 37-45, 2015. 2
- (2015) CVPR , pp. 37-45
- Shen, F.¹ Shen, C.² Liu, W.³ Tao Shen, H.⁴

24
- 84887334105
- Inductive hashing on manifolds
- F. Shen, C. Shen, Q. Shi, A. Van Den Hengel, and Z. Tang. Inductive hashing on manifolds. In CVPR, pages 1562-1569, 2013. 2
- (2013) CVPR , pp. 1562-1569
- Shen, F.¹ Shen, C.² Shi, Q.³ Van Den Hengel, A.⁴ Tang, Z.⁵

25
- 84925410541
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. 1, 2, 3, 5
- (2014) Very Deep Convolutional Networks for Large-Scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

26
- 84937961845
- Deep networks with internal selective attention through feedback connections
- M. F. Stollenga, J. Masci, F. Gomez, and J. Schmidhuber. Deep networks with internal selective attention through feedback connections. In NIPS, 2014. 1
- (2014) NIPS
- Stollenga, M.F.¹ Masci, J.² Gomez, F.³ Schmidhuber, J.⁴

27
- 84986296808
- Rethinking the inception architecture for computer vision
- C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR, pages 2818-2826, 2016. 6
- (2016) CVPR , pp. 2818-2826
- Szegedy, C.¹ Vanhoucke, V.² Ioffe, S.³ Shlens, J.⁴ Wojna, Z.⁵

28
- 84956980995
- Cider: Consensus-based image description evaluation
- R. Vedantam, C. Lawrence Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. In CVPR, 2015. 5
- (2015) CVPR
- Vedantam, R.¹ Lawrence Zitnick, C.² Parikh, D.³

29
- 84973882730
- Sequence to sequence-video to text
- S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence-video to text. In ICCV, 2015. 2
- (2015) ICCV
- Venugopalan, S.¹ Rohrbach, M.² Donahue, J.³ Mooney, R.⁴ Darrell, T.⁵ Saenko, K.⁶

30
- 84959876769
- Translating videos to natural language using deep recurrent neural networks
- S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL-HLT, 2015. 2
- (2015) NAACL-HLT
- Venugopalan, S.¹ Xu, H.² Donahue, J.³ Rohrbach, M.⁴ Mooney, R.⁵ Saenko, K.⁶

31
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015. 2, 5, 6
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

32
- 84981331874
- Hcp: A flexible cnn framework for multi-label image classification
- Y. Wei, W. Xia, M. Lin, J. Huang, B. Ni, J. Dong, Y. Zhao, and S. Yan. Hcp: A flexible cnn framework for multi-label image classification. TPAMI, 2016. 1
- (2016) TPAMI
- Wei, Y.¹ Xia, W.² Lin, M.³ Huang, J.⁴ Ni, B.⁵ Dong, J.⁶ Zhao, Y.⁷ Yan, S.⁸

33
- 85035008367
- Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
- H. Xu and K. Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, 2016. 1, 2
- (2016) ECCV
- Xu, H.¹ Saenko, K.²

34
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015. 1, 2, 3, 5, 6
- (2015) ICML
- Xu, K.¹ Ba, J.² Kiros, R.³ Cho, K.⁴ Courville, A.⁵ Salakhutdinov, R.⁶ Zemel, R.S.⁷ Bengio, Y.⁸

35
- 84986334021
- Stacked attention networks for image question answering
- Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016. 1, 2
- (2016) CVPR
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.⁵

36
- 84973884896
- Describing videos by exploiting temporal structure
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015. 1
- (2015) ICCV
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

37
- 84986317307
- Image captioning with semantic attention
- Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016. 2, 6
- (2016) CVPR
- You, Q.¹ Jin, H.² Wang, Z.³ Fang, C.⁴ Luo, J.⁵

38
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014. 5
- (2014) TACL
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

39
- 84969736572
- Adadelta: An adaptive learning rate method
- M. D. Zeiler. Adadelta: an adaptive learning rate method. arXiv preprint arXiv:1212.5701, 2012. 5
- (2012) Adadelta: An Adaptive Learning Rate Method
- Zeiler, M.D.¹

40
- 84921476116
- Visualizing and understanding convolutional networks
- M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. In ECCV, 2014. 1, 2, 7
- (2014) ECCV
- Zeiler, M.D.¹ Fergus, R.²

41
- 85029388674
- Visual translation embedding network for visual relation detection
- H. Zhang, Z. Kyaw, S.-F. Chang, and T.-S. Chua. Visual translation embedding network for visual relation detection. In CVPR, 2017. 2
- (2017) CVPR
- Zhang, H.¹ Kyaw, Z.² Chang, S.-F.³ Chua, T.-S.⁴

42
- 84994666699
- Partial multimodal sparse coding via adaptive similarity structure regularization
- Z. Zhao, H. Lu, C. Deng, X. He, and Y. Zhuang. Partial multimodal sparse coding via adaptive similarity structure regularization. In ACM MM, pages 152-156, 2016. 2
- (2016) ACM MM , pp. 152-156
- Zhao, Z.¹ Lu, H.² Deng, C.³ He, X.⁴ Zhuang, Y.⁵

43
- 84986275767
- Visual7w: Grounded question answering in images
- Y. Zhu, O. Groth, M. Bernstein, and L. Fei-Fei. Visual7w: Grounded question answering in images. In CVPR, 2016. 2
- (2016) CVPR
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.