SCOPUS Information Search Platform

31st AAAI Conference on Artificial Intelligence, AAAI 2017

Volume , Issue , 2017, Pages 4176-4182

Attention correctness in neural image captioning

(4) Liu, Chenxi (a); Mao, Junhua (b); Sha, Fei (b, c); Yuille, Alan (a, b)

a JOHNS HOPKINS UNIVERSITY (United States)

b UNIVERSITY OF CALIFORNIA (United States)

c UNIVERSITY OF SOUTHERN CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; NATURAL LANGUAGE PROCESSING SYSTEMS;

ATTENTION MECHANISMS; HUMAN ANNOTATIONS; HUMAN LIKE; IMAGE CAPTIONING; MACHINE PERCEPTION; QUANTITATIVE EVALUATION;

IMAGE ENHANCEMENT;

EID: 85030472316
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited: 231

References (34)

1
- 84955518079
- arXiv preprint arXiv:1412.7755
- Ba, J.; Mnih, V.; and Kavukcuoglu, K. 2014. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755.
- (2014) Multiple Object Recognition with Visual Attention
- Ba, J.¹ Mnih, V.² Kavukcuoglu, K.³

2
- 84922389693
- arXiv preprint arXiv:1409.0473
- Bahdanau, D.; Cho, K.; and Bengio, Y. 2014. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
- (2014) Neural Machine Translation by Jointly Learning to Align and Translate
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

3
- 85116156579
- METEOR: An automatic metric for MT evaluation with improved correlation with human judgments
- Banerjee, S., and Lavie, A. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Volume 29, 65-72.
- (2005) Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization , vol.29 , pp. 65-72
- Banerjee, S.¹ Lavie, A.²

4
- 84944115859
- arXiv preprint arXiv:1411.5654
- Chen, X., and Zitnick, C. L. 2014. Learning a recurrent visual representation for image caption generation. arXiv preprint arXiv:1411.5654.
- (2014) Learning a Recurrent Visual Representation for Image Caption Generation
- Chen, X.¹ Zitnick, C.L.²

5
- 84986262382
- arXiv preprint arXiv:1511.05960
- Chen, K.; Wang, J.; Chen, L.-C.; Gao, H.; Xu, W.; and Nevatia, R. 2015. ABC-CNN: An attention based convolutional neural network for visual question answering. arXiv preprint arXiv:1511.05960.
- (2015) ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
- Chen, K.¹ Wang, J.² Chen, L.-C.³ Gao, H.⁴ Xu, W.⁵ Nevatia, R.⁶

6
- 84919728106
- arXiv preprint arXiv:1406.1078
- Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
- (2014) Learning Phrase Representations Using RNN Encoder-decoder for Statistical Machine Translation
- Cho, K.¹ Van Merriënboer, B.² Gulcehre, C.³ Bahdanau, D.⁴ Bougares, F.⁵ Schwenk, H.⁶ Bengio, Y.⁷

7
- 84990044140
- arXiv preprint arXiv:1606.03556
- Das, A.; Agrawal, H.; Zitnick, C. L.; Parikh, D.; and Batra, D. 2016. Human attention in visual question answering: Do humans and deep networks look at the same regions? arXiv preprint arXiv:1606.03556.
- (2016) Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
- Das, A.¹ Agrawal, H.² Zitnick, C.L.³ Parikh, D.⁴ Batra, D.⁵

8
- 85198028989
- ImageNet: A large-scale hierarchical image database
- IEEE
- Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In CVPR, 248-255. IEEE.
- (2009) CVPR , pp. 248-255
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

9
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- Donahue, J.; Anne Hendricks, L.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; and Darrell, T. 2015. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2625-2634.
- (2015) CVPR , pp. 2625-2634
- Donahue, J.¹ Anne Hendricks, L.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

10
- 84959250180
- From captions to visual concepts and back
- Fang, H.; Gupta, S.; Iandola, F.; Srivastava, R. K.; Deng, L.; Dollár, P.; Gao, J.; He, X.; Mitchell, M.; Platt, J. C.; et al. 2015. From captions to visual concepts and back. In CVPR, 1473-1482.
- (2015) CVPR , pp. 1473-1482
- Fang, H.¹ Gupta, S.² Iandola, F.³ Srivastava, R.K.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰

11
- 0031573117
- Long short-term memory
- Hochreiter, S., and Schmidhuber, J. 1997. Long short-term memory. Neural computation 9(8):1735-1780.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

12
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- Hodosh, M.; Young, P.; and Hockenmaier, J. 2013. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research 853-899.
- (2013) Journal of Artificial Intelligence Research , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

13
- 84986302997
- arXiv preprint arXiv:1511.04164
- Hu, R.; Xu, H.; Rohrbach, M.; Feng, J.; Saenko, K.; and Darrell, T. 2015. Natural language object retrieval. arXiv preprint arXiv:1511.04164.
- (2015) Natural Language Object Retrieval
- Hu, R.¹ Xu, H.² Rohrbach, M.³ Feng, J.⁴ Saenko, K.⁵ Darrell, T.⁶

14
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- Karpathy, A., and Fei-Fei, L. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR, 3128-3137.
- (2015) CVPR , pp. 3128-3137
- Karpathy, A.¹ Fei-Fei, L.²

15
- 84941620184
- arXiv preprint arXiv:1412.6980
- Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
- (2014) Adam: A Method for Stochastic Optimization
- Kingma, D.¹ Ba, J.²

16
- 84944113729
- arXiv preprint arXiv:1411.2539
- Kiros, R.; Salakhutdinov, R.; and Zemel, R. S. 2014. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539.
- (2014) Unifying Visual-semantic Embeddings with Multimodal Neural Language Models
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

17
- 84937834115
- arXiv preprint arXiv:1405.0312
- Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C. L.; and Dollár, P. 2014. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312.
- (2014) Microsoft COCO: Common Objects in Context
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Bourdev, L.⁴ Girshick, R.⁵ Hays, J.⁶ Perona, P.⁷ Ramanan, D.⁸ Zitnick, C.L.⁹ Dollár, P.¹⁰

18
- 84994358876
- arXiv preprint arXiv:1508.04025
- Luong, M.-T.; Pham, H.; and Manning, C. D. 2015. Effective approaches to attention-based neural machine translation. arXiv preprint arXiv:1508.04025.
- (2015) Effective Approaches to Attention-based Neural Machine Translation
- Luong, M.-T.¹ Pham, H.² Manning, C.D.³

19
- 85117622017
- The Stanford CoreNLP natural language processing toolkit
- Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J. R.; Bethard, S.; and McClosky, D. 2014. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), 55-60.
- (2014) ACL (System Demonstrations) , pp. 55-60
- Manning, C.D.¹ Surdeanu, M.² Bauer, J.³ Finkel, J.R.⁴ Bethard, S.⁵ McClosky, D.⁶

20
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-RNN)
- Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Huang, Z.; and Yuille, A. 2015. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR.
- (2015) ICLR
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

21
- 84986260074
- Generation and comprehension of unambiguous object descriptions
- Mao, J.; Huang, J.; Toshev, A.; Camburu, O.; Yuille, A.; and Murphy, K. 2016. Generation and comprehension of unambiguous object descriptions. In CVPR.
- (2016) CVPR
- Mao, J.¹ Huang, J.² Toshev, A.³ Camburu, O.⁴ Yuille, A.⁵ Murphy, K.⁶

22
- 84898956512
- Distributed representations of words and phrases and their compositionality
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, 3111-3119.
- (2013) Advances in Neural Information Processing Systems , pp. 3111-3119
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.S.⁴ Dean, J.⁵

23
- 84937959846
- Recurrent models of visual attention
- Mnih, V.; Heess, N.; Graves, A.; et al. 2014. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, 2204-2212.
- (2014) Advances in Neural Information Processing Systems , pp. 2204-2212
- Mnih, V.¹ Heess, N.² Graves, A.³

24
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL, 311-318.
- (2002) ACL , pp. 311-318
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

25
- 84973856017
- Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
- Plummer, B. A.; Wang, L.; Cervantes, C. M.; Caicedo, J. C.; Hockenmaier, J.; and Lazebnik, S. 2015. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2641-2649.
- (2015) ICCV , pp. 2641-2649
- Plummer, B.A.¹ Wang, L.² Cervantes, C.M.³ Caicedo, J.C.⁴ Hockenmaier, J.⁵ Lazebnik, S.⁶

26
- 84986259962
- Learning to localize little landmarks
- Shih, K. J.; Singh, S.; and Hoiem, D. 2016. Learning to localize little landmarks. In Computer Vision and Pattern Recognition.
- (2016) Computer Vision and Pattern Recognition
- Shih, K.J.¹ Singh, S.² Hoiem, D.³

27
- 84925410541
- arXiv preprint arXiv:1409.1556
- Simonyan, K., and Zisserman, A. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

28
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1):1929-1958.
- (2014) The Journal of Machine Learning Research , vol.15 , Issue.1 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

29
- 84946747440
- Show and tell: A neural image caption generator
- Vinyals, O.; Toshev, A.; Bengio, S.; and Erhan, D. 2015. Show and tell: A neural image caption generator. In CVPR, 3156-3164.
- (2015) CVPR , pp. 3156-3164
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

30
- 84990044633
- arXiv preprint arXiv:1511.05234
- Xu, H., and Saenko, K. 2015. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. arXiv preprint arXiv:1511.05234.
- (2015) Ask, Attend and Answer: Exploring Question-guided Spatial Attention for Visual Question Answering
- Xu, H.¹ Saenko, K.²

31
- 84939821074
- arXiv preprint arXiv:1502.03044
- Xu, K.; Ba, J.; Kiros, R.; Courville, A.; Salakhutdinov, R.; Zemel, R.; and Bengio, Y. 2015. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044.
- (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

32
- 84995439884
- arXiv preprint arXiv:1603.03925
- You, Q.; Jin, H.; Wang, Z.; Fang, C.; and Luo, J. 2016. Image captioning with semantic attention. arXiv preprint arXiv:1603.03925.
- (2016) Image Captioning with Semantic Attention
- You, Q.¹ Jin, H.² Wang, Z.³ Fang, C.⁴ Luo, J.⁵

33
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- Young, P.; Lai, A.; Hodosh, M.; and Hockenmaier, J. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics 2:67-78.
- (2014) Transactions of the Association for Computational Linguistics , vol.2 , pp. 67-78
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

34
- 84990038229
- arXiv preprint arXiv:1511.03416
- Zhu, Y.; Groth, O.; Bernstein, M.; and Fei-Fei, L. 2015. Visual7W: Grounded question answering in images. arXiv preprint arXiv:1511.03416.
- (2015) Visual7W: Grounded Question Answering in Images
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.