References

[1] J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. In ICLR, 2015.
[2] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2014.
[3] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
[4] X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, pages 2422-2431, 2015.
[5] K. Cho, B. van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014.
[6] M. Denil, L. Bazzani, H. Larochelle, and N. de Freitas. Learning where to attend with deep architectures for image tracking. Neural Computation, 24(8):2151-2184, 2012.
[7] J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
[8] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, pages 2626-2634, 2015.
[9] D. Elliott and F. Keller. Image description using visual dependency representations. In EMNLP, pages 1292-1302, 2013.
[10] V. Escorcia, J. C. Niebles, and B. Ghanem. On the relationship between visual attributes and convolutional networks. In CVPR, pages 1256-1264, 2015.
[11] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015.
[12] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, pages 15-29. Springer, 2010.
[13] Y. Gong, Y. Jia, T. Leung, A. Toshev, and S. Ioffe. Deep convolutional ranking for multilabel image annotation. In ICLR, 2014.
[14] Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In ECCV, pages 529-545. Springer, 2014.
[16] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, June 2015.
[17] C. Koch and S. Ullman. Shifts in selective visual attention: Towards the underlying neural circuitry. In Matters of Intelligence, pages 115-141. Springer, 1987.
[18] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097-1105, 2012.
[19] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating image descriptions. In CVPR, 2011.
[20] P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, pages 359-368, 2012.
[21] H. Larochelle and G. E. Hinton. Learning to combine foveal glimpses with a third-order Boltzmann machine. In NIPS, pages 1243-1251, 2010.
[22] R. Lebret, P. O. Pinheiro, and R. Collobert. Simple image description generator via a linear phrase-based approach. In ICLR, 2015.
[23] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, pages 220-228, 2011.
[24] J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. In CVPR, June 2015.
[25] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, 2015.
[26] J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv preprint arXiv:1412.6632, 2014.
[27] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119, 2013.
[28] V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In NIPS, pages 2204-2212, 2014.
[29] J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, pages 1532-1543, 2014.
[31] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104-3112, 2014.
[32] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[33] Y. Tang, N. Srivastava, and R. R. Salakhutdinov. Learning generative models with visual attention. In NIPS, pages 1808-1816, 2014.
[35] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015.
[36] Q. Wu, C. Shen, A. van den Hengel, L. Liu, and A. Dick. What value do explicit high-level concepts have in vision to language problems? In CVPR, 2016.
[37] K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[38] B. Zhou, V. Jagadeesh, and R. Piramuthu. ConceptLearner: Discovering visual concepts from weakly labeled image collections. In CVPR, June 2015.