SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 4565-4574

DenseCap: Fully convolutional localization networks for dense captioning

(3) Johnson, Justin (a); Karpathy, Andrej (a); Fei-Fei, Li (a)

(a) Stanford University (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; CONVOLUTION; PATTERN RECOGNITION; RECURRENT NEURAL NETWORKS;

ACCURACY IMPROVEMENT; COMPUTER VISION SYSTEM; CONVOLUTIONAL NETWORKS; IMAGE CAPTIONING; NATURAL LANGUAGES; SALIENT REGIONS; SINGLE WORDS; STATE-OF-THE-ART APPROACH;

NETWORK ARCHITECTURE;

EID: 84986245786 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.494 Document Type: Conference Paper

Times cited: 1163

References (54)

1
- 0041876117
- Matching words and pictures
- K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003.
- (2003) JMLR
- Barnard, K.¹ Duygulu, P.² Forsyth, D.³ De Freitas, N.⁴ Blei, D.M.⁵ Jordan, M.I.⁶

2
- 0142166851
- A neural probabilistic language model
- Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. A neural probabilistic language model. The Journal of Machine Learning Research, 3:1137-1155, 2003.
- (2003) The Journal of Machine Learning Research , vol.3 , pp. 1137-1155
- Bengio, Y.¹ Ducharme, R.² Vincent, P.³ Janvin, C.⁴

3
- 84952349295
- arXiv preprint arXiv:1504.00325
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft coco captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
- (2015) Microsoft Coco Captions: Data Collection and Evaluation Server
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollar, P.⁶ Zitnick, C.L.⁷

4
- 84957029470
- Mind's eye: A recurrent visual representation for image caption generation
- X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. CVPR, 2015.
- (2015) CVPR
- Chen, X.¹ Zitnick, C.L.²

5
- 85009929513
- Describing multimedia content using attention-based encoder-decoder networks
- abs/1507.01053
- K. Cho, A. C. Courville, and Y. Bengio. Describing multimedia content using attention-based encoder-decoder networks. CoRR, abs/1507.01053, 2015.
- (2015) CoRR
- Cho, K.¹ Courville, A.C.² Bengio, Y.³

6
- 84990044091
- Torch7: A matlab-like environment for machine learning
- R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop, number EPFL-CONF-192376, 2011.
- (2011) BigLearn, NIPS Workshop, Number EPFL-CONF-192376
- Collobert, R.¹ Kavukcuoglu, K.² Farabet, C.³

7
- 85107661995
- Meteor universal: Language specific translation evaluation for any target language
- M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In Proceedings of the EACL 2014 Workshop on Statistical Machine Translation, 2014.
- (2014) Proceedings of the EACL 2014 Workshop on Statistical Machine Translation
- Denkowski, M.¹ Lavie, A.²

8
- 85009912425
- arXiv preprint arXiv:1411.4389
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389, 2014.
- (2014) Long-term Recurrent Convolutional Networks for Visual Recognition and Description
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

9
- 84911443425
- Scalable object detection using deep neural networks
- D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. CVPR, 2014.
- (2014) CVPR
- Erhan, D.¹ Szegedy, C.² Toshev, A.³ Anguelov, D.⁴

10
- 77951298115
- The PASCAL visual object classes (VOC) challenge
- M. Everingham, L. Van Gool, C. K. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International journal of computer vision, 88(2):303-338, 2010.
- (2010) International Journal of Computer Vision , vol.88 , Issue.2 , pp. 303-338
- Everingham, M.¹ Van Gool, L.² Williams, C.K.³ Winn, J.⁴ Zisserman, A.⁵

11
- 84959250180
- From captions to visual concepts and back
- H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. CVPR, 2015.
- (2015) CVPR
- Fang, H.¹ Gupta, S.² Iandola, F.³ Srivastava, R.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.¹⁰

12
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. ECCV, 2010.
- (2010) ECCV
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

13
- 85029359197
- Fast R-CNN
- R. Girshick. Fast R-CNN. ICCV, 2015.
- (2015) ICCV
- Girshick, R.¹

14
- 84911400494
- Rich feature hierarchies for accurate object detection and semantic segmentation
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR, 2014.
- (2014) CVPR
- Girshick, R.¹ Donahue, J.² Darrell, T.³ Malik, J.⁴

15
- 84906979661
- arXiv preprint arXiv:1308.0850
- A. Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.
- (2013) Generating Sequences with Recurrent Neural Networks
- Graves, A.¹

16
- 84983208884
- Draw: A recurrent neural network for image generation
- K. Gregor, I. Danihelka, A. Graves, and D. Wierstra. DRAW: A recurrent neural network for image generation. ICML, 2015.
- (2015) ICML
- Gregor, K.¹ Danihelka, I.² Graves, A.³ Wierstra, D.⁴

17
- 84939247735
- Spatial pyramid pooling in deep convolutional networks for visual recognition
- K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015.
- (2015) IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

18
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 9(8):1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

19
- 84965096967
- Spatial transformer networks
- M. Jaderberg, K. Simonyan, A. Zisserman, and K. Kavukcuoglu. Spatial transformer networks. NIPS, 2015.
- (2015) NIPS
- Jaderberg, M.¹ Simonyan, K.² Zisserman, A.³ Kavukcuoglu, K.⁴

20
- 84856653718
- Learning cross-modality similarity for multinomial data
- Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. ICCV, 2011.
- (2011) ICCV
- Jia, Y.¹ Salzmann, M.² Darrell, T.³

21
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CVPR, 2015.
- (2015) CVPR
- Karpathy, A.¹ Fei-Fei, L.²

22
- 84959876313
- arXiv preprint arXiv:1506.02078
- A. Karpathy, J. Johnson, and L. Fei-Fei. Visualizing and understanding recurrent networks. arXiv preprint arXiv:1506.02078, 2015.
- (2015) Visualizing and Understanding Recurrent Networks
- Karpathy, A.¹ Johnson, J.² Fei-Fei, L.³

23
- 85083951076
- Adam: A method for stochastic optimization
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. ICLR, 2015.
- (2015) ICLR
- Kingma, D.¹ Ba, J.²

24
- 84952349298
- Unifying visual-semantic embeddings with multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
- (2015) TACL
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

25
- 84978730111
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, M. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. 2016.
- (2016) Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰ Bernstein, M.¹¹ Fei-Fei, L.¹²

26
- 84876231242
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

27
- 80052901011
- Baby talk: Understanding and generating simple image descriptions
- G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. CVPR, 2011.
- (2011) CVPR
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.C.⁶ Berg, T.L.⁷

28
- 84907331257
- Generalizing image captions for image-text parallel corpus
- Citeseer
- P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In ACL (2), pages 790-796. Citeseer, 2013.
- (2013) ACL , vol.2 , pp. 790-796
- Kuznetsova, P.¹ Ordonez, V.² Berg, A.C.³ Berg, T.L.⁴ Choi, Y.⁵

29
- 0032203257
- Gradient-based learning applied to document recognition
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
- (1998) Proceedings of the IEEE , vol.86 , Issue.11 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

30
- 85009931853
- Microsoft coco: Common objects in context
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

31
- 84959205572
- Fully convolutional networks for semantic segmentation
- J. Long, E. Shelhamer, and T. Darrell. Fully convolutional networks for semantic segmentation. CVPR, 2015.
- (2015) CVPR
- Long, J.¹ Shelhamer, E.² Darrell, T.³

32
- 84951072975
- Explain images with multimodal recurrent neural networks
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090, 2014.
- (2014) ArXiv Preprint arXiv:1410.1090
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.L.⁵

33
- 79959829092
- Recurrent neural network based language model
- T. Mikolov, M. Karafiát, L. Burget, J. Černocký, and S. Khudanpur. Recurrent neural network based language model. In INTERSPEECH, 2010.
- (2010) INTERSPEECH
- Mikolov, T.¹ Karafiát, M.² Burget, L.³ Černocký, J.⁴ Khudanpur, S.⁵

34
- 84936796885
- Large scale retrieval and generation of image descriptions
- V. Ordonez, X. Han, P. Kuznetsova, G. Kulkarni, M. Mitchell, K. Yamaguchi, K. Stratos, A. Goyal, J. Dodge, A. Mensch, et al. Large scale retrieval and generation of image descriptions. International Journal of Computer Vision (IJCV), 2015.
- (2015) International Journal of Computer Vision (IJCV)
- Ordonez, V.¹ Han, X.² Kuznetsova, P.³ Kulkarni, G.⁴ Mitchell, M.⁵ Yamaguchi, K.⁶ Stratos, K.⁷ Goyal, A.⁸ Dodge, J.⁹ Mensch, A.¹⁰

35
- 84973856017
- Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
- B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. ICCV, 2015.
- (2015) ICCV
- Plummer, B.A.¹ Wang, L.² Cervantes, C.M.³ Caicedo, J.C.⁴ Hockenmaier, J.⁵ Lazebnik, S.⁶

36
- 85009891462
- qassemoquab. stnbhwd
- qassemoquab. stnbhwd. https://github.com/qassemoquab/stnbhwd, 2015.
- (2015)

37
- 84961917629
- arXiv preprint arXiv:1506.02640
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640, 2015.
- (2015) You only Look Once: Unified, Real-time Object Detection
- Redmon, J.¹ Divvala, S.² Girshick, R.³ Farhadi, A.⁴

38
- 84960980241
- Faster R-CNN: Towards real-time object detection with region proposal networks
- S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS, 2015.
- (2015) NIPS
- Ren, S.¹ He, K.² Girshick, R.³ Sun, J.⁴

39
- 84947041871
- ImageNet large scale visual recognition challenge
- April
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV), pages 1-42, April 2015.
- (2015) International Journal of Computer Vision (IJCV) , pp. 1-42
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰ Berg, A.C.¹¹ Fei-Fei, L.¹²

40
- 85083951635
- OverFeat: Integrated recognition, localization and detection using convolutional networks
- P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. ICLR, 2014.
- (2014) ICLR
- Sermanet, P.¹ Eigen, D.² Zhang, X.³ Mathieu, M.⁴ Fergus, R.⁵ LeCun, Y.⁶

41
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.
- (2015) ICLR
- Simonyan, K.¹ Zisserman, A.²

42
- 77955998009
- Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
- R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. CVPR, 2010.
- (2010) CVPR
- Socher, R.¹ Fei-Fei, L.²

43
- 84964474107
- Grounded compositional semantics for finding and describing images with sentences
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2014.
- (2014) TACL
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

44
- 80053459857
- Generating text with recurrent neural networks
- I. Sutskever, J. Martens, and G. E. Hinton. Generating text with recurrent neural networks. ICML, 2011.
- (2011) ICML
- Sutskever, I.¹ Martens, J.² Hinton, G.E.³

45
- 84937522268
- Going deeper with convolutions
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR, 2015.
- (2015) CVPR
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

46
- 84962336509
- arXiv preprint arXiv:1412.1441
- C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. arXiv preprint arXiv:1412.1441, 2014.
- (2014) Scalable, High-quality Object Detection
- Szegedy, C.¹ Reed, S.² Erhan, D.³ Anguelov, D.⁴

47
- 84957922397
- Yfcc100m: The new data in multimedia research
- B. Thomee, B. Elizalde, D. A. Shamma, K. Ni, G. Friedland, D. Poland, D. Borth, and L.-J. Li. Yfcc100m: The new data in multimedia research. Communications of the ACM, 59(2):64-73, 2016.
- (2016) Communications of the ACM , vol.59 , Issue.2 , pp. 64-73
- Thomee, B.¹ Elizalde, B.² Shamma, D.A.³ Ni, K.⁴ Friedland, G.⁵ Poland, D.⁶ Borth, D.⁷ Li, L.-J.⁸

48
- 84956980995
- Cider: Consensus-based image description evaluation
- R. Vedantam, C. Lawrence Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. CVPR, 2015.
- (2015) CVPR
- Vedantam, R.¹ Lawrence Zitnick, C.² Parikh, D.³

49
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. CVPR, 2015.
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

50
- 0000903748
- Generalization of backpropagation with application to a recurrent gas market model
- P. J. Werbos. Generalization of backpropagation with application to a recurrent gas market model. Neural Networks, 1(4):339-356, 1988.
- (1988) Neural Networks , vol.1 , Issue.4 , pp. 339-356
- Werbos, P.J.¹

51
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015.
- (2015) ICML
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

52
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014.
- (2014) TACL
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

53
- 85009899017
- Visualizing and understanding convolutional networks
- M. D. Zeiler and R. Fergus. Visualizing and understanding convolutional networks. ECCV, 2014.
- (2014) ECCV
- Zeiler, M.D.¹ Fergus, R.²

54
- 85009853104
- Edge boxes: Locating object proposals from edges
- C. L. Zitnick and P. Dollár. Edge boxes: Locating object proposals from edges. ECCV, 2014.
- (2014) ECCV
- Zitnick, C.L.¹ Dollár, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.