SCOPUS Information Retrieval Platform

iV&L-MM 2016 - Proceedings of the 2016 ACM Workshop on Vision and Language Integration Meets Multimedia Fusion, co-located with ACM Multimedia 2016

Volume , Issue , 2016, Pages 1-8

Exploiting scene context for image captioning

(3) Shetty, Rakshith ᵃ; Tavakoli, Hamed R. ᵃ; Laaksonen, Jorma ᵃ

ᵃ Aalto University (Finland)

Author keywords

[No Author keywords available]

Indexed keywords

BENCHMARKING; COMPUTATIONAL LINGUISTICS; NEURAL NETWORKS;

ATTENTION MECHANISMS; CONTEXT FEATURES; CONVOLUTIONAL NEURAL NETWORK; IMAGE CAPTIONING; LEARNING TECHNIQUES; LONG SHORT TERM MEMORY; STATE-OF-THE-ART PERFORMANCE; VISUAL FEATURE;

VISUAL LANGUAGES;

EID: 84995460741 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2983563.2983571 Document Type: Conference Paper

Times cited: (9)

References (47)

1
- 84995425614
- Deep Learning and Unsupervised Feature Learning NIPS Workshop
- F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio. Theano: new features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS Workshop, 2012.
- (2012) Theano: New Features and Speed Improvements
- Bastien, F.¹ Lamblin, P.² Pascanu, R.³ Bergstra, J.⁴ Goodfellow, I.J.⁵ Bergeron, A.⁶ Bouchard, N.⁷ Bengio, Y.⁸

2
- 85009859594
- arXiv, abs/1601.03896
- R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat, and B. Plank. Automatic description generation from images: A survey. arXiv, abs/1601.03896, 2016.
- (2016) Automatic Description Generation from Images: A Survey
- Bernardi, R.¹ Cakici, R.² Elliott, D.³ Erdem, A.⁴ Erdem, E.⁵ Ikizler-Cinbis, N.⁶ Keller, F.⁷ Muscat, A.⁸ Plank, B.⁹

3
- 84952349295
- arXiv, abs/1504.00325
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv, abs/1504.00325, 2015.
- (2015) Microsoft COCO Captions: Data Collection and Evaluation Server
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollar, P.⁶ Zitnick, C.L.⁷

4
- 84995523967
- 1st captioning challenge slides
- Y. Cui, M. Ruggero Ronchi, and T.-Y. Lin. 1st captioning challenge slides. Large-scale Scene UNderstanding Workshop, 2015.
- (2015) Large-scale Scene UNderstanding Workshop
- Cui, Y.¹ Ruggero Ronchi, M.² Lin, T.-Y.³

5
- 85198028989
- ImageNet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.
- (2009) CVPR
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

6
- 85107661995
- Meteor universal: Language specific translation evaluation for any target language
- M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL, 2014.
- (2014) EACL
- Denkowski, M.¹ Lavie, A.²

7
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
- (2015) CVPR
- Donahue, J.¹ Anne Hendricks, L.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

8
- 84919881041
- DeCAF: A deep convolutional activation feature for generic visual recognition
- J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, and T. Darrell. DeCAF: A deep convolutional activation feature for generic visual recognition. In ICML, 2014.
- (2014) ICML
- Donahue, J.¹ Jia, Y.² Vinyals, O.³ Hoffman, J.⁴ Zhang, N.⁵ Tzeng, E.⁶ Darrell, T.⁷

9
- 84943812736
- Describing images using inferred visual dependency representations
- D. Elliott and A. P. de Vries. Describing images using inferred visual dependency representations. In ACL, 2015.
- (2015) ACL
- Elliott, D.¹ De Vries, A.P.²

10
- 84906929591
- Image description using visual dependency representations
- D. Elliott and F. Keller. Image description using visual dependency representations. In EMNLP, 2013.
- (2013) EMNLP
- Elliott, D.¹ Keller, F.²

11
- 84921069139
- The PASCAL visual object classes challenge: A retrospective
- M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. IJCV, 2015.
- (2015) IJCV
- Everingham, M.¹ Eslami, S.M.A.² Van Gool, L.³ Williams, C.K.I.⁴ Winn, J.⁵ Zisserman, A.⁶

12
- 84959250180
- From captions to visual concepts and back
- H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollar, J. Gao, X. He, M. Mitchell, J. Platt, L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
- (2015) CVPR
- Fang, H.¹ Gupta, S.² Iandola, F.³ Srivastava, R.⁴ Deng, L.⁵ Dollar, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.¹⁰ Zitnick, L.¹¹ Zweig, G.¹²

13
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
- (2010) ECCV
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

14
- 84938217896
- arXiv, abs/1403.1840
- Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale orderless pooling of deep convolutional activation features. arXiv, abs/1403.1840, 2014.
- (2014) Multi-scale Orderless Pooling of Deep Convolutional Activation Features
- Gong, Y.¹ Wang, L.² Guo, R.³ Lazebnik, S.⁴

15
- 84986274465
- Deep residual learning for image recognition
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
- (2016) CVPR
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

16
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural computation, 1997.
- (1997) Neural Computation
- Hochreiter, S.¹ Schmidhuber, J.²

17
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 2013.
- (2013) JAIR
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

18
- 84913555165
- arXiv, abs/1408.5093
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv, abs/1408.5093, 2014.
- (2014) Caffe: Convolutional Architecture for Fast Feature Embedding
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

19
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
- (2015) CVPR
- Karpathy, A.¹ Fei-Fei, L.²

20
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.
- (2014) NIPS
- Karpathy, A.¹ Joulin, A.² Fei-Fei, L.³

21
- 84961376850
- Convolutional neural networks for sentence classification
- Y. Kim. Convolutional neural networks for sentence classification. In EMNLP, 2014.
- (2014) EMNLP
- Kim, Y.¹

22
- 84919921461
- Multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. Zemel. Multimodal neural language models. In ICML, 2014.
- (2014) ICML
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.³

23
- 84944113729
- arXiv, abs/1411.2539
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv, abs/1411.2539, 2014.
- (2014) Unifying Visual-semantic Embeddings with Multimodal Neural Language Models
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

24
- 84913582676
- Convolutional network features for scene recognition
- M. Koskela and J. Laaksonen. Convolutional network features for scene recognition. In ACMMM, 2014.
- (2014) ACMMM
- Koskela, M.¹ Laaksonen, J.²

25
- 84862279067
- Composing simple image descriptions using web-scale n-grams
- S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, 2011.
- (2011) CoNLL
- Li, S.¹ Kulkarni, G.² Berg, T.L.³ Berg, A.C.⁴ Choi, Y.⁵

26
- 26944501715
- ROUGE: A package for automatic evaluation of summaries
- M.-F. Moens and S. Szpakowicz, editors
- C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In M.-F. Moens and S. Szpakowicz, editors, Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, 2004.
- (2004) Text Summarization Branches Out: Proceedings of the ACL-04 Workshop
- Lin, C.-Y.¹

27
- 84937834115
- Microsoft COCO: Common objects in context
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

28
- 84898956512
- Distributed representations of words and phrases and their compositionality
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
- (2013) NIPS
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.S.⁴ Dean, J.⁵

29
- 85162522202
- Im2Text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2Text: Describing images using 1 million captioned photographs. In NIPS, 2011.
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

30
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
- (2002) ACL
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

31
- 84961289992
- GloVe: Global vectors for word representation
- J. Pennington, R. Socher, and C. D. Manning. GloVe: Global vectors for word representation. In EMNLP, 2014.
- (2014) EMNLP
- Pennington, J.¹ Socher, R.² Manning, C.D.³

32
- 85090348677
- Collecting image annotations using Amazon's Mechanical Turk
- C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier. Collecting image annotations using Amazon's Mechanical Turk. In NAACL HLT, 2010.
- (2010) NAACL HLT
- Rashtchian, C.¹ Young, P.² Hodosh, M.³ Hockenmaier, J.⁴

33
- 84955283951
- arXiv, abs/1506.01497
- S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv, abs/1506.01497, 2015.
- (2015) Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks
- Ren, S.¹ He, K.² Girshick, R.³ Sun, J.⁴

34
- 84908537903
- CNN features off-the-shelf: An astounding baseline for recognition
- A. Sharif Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. In CVPR Workshops, 2014.
- (2014) CVPR Workshops
- Sharif Razavian, A.¹ Azizpour, H.² Sullivan, J.³ Carlsson, S.⁴

35
- 84977650097
- Video captioning with recurrent networks based on frame- and video-level features and visual content classification
- abs/1512.02949
- R. Shetty and J. Laaksonen. Video captioning with recurrent networks based on frame- and video-level features and visual content classification. ICCV Workshop on LSMDC, abs/1512.02949, 2015.
- (2015) ICCV Workshop on LSMDC
- Shetty, R.¹ Laaksonen, J.²

36
- 84994666053
- Frame- and segment-level features and candidate pool evaluation for video caption generation
- R. Shetty and J. Laaksonen. Frame- and segment-level features and candidate pool evaluation for video caption generation. In ACMMM Multimedia Grand Challenge Solutions, 2016.
- (2016) ACMMM Multimedia Grand Challenge Solutions
- Shetty, R.¹ Laaksonen, J.²

37
- 84964983441
- arXiv, abs/1409.4842
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv, abs/1409.4842, 2014.
- (2014) Going Deeper with Convolutions
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

38
- 84893343292
- Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude
- T. Tieleman and G. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. Coursera: Neural Networks for Machine Learning, 2012.
- (2012) Coursera: Neural Networks for Machine Learning
- Tieleman, T.¹ Hinton, G.²

39
- 84956980995
- CIDEr: Consensus-based image description evaluation
- R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
- (2015) CVPR
- Vedantam, R.¹ Zitnick, C.L.² Parikh, D.³

40
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

41
- 84924067462
- SUN database: Exploring a large collection of scene categories
- J. Xiao, K. A. Ehinger, J. Hays, A. Torralba, and A. Oliva. SUN database: Exploring a large collection of scene categories. IJCV, 2014.
- (2014) IJCV
- Xiao, J.¹ Ehinger, K.A.² Hays, J.³ Torralba, A.⁴ Oliva, A.⁵

42
- 77955988947
- SUN database: Large-scale scene recognition from abbey to zoo
- J. Xiao, J. Hays, K. Ehinger, A. Oliva, and A. Torralba. SUN database: Large-scale scene recognition from abbey to zoo. In CVPR, 2010.
- (2010) CVPR
- Xiao, J.¹ Hays, J.² Ehinger, K.³ Oliva, A.⁴ Torralba, A.⁵

43
- 84939821074
- arXiv, abs/1502.03044
- K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv, abs/1502.03044, 2015.
- (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Xu, K.¹ Ba, J.² Kiros, R.³ Cho, K.⁴ Courville, A.⁵ Salakhutdinov, R.⁶ Zemel, R.⁷ Bengio, Y.⁸

44
- 84995439884
- arXiv, abs/1603.03925
- Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. arXiv, abs/1603.03925, 2016.
- (2016) Image Captioning with Semantic Attention
- You, Q.¹ Jin, H.² Wang, Z.³ Fang, C.⁴ Luo, J.⁵

45
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014.
- (2014) TACL
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

46
- 84944053926
- arXiv, abs/1409.2329
- W. Zaremba, I. Sutskever, and O. Vinyals. Recurrent neural network regularization. arXiv, abs/1409.2329, 2014.
- (2014) Recurrent Neural Network Regularization
- Zaremba, W.¹ Sutskever, I.² Vinyals, O.³

47
- 84937964578
- Learning deep features for scene recognition using Places database
- B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning deep features for scene recognition using Places database. In NIPS, 2014.
- (2014) NIPS
- Zhou, B.¹ Lapedriza, A.² Xiao, J.³ Torralba, A.⁴ Oliva, A.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.