[2] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, pages 190-200. Association for Computational Linguistics, 2011.
[4] K. Cho, A. Courville, and Y. Bengio. Describing multimedia content using attention-based encoder-decoder networks. IEEE Transactions on Multimedia, 17(11):1875-1886, 2015.
[5] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, T. Darrell, and K. Saenko. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, pages 2625-2634, 2015.
[6] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015.
[7] L. Gao, J. Song, F. Nie, Y. Yan, N. Sebe, and H. T. Shen. Optimal graph learning with partial tags and multiple features for image and video annotation. In CVPR, pages 4371-4379, 2015.
[8] L. Gao, J. Song, F. Nie, F. Zou, N. Sebe, and H. T. Shen. Graph-without-cut: An ideal graph learning for image segmentation. In AAAI, pages 1188-1194, 2016.
[10] A. Karpathy, A. Joulin, and F. F. F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, pages 1889-1897, 2014.
[11] G. Li, S. Ma, and Y. Han. Summarization-based video caption via deep neural networks. In ACM Multimedia, pages 1191-1194. ACM, 2015.
[12] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. arXiv preprint arXiv:1410.1090, 2014.
[13] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In ACL, pages 311-318. Association for Computational Linguistics, 2002.
[14] J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo. Effective multiple feature hashing for large-scale near-duplicate video retrieval. IEEE Transactions on Multimedia, 15(8):1997-2008, 2013.
[15] J. Song, Y. Yang, Y. Yang, Z. Huang, and H. T. Shen. Inter-media hashing for large-scale retrieval from heterogeneous data sources. In SIGMOD, pages 785-796, 2013.
[17] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, pages 1-9, 2015.
[18] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. arXiv preprint arXiv:1412.0767, 2014.
[19] R. Vedantam, C. Lawrence Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, pages 4566-4575, 2015.
[20] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, pages 4534-4542, 2015.
[21] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. arXiv preprint arXiv:1412.4729, 2014.
[22] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015.
[23] Q. Wu, C. Shen, A. v. d. Hengel, L. Liu, and A. Dick. Image captioning with an intermediate attributes layer. arXiv preprint arXiv:1506.01144, 2015.
[24] K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
[25] R. Xu, C. Xiong, W. Chen, and J. J. Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI, pages 2346-2352. Citeseer, 2015.
[26] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, pages 4507-4515, 2015.