



Volume 19, Issue 9, 2017, Pages 2045-2055

Video Captioning with Attention-Based LSTM and Semantic Consistency

Author keywords

Attention mechanism; embedding; long short term memory (LSTM); video captioning

Indexed keywords

LONG SHORT-TERM MEMORY; NEURAL NETWORKS;

EID: 85028835252     PISSN: 15209210     EISSN: None     Source Type: Journal    
DOI: 10.1109/TMM.2017.2729019     Document Type: Article
Times cited: 654

References (52)
  • 1
    • 84991618952
    • Optimized graph learning using partial tags and multiple features for image and video annotation
    • J. Song et al., "Optimized graph learning using partial tags and multiple features for image and video annotation," IEEE Trans. Image Process., vol. 25, no. 11, pp. 4999-5011, Nov. 2016.
    • (2016) IEEE Trans. Image Process., vol. 25, Issue 11, pp. 4999-5011
    • Song, J.
  • 2
    • 84897584700
    • Video-to-shot tag propagation by graph sparse group lasso
    • X. Zhu, Z. Huang, J. Cui, and H. T. Shen, "Video-to-shot tag propagation by graph sparse group lasso," IEEE Trans. Multimedia, vol. 15, no. 3, pp. 633-646, Apr. 2013.
    • (2013) IEEE Trans. Multimedia, vol. 15, Issue 3, pp. 633-646
    • Zhu, X.; Huang, Z.; Cui, J.; Shen, H.T.
  • 3
    • 85027941917
    • Efficient motion and disparity estimation optimization for low complexity multiview video coding
    • Z. Pan, Y. Zhang, and S. Kwong, "Efficient motion and disparity estimation optimization for low complexity multiview video coding," IEEE Trans. Broadcast., vol. 61, no. 2, pp. 166-176, Jun. 2015.
    • (2015) IEEE Trans. Broadcast., vol. 61, Issue 2, pp. 166-176
    • Pan, Z.; Zhang, Y.; Kwong, S.
  • 6
    • 85029549046
    • Quantization-based hashing: A general framework for scalable image and video retrieval
    • J. Song, L. Gao, L. Liu, X. Zhu, and N. Sebe, "Quantization-based hashing: A general framework for scalable image and video retrieval," Pattern Recog., 2017.
    • (2017) Pattern Recog.
    • Song, J.; Gao, L.; Liu, L.; Zhu, X.; Sebe, N.
  • 7
    • 84986296735
    • You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images
    • C. Gan, T. Yao, K. Yang, Y. Yang, and T. Mei, "You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 923-932.
    • (2016) Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun., pp. 923-932
    • Gan, C.; Yao, T.; Yang, K.; Yang, Y.; Mei, T.
  • 9
    • 84994560125
    • Attention-based LSTM with semantic consistency for videos captioning
    • Z. Guo et al., "Attention-based LSTM with semantic consistency for videos captioning," in Proc. ACM Multimedia Conf., 2016, pp. 357-361.
    • (2016) Proc. ACM Multimedia Conf., pp. 357-361
    • Guo, Z.
  • 13
    • 84970002232
    • Show, attend and tell: Neural image caption generation with visual attention
    • K. Xu et al., "Show, attend and tell: Neural image caption generation with visual attention," in Proc. Int. Conf. Mach. Learn., 2015, pp. 2048-2057.
    • (2015) Proc. Int. Conf. Mach. Learn., pp. 2048-2057
    • Xu, K.
  • 16
    • 84959876769
    • Translating videos to natural language using deep recurrent neural networks
    • S. Venugopalan et al., "Translating videos to natural language using deep recurrent neural networks," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., Denver, Colorado, USA, May 31-Jun. 5, 2015, pp. 1494-1504.
    • (2015) Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics, Human Lang. Technol., pp. 1494-1504
    • Venugopalan, S.
  • 18
    • 84973884896
    • Describing videos by exploiting temporal structure
    • L. Yao et al., "Describing videos by exploiting temporal structure," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4507-4515.
    • (2015) Proc. IEEE Int. Conf. Comput. Vis., Dec., pp. 4507-4515
    • Yao, L.
  • 19
    • 84962850062
    • Summarization-based video caption via deep neural networks
    • G. Li, S. Ma, and Y. Han, "Summarization-based video caption via deep neural networks," in Proc. ACM Multimedia Conf., 2015, pp. 1191-1194.
    • (2015) Proc. ACM Multimedia Conf., pp. 1191-1194
    • Li, G.; Ma, S.; Han, Y.
  • 21
    • 0031573117
    • Long short-term memory
    • S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
    • (1997) Neural Comput., vol. 9, Issue 8, pp. 1735-1780
    • Hochreiter, S.; Schmidhuber, J.
  • 23
    • 84994636856
    • Graph-without-cut: An ideal graph learning for image segmentation
    • L. Gao et al., "Graph-without-cut: An ideal graph learning for image segmentation," in Proc. AAAI, 2016, pp. 1188-1194.
    • (2016) Proc. AAAI, pp. 1188-1194
    • Gao, L.
  • 24
    • 84994613586
    • Joint graph learning and video segmentation via multiple cues and topology calibration
    • J. Song et al., "Joint graph learning and video segmentation via multiple cues and topology calibration," in Proc. ACM Multimedia Conf., 2016, pp. 831-840.
    • (2016) Proc. ACM Multimedia Conf., pp. 831-840
    • Song, J.
  • 25
    • 84888343222
    • Effective multiple feature hashing for large-scale near-duplicate video retrieval
    • J. Song, Y. Yang, Z. Huang, H. T. Shen, and J. Luo, "Effective multiple feature hashing for large-scale near-duplicate video retrieval," IEEE Trans. Multimedia, vol. 15, no. 8, pp. 1997-2008, Dec. 2013.
    • (2013) IEEE Trans. Multimedia, vol. 15, Issue 8, pp. 1997-2008
    • Song, J.; Yang, Y.; Huang, Z.; Shen, H.T.; Luo, J.
  • 26
    • 84973863239
    • Human action recognition using factorized spatio-temporal convolutional networks
    • L. Sun, K. Jia, D.-Y. Yeung, and B. E. Shi, "Human action recognition using factorized spatio-temporal convolutional networks," in Proc. IEEE Int. Conf. Comput. Vis., Dec. 2015, pp. 4597-4605.
    • (2015) Proc. IEEE Int. Conf. Comput. Vis., Dec., pp. 4597-4605
    • Sun, L.; Jia, K.; Yeung, D.-Y.; Shi, B.E.
  • 28
    • 84959236502
    • Long-term recurrent convolutional networks for visual recognition and description
    • J. Donahue et al., "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2015, pp. 2625-2634.
    • (2015) Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun., pp. 2625-2634
    • Donahue, J.
  • 39
    • 85012903188
    • Cross-heterogeneous-database age estimation through correlation representation learning
    • Q. Tian and S. Chen, "Cross-heterogeneous-database age estimation through correlation representation learning," Neurocomputing, vol. 238, pp. 286-295, 2017.
    • (2017) Neurocomputing, vol. 238, pp. 286-295
    • Tian, Q.; Chen, S.
  • 41
    • 84986290372
    • Hierarchical recurrent neural encoder for video representation with application to captioning
    • P. Pan, Z. Xu, Y. Yang, F. Wu, and Y. Zhuang, "Hierarchical recurrent neural encoder for video representation with application to captioning," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun. 2016, pp. 1029-1038.
    • (2016) Proc. IEEE Conf. Comput. Vis. Pattern Recog., Jun., pp. 1029-1038
    • Pan, P.; Xu, Z.; Yang, Y.; Wu, F.; Zhuang, Y.
  • 43
    • 84990990335
    • Fast reference frame selection based on content similarity for low complexity HEVC encoder
    • Z. Pan, P. Jin, J. Lei, Y. Zhang, X. Sun, and S. Kwong, "Fast reference frame selection based on content similarity for low complexity HEVC encoder," J. Vis. Commun. Image Represent., vol. 40, pp. 516-524, 2016.
    • (2016) J. Vis. Commun. Image Represent., vol. 40, pp. 516-524
    • Pan, Z.; Jin, P.; Lei, J.; Zhang, Y.; Sun, X.; Kwong, S.
  • 44
    • 0028392483
    • Learning long-term dependencies with gradient descent is difficult
    • Y. Bengio, P. Simard, and P. Frasconi, "Learning long-term dependencies with gradient descent is difficult," IEEE Trans. Neural Netw., vol. 5, no. 2, pp. 157-166, Mar. 1994.
    • (1994) IEEE Trans. Neural Netw., vol. 5, Issue 2, pp. 157-166
    • Bengio, Y.; Simard, P.; Frasconi, P.
  • 51
    • 84940762015
    • Jointly modeling deep video and compositional text to bridge vision and language in a unified framework
    • R. Xu, C. Xiong, W. Chen, and J. J. Corso, "Jointly modeling deep video and compositional text to bridge vision and language in a unified framework," in Proc. Assoc. Adv. Artif. Intell., 2015, pp. 2346-2352.
    • (2015) Proc. Assoc. Adv. Artif. Intell., pp. 2346-2352
    • Xu, R.; Xiong, C.; Chen, W.; Corso, J.J.


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.