SCOPUS Information Search Platform - View Paper

MM 2016 - Proceedings of the 2016 ACM Multimedia Conference

2016, Pages 1073-1076

Frame- and segment-level features and candidate pool evaluation for video caption generation

(2) Shetty, Rakshith (a); Laaksonen, Jorma (a)

(a) Aalto University (Finland)

Author keywords

[No Author keywords available]

Indexed keywords

AUTOMATIC EVALUATION; DIVERSE FEATURES; DOMAIN EXPERTS; ENCODER-DECODER; HUMAN EVALUATION; NOCV1; VIDEO CAPTIONS; VIDEO CONTENTS; VIDEO FEATURES;

EID: 84994666053
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: 10.1145/2964284.2984062
Document Type: Conference Paper

Times cited: 106

References (17)

1
- 85198028989
- Imagenet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR, 2009.
- (2009) CVPR
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

2
- 84938217896
- arXiv.org:1403.1840
- Y. Gong, L. Wang, R. Guo, and S. Lazebnik. Multi-scale orderless pooling of deep convolutional activation features. arXiv.org:1403.1840, 2014.
- (2014) Multi-scale Orderless Pooling of Deep Convolutional Activation Features
- Gong, Y.¹ Wang, L.² Guo, R.³ Lazebnik, S.⁴

3
- 84958589374
- arXiv, abs/1512.03385
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv, abs/1512.03385, 2015.
- (2015) Deep Residual Learning for Image Recognition
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

4
- 84911364368
- Large-scale video classification with convolutional neural networks
- A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
- (2014) CVPR
- Karpathy, A.¹ Toderici, G.² Shetty, S.³ Leung, T.⁴ Sukthankar, R.⁵ Fei-Fei, L.⁶

5
- 84959211977
- A dataset for movie description
- A. Rohrbach, M. Rohrbach, N. Tandon, and B. Schiele. A dataset for movie description. In CVPR, 2015.
- (2015) CVPR
- Rohrbach, A.¹ Rohrbach, M.² Tandon, N.³ Schiele, B.⁴

6
- 84994600590
- abs/1605.03705
- A. Rohrbach, A. Torabi, M. Rohrbach, N. Tandon, C. J. Pal, H. Larochelle, A. C. Courville, and B. Schiele. Movie description. arXiv, abs/1605.03705, 2016.
- (2016) Movie Description. ArXiv
- Rohrbach, A.¹ Torabi, A.² Rohrbach, M.³ Tandon, N.⁴ Pal, C.J.⁵ Larochelle, H.⁶ Courville, A.C.⁷ Schiele, B.⁸

7
- 84994600605
- (accessed July 1, 2016)
- Y. C. M. Ruggero, Ronchi, and T.-Y. Lin. Microsoft COCO 1st captioning challenge. http://lsun.cs.princeton.edu/slides/caption_open.pdf, 2016 (accessed July 1, 2016).
- (2016) Microsoft COCO 1st Captioning Challenge
- Ruggero, Y.C.M.¹ Ronchi² Lin, T.-Y.³

8
- 84977650097
- Video captioning with recurrent networks based on frame- and video-level features and visual content classification
- arXiv abs/1512.02949
- R. Shetty and J. Laaksonen. Video captioning with recurrent networks based on frame- and video-level features and visual content classification. ICCV Workshop on LSMDC, arXiv abs/1512.02949, 2015.
- (2015) ICCV Workshop on LSMDC
- Shetty, R.¹ Laaksonen, J.²

9
- 84964983441
- arXiv, abs/1409.4842
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv, abs/1409.4842, 2014.
- (2014) Going Deeper with Convolutions
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

10
- 84959246420
- arXiv, abs/1503.01070
- A. Torabi, C. Pal, H. Larochelle, and A. Courville. Using descriptive video services to create a large data source for video annotation research. arXiv, abs/1503.01070, 2015.
- (2015) Using Descriptive Video Services to Create A Large Data Source for Video Annotation Research
- Torabi, A.¹ Pal, C.² Larochelle, H.³ Courville, A.⁴

11
- 84969504307
- arXiv, abs/1412.0767
- D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: generic features for video analysis. arXiv, abs/1412.0767, 2014.
- (2014) C3D: Generic Features for Video Analysis
- Tran, D.¹ Bourdev, L.D.² Fergus, R.³ Torresani, L.⁴ Paluri, M.⁵

12
- 84956980995
- Cider: Consensus-based image description evaluation
- R. Vedantam, C. Lawrence Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. In CVPR, June 2015.
- (2015) CVPR
- Vedantam, R.¹ Lawrence Zitnick, C.² Parikh, D.³

13
- 84973882730
- Sequence to sequence - video to text
- S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015.
- (2015) ICCV
- Venugopalan, S.¹ Rohrbach, M.² Donahue, J.³ Mooney, R.⁴ Darrell, T.⁵ Saenko, K.⁶

14
- 84946747440
- Show and tell: A neural image caption generator
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

15
- 80052877143
- Action recognition by dense trajectories
- H. Wang, A. Kläser, C. Schmid, and C. Liu. Action recognition by dense trajectories. In CVPR, 2011.
- (2011) CVPR
- Wang, H.¹ Kläser, A.² Schmid, C.³ Liu, C.⁴

16
- 84898805910
- Action recognition with improved trajectories
- H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
- (2013) ICCV
- Wang, H.¹ Schmid, C.²

17
- 84986260127
- MSR-VTT: A large video description dataset for bridging video and language
- J. Xu, T. Mei, T. Yao, and Y. Rui. MSR-VTT: A large video description dataset for bridging video and language. In CVPR, 2016.
- (2016) CVPR
- Xu, J.¹ Mei, T.² Yao, T.³ Rui, Y.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.