[1] N. Ballas, L. Yao, C. Pal, and A. Courville. Delving deeper into convolutional networks for learning video representations. In ICLR, 2016.
[2] S. Banerjee and A. Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL Workshop, 2005.
[3] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
[4] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
[5] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[6] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, et al. From captions to visual concepts and back. In CVPR, 2015.
[7] C. Gan, T. Yao, K. Yang, Y. Yang, and T. Mei. You lead, we exceed: Labor-free video concept learning by jointly exploiting web videos and images. In CVPR, 2016.
[8] S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
[10] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
[11] A. Kojima, T. Tamura, and K. Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. IJCV, 2002.
[12] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[13] P. Pan, Z. Xu, Y. Yang, F. Wu, and Y. Zhuang. Hierarchical recurrent neural encoder for video representation with application to captioning. arXiv preprint arXiv:1511.03476, 2015.
[14] Y. Pan, Y. Li, T. Yao, T. Mei, H. Li, and Y. Rui. Learning deep intrinsic video representation by exploring temporal coherence and graph structure. In IJCAI, 2016.
[15] Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui. Jointly modeling embedding and translation to bridge video and language. In CVPR, 2016.
[16] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
[20] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
[21] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. IJCV, 2015.
[22] R. Shetty and J. Laaksonen. Video captioning with recurrent networks based on frame- and video-level features and visual content classification. In ICCV Workshop, 2015.
[23] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[24] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[27] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.
[29] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015.
[30] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL HLT, 2015.
[31] Q. Wu, C. Shen, L. Liu, A. Dick, and A. van den Hengel. What value do explicit high level concepts have in vision to language problems? In CVPR, 2016.
[32] J. Xu, T. Mei, T. Yao, and Y. Rui. MSR-VTT: A large video description dataset for bridging video and language. In CVPR, 2016.
[33] R. Xu, C. Xiong, W. Chen, and J. J. Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI, 2015.
[34] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015.
[35] T. Yao, Y. Pan, Y. Li, Z. Qiu, and T. Mei. Boosting image captioning with attributes. arXiv preprint arXiv:1611.01646, 2016.
[36] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016.
[37] H. Yu, J. Wang, Z. Huang, Y. Yang, and W. Xu. Video paragraph captioning using hierarchical recurrent neural networks. In CVPR, 2016.
[38] C. Zhang, J. C. Platt, and P. A. Viola. Multiple instance boosting for object detection. In NIPS, 2005.