[3] K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003.
[4] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
[5] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[6] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollar, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
[7] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[8] S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
[10] Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. In ICCV, 2011.
[11] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
[13] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
[14] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
[15] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Babytalk: Understanding and generating simple image descriptions. IEEE Trans. on PAMI, 2013.
[16] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. In NIPS Workshop on Deep Learning, 2014.
[18] Y. Pan, T. Yao, T. Mei, H. Li, C.-W. Ngo, and Y. Rui. Clickthrough-based cross-view learning for image search. In SIGIR, 2014.
[19] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002.
[21] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
[22] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. Imagenet large scale visual recognition challenge. IJCV, 2015.
[23] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[24] R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2015.
[26] J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
[28] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, 2015.
[29] S. Venugopalan, M. Rohrbach, J. Donahue, R. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In ICCV, 2015.
[30] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL HLT, 2015.
[32] Z. Wu, X. Wang, Y.-G. Jiang, H. Ye, and X. Xue. Modeling spatial-temporal clues in a hybrid deep learning framework for video classification. In ACM MM, 2015.
[33] R. Xu, C. Xiong, W. Chen, and J. J. Corso. Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In AAAI, 2015.
[34] L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In ICCV, 2015.
[35] T. Yao, T. Mei, and C.-W. Ngo. Learning query and image similarities with ranking canonical correlation analysis. In ICCV, 2015.