SCOPUS Information Search Platform

COLING 2014 - 25th International Conference on Computational Linguistics, Proceedings of COLING 2014: Technical Papers

Volume , Issue , 2014, Pages 1218-1227

Integrating language and vision to generate natural language descriptions of videos in the wild

(5) Thomason, Jesse a; Venugopalan, Subhashini a; Guadarrama, Sergio b; Saenko, Kate c; Mooney, Raymond a

a UNIVERSITY OF TEXAS AT AUSTIN (United States)

b UNIVERSITY OF CALIFORNIA (United States)

c UNIVERSITY OF MASSACHUSETTS LOWELL (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; LINGUISTICS; MODELING LANGUAGES; NATURAL LANGUAGE PROCESSING SYSTEMS; VIDEO SIGNAL PROCESSING; VISUAL LANGUAGES;

INDIVIDUAL COMPONENTS; N-GRAM LANGUAGE MODELS; NATURAL LANGUAGE PROCESSING; NATURAL LANGUAGES; PROBABILISTIC KNOWLEDGE; REAL WORLD VIDEOS; TEXTUAL DESCRIPTION; VISUAL RECOGNITION;

COMPUTATIONAL LINGUISTICS;

EID: 84959932469
PISSN: None
EISSN: None
Source Type: Conference Proceeding
DOI: None
Document Type: Conference Paper

Times cited : (171)

References (28)

1
- 84885996388
- Video in sentences out
- Andrei Barbu, Alexander Bridge, Zachary Burchill, Dan Coroian, Sven Dickinson, Sanja Fidler, Aaron Michaux, Sam Mussman, Siddharth Narayanaswamy, Dhaval Salvi, Lara Schmidt, Jiangnan Shangguan, Jeffrey Mark Siskind, Jarrell Waggoner, Song Wang, Jinlian Wei, Yifan Yin, and Zhiqi Zhang. 2012. Video in sentences out. In Association for Uncertainty in Artificial Intelligence (UAI).
- (2012) Association for Uncertainty in Artificial Intelligence (UAI)
- Barbu, A.¹ Bridge, A.² Burchill, Z.³ Coroian, D.⁴ Dickinson, S.⁵ Fidler, S.⁶ Michaux, A.⁷ Mussman, S.⁸ Narayanaswamy, S.⁹ Salvi, D.¹⁰ Schmidt, L.¹¹ Shangguan, J.¹² Siskind, J.M.¹³ Waggoner, J.¹⁴ Wang, S.¹⁵ Wei, J.¹⁶ Yin, Y.¹⁷ Zhang, Z.¹⁸

2
- 84959869628
- Workshop on vision and language
- NAACL
- Tamara Berg and Julia Hockenmaier. 2013. Workshop on vision and language. In Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL). NAACL.
- (2013) Proceedings of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)
- Berg, T.¹ Hockenmaier, J.²

3
- 79955702502
- Libsvm: A library for support vector machines
- Chih-Chung Chang and Chih-Jen Lin. 2011. Libsvm: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27.
- (2011) ACM Transactions on Intelligent Systems and Technology (TIST) , vol.2 , Issue.3 , pp. 27
- Chang, C.-C.¹ Lin, C.-J.²

4
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- Association for Computational Linguistics
- David L. Chen and William B. Dolan. 2011. Collecting highly parallel data for paraphrase evaluation. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1, pages 190-200. Association for Computational Linguistics.
- (2011) Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies , vol.1 , pp. 190-200
- Chen, D.L.¹ Dolan, W.B.²

5
- 85024115120
- An empirical study of smoothing techniques for language modeling
- Association for Computational Linguistics
- Stanley F. Chen and Joshua Goodman. 1996. An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th annual meeting on Association for Computational Linguistics (ACL), pages 310-318. Association for Computational Linguistics.
- (1996) Proceedings of the 34th Annual Meeting on Association for Computational Linguistics (ACL) , pp. 310-318
- Chen, S.F.¹ Goodman, J.²

6
- 84887345951
- A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
- Pradipto Das, Chenliang Xu, Richard F. Doell, and Jason J. Corso. 2013. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2013) IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Das, P.¹ Xu, C.² Doell, R.F.³ Corso, J.J.⁴

7
- 84946590544
- Construction and analysis of a large scale image ontology
- Jia Deng, Kai Li, Minh Do, Hao Su, and Li Fei-Fei. 2009. Construction and analysis of a large scale image ontology. Vision Sciences Society.
- (2009) Vision Sciences Society
- Deng, J.¹ Li, K.² Do, M.³ Su, H.⁴ Fei-Fei, L.⁵

8
- 84866674680
- Hedging your bets: Optimizing accuracy-specificity trade-offs in large scale visual recognition
- Jia Deng, Jonathan Krause, Alex Berg, and Li Fei-Fei. 2012. Hedging Your Bets: Optimizing Accuracy-Specificity Trade-offs in Large Scale Visual Recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2012) IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deng, J.¹ Krause, J.² Berg, A.³ Fei-Fei, L.⁴

9
- 84904482223
- Decaf: A deep convolutional activation feature for generic visual recognition
- Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2013. Decaf: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531.
- (2013) arXiv preprint arXiv:1310.1531
- Donahue, J.¹ Jia, Y.² Vinyals, O.³ Hoffman, J.⁴ Zhang, N.⁵ Tzeng, E.⁶ Darrell, T.⁷

10
- 77951298115
- The pascal visual object classes (voc) challenge
- June
- Mark Everingham, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. 2010. The pascal visual object classes (voc) challenge. International Journal of Computer Vision (IJCV), 88(2):303-338, June.
- (2010) International Journal of Computer Vision (IJCV) , vol.88 , Issue.2 , pp. 303-338
- Everingham, M.¹ Gool, L.V.² Williams, C.K.I.³ Winn, J.⁴ Zisserman, A.⁵

11
- 84874541449
- Automatic caption generation for news images
- Yansong Feng and Mirella Lapata. 2013. Automatic caption generation for news images. IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 35(4):797-812.
- (2013) IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) , vol.35 , Issue.4 , pp. 797-812
- Feng, Y.¹ Lapata, M.²

12
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- December
- Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2013. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In IEEE International Conference on Computer Vision (ICCV), December.
- (2013) IEEE International Conference on Computer Vision (ICCV)
- Guadarrama, S.¹ Krishnamoorthy, N.² Malkarnenkar, G.³ Venugopalan, S.⁴ Mooney, R.⁵ Darrell, T.⁶ Saenko, K.⁷

13
- 84893398951
- Generating natural-language video descriptions using text-mined knowledge
- Niveda Krishnamoorthy, Girish Malkarnenkar, Raymond J. Mooney, Kate Saenko, and Sergio Guadarrama. 2013. Generating natural-language video descriptions using text-mined knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), pages 541-547.
- (2013) Proceedings of the AAAI Conference on Artificial Intelligence (AAAI) , pp. 541-547
- Krishnamoorthy, N.¹ Malkarnenkar, G.² Mooney, R.J.³ Saenko, K.⁴ Guadarrama, S.⁵

14
- 80052901011
- Baby talk: Understanding and generating image descriptions
- IEEE
- Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Alexander Berg, Yejin Choi, and Tamara Berg. 2011. Baby talk: Understanding and generating image descriptions. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE.
- (2011) IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Berg, A.⁵ Choi, Y.⁶ Berg, T.⁷

15
- 33845572523
- Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories
- IEEE
- Svetlana Lazebnik, Cordelia Schmid, and Jean Ponce. 2006. Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2169-2178. IEEE.
- (2006) IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , vol.2 , pp. 2169-2178
- Lazebnik, S.¹ Schmid, C.² Ponce, J.³

16
- 85162513516
- Object bank: A high-level image representation for scene classification and semantic feature sparsification
- Li-Jia Li, Hao Su, Eric Xing, and Li Fei-Fei. 2010. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In Advances in Neural Information Processing Systems (NIPS).
- (2010) Advances in Neural Information Processing Systems (NIPS)
- Li, L.-J.¹ Su, H.² Xing, E.³ Fei-Fei, L.⁴

17
- 84862279067
- Composing simple image descriptions using web-scale n-grams
- Stroudsburg, PA, USA. Association for Computational Linguistics
- Siming Li, Girish Kulkarni, Tamara L. Berg, Alexander C. Berg, and Yejin Choi. 2011. Composing simple image descriptions using web-scale n-grams. In Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL), pages 220-228, Stroudsburg, PA, USA. Association for Computational Linguistics.
- (2011) Proceedings of the Fifteenth Conference on Computational Natural Language Learning (CoNLL) , pp. 220-228
- Li, S.¹ Kulkarni, G.² Berg, T.L.³ Berg, A.C.⁴ Choi, Y.⁵

18
- 84878772402
- Improving video activity recognition using object recognition and text mining
- Tanvi S. Motwani and Raymond J. Mooney. 2012. Improving video activity recognition using object recognition and text mining. In Proceedings of the European Conference on Artificial Intelligence (ECAI), pages 600-605.
- (2012) Proceedings of the European Conference on Artificial Intelligence (ECAI) , pp. 600-605
- Motwani, T.S.¹ Mooney, R.J.²

19
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- Vicente Ordonez, Girish Kulkarni, and Tamara L. Berg. 2011. Im2text: Describing images using 1 million captioned photographs. In Advances in Neural Information Processing Systems (NIPS), volume 24, pages 1143-1151.
- (2011) Advances in Neural Information Processing Systems (NIPS) , vol.24 , pp. 1143-1151
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

20
- 84455177089
- Faster and smaller n-gram language models
- Association for Computational Linguistics
- Adam Pauls and Dan Klein. 2011. Faster and smaller n-gram language models. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 258-267. Association for Computational Linguistics.
- (2011) Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies , pp. 258-267
- Pauls, A.¹ Klein, D.²

21
- 0003243224
- Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods
- MIT Press
- John C. Platt. 1999. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances In Large Margin Classifiers, pages 61-74. MIT Press.
- (1999) Advances in Large Margin Classifiers , pp. 61-74
- Platt, J.C.¹

22
- 84898775239
- Translating video content to natural language descriptions
- Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, and Bernt Schiele. 2013. Translating video content to natural language descriptions. In IEEE International Conference on Computer Vision (ICCV).
- (2013) IEEE International Conference on Computer Vision (ICCV)
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

23
- 84908684165
- Coherent multi-sentence video description with variable level of detail
- Anna Senina, Marcus Rohrbach, Wei Qiu, Annemarie Friedrich, Sikandar Amin, Mykhaylo Andriluka, Manfred Pinkal, and Bernt Schiele. 2014. Coherent multi-sentence video description with variable level of detail. arXiv preprint arXiv:1403.6173.
- (2014) arXiv preprint arXiv:1403.6173
- Senina, A.¹ Rohrbach, M.² Qiu, W.³ Friedrich, A.⁴ Amin, S.⁵ Andriluka, M.⁶ Pinkal, M.⁷ Schiele, B.⁸

24
- 80052877143
- Action recognition by dense trajectories
- IEEE
- Heng Wang, Alexander Klaser, Cordelia Schmid, and Cheng-Lin Liu. 2011. Action recognition by dense trajectories. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3169-3176. IEEE.
- (2011) IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 3169-3176
- Wang, H.¹ Klaser, A.² Schmid, C.³ Liu, C.-L.⁴

25
- 77955988947
- Sun database: Large scale scene recognition from abbey to zoo
- Jianxiong Xiao, James Hays, Krista A. Ehinger, Aude Oliva, and Antonio Torralba. 2010. Sun database: Large-scale scene recognition from abbey to zoo. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3485-3492.
- (2010) IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 3485-3492
- Xiao, J.¹ Hays, J.² Ehinger, K.A.³ Oliva, A.⁴ Torralba, A.⁵

26
- 80053258778
- Corpus-guided sentence generation of natural images
- Yezhou Yang, Ching Lik Teo, Hal Daume, and Yiannis Aloimonos. 2011. Corpus-guided sentence generation of natural images. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 444-454.
- (2011) Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 444-454
- Yang, Y.¹ Teo, C.L.² Daume, H.³ Aloimonos, Y.⁴

27
- 84897743886
- Grounded language learning from video described with sentences
- Haonan Yu and Jeffrey Mark Siskind. 2013. Grounded language learning from video described with sentences. In Proceedings of the Association for Computational Linguistics (ACL), pages 53-63.
- (2013) Proceedings of the Association for Computational Linguistics (ACL) , pp. 53-63
- Yu, H.¹ Siskind, J.M.²

28
- 33846580425
- Local features and kernels for classification of texture and object categories: A comprehensive study
- Jianguo Zhang, Marcin Marszałek, Svetlana Lazebnik, and Cordelia Schmid. 2007. Local features and kernels for classification of texture and object categories: A comprehensive study. International Journal of Computer Vision (IJCV), 73(2):213-238.
- (2007) International Journal of Computer Vision (IJCV) , vol.73 , Issue.2 , pp. 213-238
- Zhang, J.¹ Marszałek, M.² Lazebnik, S.³ Schmid, C.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.