SCOPUS Information Search Platform

Proceedings of the National Conference on Artificial Intelligence

Volume 3, 2015, Pages 2346-2352

Jointly modeling deep video and compositional text to bridge vision and language in a unified framework

(4) Xu, Ran a; Xiong, Caiming b; Chen, Wei a; Corso, Jason J. c

a State University of New York (United States)

b UNIVERSITY OF CALIFORNIA (United States)

c UNIVERSITY OF MICHIGAN (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; COMPUTATIONAL LINGUISTICS; MODELING LANGUAGES; NATURAL LANGUAGE PROCESSING SYSTEMS; SEMANTICS; VECTOR SPACES; VISUAL LANGUAGES;

COMPOSITIONAL SEMANTICS; DEEP NEURAL NETWORKS; DEPENDENCY TREES; LANGUAGE MODEL; NATURAL LANGUAGE GENERATION; SEMANTIC INFORMATION; UNIFIED FRAMEWORK; VIDEO RETRIEVAL;

UNIFIED MODELING LANGUAGE;

EID: 84940762015 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (198)

References (36)

1
- 84885996388
- Video in sentences out
- Barbu, A.; Bridge, A.; Burchill, Z.; Coroian, D.; Dickinson, S.; Fidler, S.; Michaux, A.; Mussman, S.; Narayanaswamy, S.; Salvi, D.; Schmidt, L.; Shangguan, J.; Siskind, J. M.; Waggoner, J.; Wang, S.; Wei, J.; Yin, Y.; and Zhang, Z. 2012. Video in sentences out. In UAI.
- (2012) UAI
- Barbu, A.¹ Bridge, A.² Burchill, Z.³ Coroian, D.⁴ Dickinson, S.⁵ Fidler, S.⁶ Michaux, A.⁷ Mussman, S.⁸ Narayanaswamy, S.⁹ Salvi, D.¹⁰ Schmidt, L.¹¹ Shangguan, J.¹² Siskind, J.M.¹³ Waggoner, J.¹⁴ Wang, S.¹⁵ Wei, J.¹⁶ Yin, Y.¹⁷ Zhang, Z.¹⁸

2
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- Chen, D. L., and Dolan, W. B. 2011. Collecting highly parallel data for paraphrase evaluation. In ACL.
- (2011) ACL
- Chen, D.L.¹ Dolan, W.B.²

3
- 84887345951
- A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
- Das, P.; Xu, C.; Doell, R. F.; and Corso, J. J. 2013. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR.
- (2013) CVPR
- Das, P.¹ Xu, C.² Doell, R.F.³ Corso, J.J.⁴

4
- 84874280480
- Translating related words to videos and back through latent topics
- Das, P.; Srihari, R. K.; and Corso, J. J. 2013. Translating related words to videos and back through latent topics. In WSDM.
- (2013) WSDM
- Das, P.¹ Srihari, R.K.² Corso, J.J.³

5
- 84946590544
- Construction and analysis of a large scale image ontology
- Deng, J.; Li, K.; Do, M.; Su, H.; and Fei-Fei, L. 2009. Construction and analysis of a large scale image ontology. In Vision Science Society.
- (2009) Vision Science Society
- Deng, J.¹ Li, K.² Do, M.³ Su, H.⁴ Fei-Fei, L.⁵

6
- 84904482223
- Decaf: A deep convolutional activation feature for generic visual recognition
- Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; and Darrell, T. 2013. Decaf: A deep convolutional activation feature for generic visual recognition. In arXiv:1310.1531.
- (2013) arXiv:1310.1531
- Donahue, J.¹ Jia, Y.² Vinyals, O.³ Hoffman, J.⁴ Zhang, N.⁵ Tzeng, E.⁶ Darrell, T.⁷

7
- 77955422240
- Object detection with discriminatively trained part based models
- Felzenszwalb, P. F.; Girshick, R. B.; McAllester, D.; and Ramanan, D. 2010. Object detection with discriminatively trained part based models. TPAMI.
- (2010) TPAMI
- Felzenszwalb, P.F.¹ Girshick, R.B.² McAllester, D.³ Ramanan, D.⁴

8
- 84898958665
- Devise: A deep visual-semantic embedding model
- Frome, A.; Corrado, G. S.; Shlens, J.; Bengio, S.; Dean, J.; Mikolov, T.; et al. 2013. Devise: A deep visual-semantic embedding model. In NIPS.
- (2013) NIPS
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Mikolov, T.⁶

9
- 0029727454
- Learning task-dependent distributed representations by backpropagation through structure
- Goller, C., and Kuchler, A. 1996. Learning task-dependent distributed representations by backpropagation through structure. In International Conference on Neural Networks.
- (1996) International Conference on Neural Networks
- Goller, C.¹ Kuchler, A.²

10
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- Guadarrama, S.; Krishnamoorthy, N.; Malkarnenkar, G.; Venugopalan, S.; Mooney, R.; Darrell, T.; and Saenko, K. 2013. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV.
- (2013) ICCV
- Guadarrama, S.¹ Krishnamoorthy, N.² Malkarnenkar, G.³ Venugopalan, S.⁴ Mooney, R.⁵ Darrell, T.⁶ Saenko, K.⁷

11
- 70450202741
- Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
- Gupta, A.; Srinivasan, P.; Shi, J.; and Davis, L. S. 2009. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In CVPR.
- (2009) CVPR
- Gupta, A.¹ Srinivasan, P.² Shi, J.³ Davis, L.S.⁴

12
- 84911364368
- Large-scale video classification with convolutional neural networks
- Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; and Fei-Fei, L. 2014. Large-scale video classification with convolutional neural networks. In CVPR.
- (2014) CVPR
- Karpathy, A.¹ Toderici, G.² Shetty, S.³ Leung, T.⁴ Sukthankar, R.⁵ Fei-Fei, L.⁶

13
- 84977906791
- Accurate unlexicalized parsing
- Klein, D., and Manning, C. D. 2003. Accurate unlexicalized parsing. In ACL.
- (2003) ACL
- Klein, D.¹ Manning, C.D.²

14
- 84893398951
- Generating natural-language video descriptions using text-mined knowledge
- Krishnamoorthy, N.; Malkarnenkar, G.; Mooney, R. J.; Saenko, K.; and Guadarrama, S. 2013. Generating natural-language video descriptions using text-mined knowledge. In AAAI.
- (2013) AAAI
- Krishnamoorthy, N.¹ Malkarnenkar, G.² Mooney, R.J.³ Saenko, K.⁴ Guadarrama, S.⁵

15
- 84876231242
- ImageNet classification with deep convolutional neural networks
- Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In NIPS.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

16
- 80052901011
- Baby talk: Understanding and generating simple image descriptions
- Kulkarni, G.; Premraj, V.; Dhar, S.; Li, S.; Choi, Y.; Berg, A. C.; and Berg, T. L. 2011. Baby talk: Understanding and generating simple image descriptions. In CVPR.
- (2011) CVPR
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.C.⁶ Berg, T.L.⁷

17
- 84856653481
- Object bank: A high-level image representation for scene classification and semantic feature sparsification
- Li, L.-J.; Su, H.; Xing, E. P.; and Fei-Fei, L. 2011. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In NIPS.
- (2011) NIPS
- Li, L.-J.¹ Su, H.² Xing, E.P.³ Fei-Fei, L.⁴

18
- 84898956512
- Distributed representations of words and phrases and their compositionality
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
- (2013) NIPS
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.⁴ Dean, J.⁵

19
- 84976702763
- WordNet: A lexical database for English
- Miller, G. A. 1995. WordNet: A lexical database for English. In Communications of the ACM, 39-41.
- (1995) Communications of the ACM , pp. 39-41
- Miller, G.A.¹

20
- 84959182849
- Improving video activity recognition using object recognition and text mining
- Motwani, T., and Mooney, R. 2012. Improving video activity recognition using object recognition and text mining. In ECAI.
- (2012) ECAI
- Motwani, T.¹ Mooney, R.²

21
- 77949676020
- Video annotation through search and graph reinforcement mining
- Moxley, E.; Mei, T.; and Manjunath, B. S. 2010. Video annotation through search and graph reinforcement mining. In IEEE Transactions on Multimedia.
- (2010) IEEE Transactions on Multimedia
- Moxley, E.¹ Mei, T.² Manjunath, B.S.³

22
- 84898465467
- Evaluation of dimensionality reduction methods for image auto-annotation
- Nakayama, H.; Harada, T.; and Kuniyoshi, Y. 2010. Evaluation of dimensionality reduction methods for image auto-annotation. In BMVC.
- (2010) BMVC
- Nakayama, H.¹ Harada, T.² Kuniyoshi, Y.³

23
- 33645236134
- WordNet::Similarity - Measuring the relatedness of concepts
- Pedersen, T.; Patwardhan, S.; and Michelizzi, J. 2004. WordNet::Similarity - measuring the relatedness of concepts. In HLT-NAACL.
- (2004) HLT-NAACL
- Pedersen, T.¹ Patwardhan, S.² Michelizzi, J.³

24
- 85123966307
- Distributional clustering of English words
- Pereira, F.; Tishby, N.; and Lee, L. 1993. Distributional clustering of English words. In ACL.
- (1993) ACL
- Pereira, F.¹ Tishby, N.² Lee, L.³

25
- 84898775557
- Video event understanding using natural language description
- Ramanathan, V.; Liang, P.; and Fei-Fei, L. 2013. Video event understanding using natural language description. In ICCV.
- (2013) ICCV
- Ramanathan, V.¹ Liang, P.² Fei-Fei, L.³

26
- 84898775239
- Translating video content to natural language descriptions
- Rohrbach, M.; Qiu, W.; Titov, I.; Thater, S.; Pinkal, M.; and Schiele, B. 2013. Translating video content to natural language descriptions. In ICCV.
- (2013) ICCV
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

27
- 84866718894
- Action bank: A high-level representation of activity in video
- Sadanand, S., and Corso, J. J. 2012. Action bank: A high-level representation of activity in video. In CVPR.
- (2012) CVPR
- Sadanand, S.¹ Corso, J.J.²

28
- 77955998009
- Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
- Socher, R., and Fei-Fei, L. 2010. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR.
- (2010) CVPR
- Socher, R.¹ Fei-Fei, L.²

29
- 84906925854
- Grounded compositional semantics for finding and describing images with sentences
- Socher, R.; Karpathy, A.; Le, Q. V.; Manning, C. D.; and Ng, A. Y. 2013. Grounded compositional semantics for finding and describing images with sentences. In Transactions of the ACL.
- (2013) Transactions of the ACL
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

30
- 84455173075
- Multiple feature hashing for real-time large scale near-duplicate video retrieval
- Song, J.; Yang, Y.; and Huang, Z. 2011. Multiple feature hashing for real-time large scale near-duplicate video retrieval. In ACM International Conference on Multimedia.
- (2011) ACM International Conference on Multimedia
- Song, J.¹ Yang, Y.² Huang, Z.³

31
- 84959932469
- Integrating language and vision to generate natural language descriptions of videos in the wild
- Thomason, J.; Venugopalan, S.; Guadarrama, S.; Saenko, K.; and Mooney, R. 2014. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING.
- (2014) COLING
- Thomason, J.¹ Venugopalan, S.² Guadarrama, S.³ Saenko, K.⁴ Mooney, R.⁵

32
- 84887356306
- Spatiotemporal deformable part models for action detection
- Tian, Y.; Sukthankar, R.; and Shah, M. 2013. Spatiotemporal deformable part models for action detection. In CVPR.
- (2013) CVPR
- Tian, Y.¹ Sukthankar, R.² Shah, M.³

33
- 80052877143
- Action recognition by dense trajectories
- Wang, H.; Kläser, A.; Schmid, C.; and Liu, C.-L. 2011. Action recognition by dense trajectories. In CVPR.
- (2011) CVPR
- Wang, H.¹ Kläser, A.² Schmid, C.³ Liu, C.-L.⁴

34
- 84887428782
- Annotation for free: Video tagging by mining user search behavior
- Yao, T.; Mei, T.; Ngo, C.-W.; and Li, S. 2013. Annotation for free: Video tagging by mining user search behavior. In ACM International Conference on Multimedia.
- (2013) ACM International Conference on Multimedia
- Yao, T.¹ Mei, T.² Ngo, C.-W.³ Li, S.⁴

35
- 84897743886
- Grounded language learning from video described with sentences
- Yu, H., and Siskind, J. M. 2013. Grounded language learning from video described with sentences. In ACL.
- (2013) ACL
- Yu, H.¹ Siskind, J.M.²

36
- 84898795297
- From actemes to action: A strongly-supervised representation for detailed action understanding
- Zhang, W.; Zhu, M.; and Derpanis, K. G. 2013. From actemes to action: A strongly-supervised representation for detailed action understanding. In ICCV.
- (2013) ICCV
- Zhang, W.¹ Zhu, M.² Derpanis, K.G.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.