



Volume 3, 2015, Pages 2346-2352

Jointly modeling deep video and compositional text to bridge vision and language in a unified framework

Author keywords

[No Author keywords available]

Indexed keywords

ARTIFICIAL INTELLIGENCE; COMPUTATIONAL LINGUISTICS; MODELING LANGUAGES; NATURAL LANGUAGE PROCESSING SYSTEMS; SEMANTICS; VECTOR SPACES; VISUAL LANGUAGES;

EID: 84940762015     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: None     Document Type: Conference Paper
Times cited: 198

References (36)
  • 2
    • 84859089502
    • Collecting highly parallel data for paraphrase evaluation
    • Chen, D. L., and Dolan, W. B. 2011. Collecting highly parallel data for paraphrase evaluation. In ACL.
    • (2011) ACL
    • Chen, D.L.1    Dolan, W.B.2
  • 3
    • 84887345951
    • A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
    • Das, P.; Xu, C.; Doell, R. F.; and Corso, J. J. 2013. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR.
    • (2013) CVPR
    • Das, P.1    Xu, C.2    Doell, R.F.3    Corso, J.J.4
  • 4
    • 84874280480
    • Translating related words to videos and back through latent topics
    • Das, P.; Srihari, R. K.; and Corso, J. J. 2013. Translating related words to videos and back through latent topics. In WSDM.
    • (2013) WSDM
    • Das, P.1    Srihari, R.K.2    Corso, J.J.3
  • 9
    • 0029727454
    • Learning task-dependent distributed representations by backpropagation through structure
    • Goller, C., and Kuchler, A. 1996. Learning task-dependent distributed representations by backpropagation through structure. In International Conference on Neural Networks.
    • (1996) International Conference on Neural Networks
    • Goller, C.1    Kuchler, A.2
  • 11
    • 70450202741
    • Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
    • Gupta, A.; Srinivasan, P.; Shi, J.; and Davis, L. S. 2009. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In CVPR.
    • (2009) CVPR
    • Gupta, A.1    Srinivasan, P.2    Shi, J.3    Davis, L.S.4
  • 13
    • 84977906791
    • Accurate unlexicalized parsing
    • Klein, D., and Manning, C. D. 2003. Accurate unlexicalized parsing. In ACL.
    • (2003) ACL
    • Klein, D.1    Manning, C.D.2
  • 15
    • 84876231242
    • ImageNet classification with deep convolutional neural networks
    • Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. ImageNet classification with deep convolutional neural networks. In NIPS.
    • (2012) NIPS
    • Krizhevsky, A.1    Sutskever, I.2    Hinton, G.E.3
  • 17
    • 84856653481
    • Object bank: A high-level image representation for scene classification and semantic feature sparsification
    • Li, L.-J.; Su, H.; Xing, E. P.; and Fei-Fei, L. 2011. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In NIPS.
    • (2011) NIPS
    • Li, L.-J.1    Su, H.2    Xing, E.P.3    Fei-Fei, L.4
  • 18
    • 84898956512
    • Distributed representations of words and phrases and their compositionality
    • Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
    • (2013) NIPS
    • Mikolov, T.1    Sutskever, I.2    Chen, K.3    Corrado, G.4    Dean, J.5
  • 19
    • 84976702763
    • WordNet: A lexical database for English
    • Miller, G. A. 1995. WordNet: A lexical database for English. In Communications of the ACM, 39-41.
    • (1995) Communications of the ACM , pp. 39-41
    • Miller, G.A.1
  • 20
    • 84959182849
    • Improving video activity recognition using object recognition and text mining
    • Motwani, T., and Mooney, R. 2012. Improving video activity recognition using object recognition and text mining. In ECAI.
    • (2012) ECAI
    • Motwani, T.1    Mooney, R.2
  • 22
    • 84898465467
    • Evaluation of dimensionality reduction methods for image auto-annotation
    • Nakayama, H.; Harada, T.; and Kuniyoshi, Y. 2010. Evaluation of dimensionality reduction methods for image auto-annotation. In BMVC.
    • (2010) BMVC
    • Nakayama, H.1    Harada, T.2    Kuniyoshi, Y.3
  • 23
    • 33645236134
    • WordNet::Similarity - Measuring the relatedness of concepts
    • Pedersen, T.; Patwardhan, S.; and Michelizzi, J. 2004. WordNet::Similarity - Measuring the relatedness of concepts. In HLT-NAACL.
    • (2004) HLT-NAACL
    • Pedersen, T.1    Patwardhan, S.2    Michelizzi, J.3
  • 24
    • 85123966307
    • Distributional clustering of English words
    • Pereira, F.; Tishby, N.; and Lee, L. 1993. Distributional clustering of English words. In ACL.
    • (1993) ACL
    • Pereira, F.1    Tishby, N.2    Lee, L.3
  • 25
    • 84898775557
    • Video event understanding using natural language description
    • Ramanathan, V.; Liang, P.; and Fei-Fei, L. 2013. Video event understanding using natural language description. In ICCV.
    • (2013) ICCV
    • Ramanathan, V.1    Liang, P.2    Fei-Fei, L.3
  • 27
    • 84866718894
    • Action bank: A high-level representation of activity in video
    • Sadanand, S., and Corso, J. J. 2012. Action bank: A high-level representation of activity in video. In CVPR.
    • (2012) CVPR
    • Sadanand, S.1    Corso, J.J.2
  • 28
    • 77955998009
    • Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
    • Socher, R., and Fei-Fei, L. 2010. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR.
    • (2010) CVPR
    • Socher, R.1    Fei-Fei, L.2
  • 30
    • 84455173075
    • Multiple feature hashing for real-time large scale near-duplicate video retrieval
    • Song, J.; Yang, Y.; and Huang, Z. 2011. Multiple feature hashing for real-time large scale near-duplicate video retrieval. In ACM International Conference on Multimedia.
    • (2011) ACM International Conference on Multimedia
    • Song, J.1    Yang, Y.2    Huang, Z.3
  • 31
    • 84959932469
    • Integrating language and vision to generate natural language descriptions of videos in the wild
    • Thomason, J.; Venugopalan, S.; Guadarrama, S.; Saenko, K.; and Mooney, R. 2014. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING.
    • (2014) COLING
    • Thomason, J.1    Venugopalan, S.2    Guadarrama, S.3    Saenko, K.4    Mooney, R.5
  • 32
    • 84887356306
    • Spatiotemporal deformable part models for action detection
    • Tian, Y.; Sukthankar, R.; and Shah, M. 2013. Spatiotemporal deformable part models for action detection. In CVPR.
    • (2013) CVPR
    • Tian, Y.1    Sukthankar, R.2    Shah, M.3
  • 35
    • 84897743886
    • Grounded language learning from video described with sentences
    • Yu, H., and Siskind, J. M. 2013. Grounded language learning from video described with sentences. In ACL.
    • (2013) ACL
    • Yu, H.1    Siskind, J.M.2
  • 36
    • 84898795297
    • From actemes to action: A strongly-supervised representation for detailed action understanding
    • Zhang, W.; Zhu, M.; and Derpanis, K. G. 2013. From actemes to action: A strongly-supervised representation for detailed action understanding. In ICCV.
    • (2013) ICCV
    • Zhang, W.1    Zhu, M.2    Derpanis, K.G.3


* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.