



Volume 2016-December, 2016, Pages 4631-4640

MovieQA: Understanding stories in movies through question-answering

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; SEMANTICS;

EID: 84986296727     PISSN: 10636919     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1109/CVPR.2016.501     Document Type: Conference Paper
Times cited: 806

References (50)
  • 2
    • 84887366672
    • Semisupervised Learning with Constraints for Person Identification in Multimedia Data
    • M. Baeuml, M. Tapaswi, and R. Stiefelhagen. Semisupervised Learning with Constraints for Person Identification in Multimedia Data. In CVPR, 2013.
    • (2013) CVPR
    • Baeuml, M.1    Tapaswi, M.2    Stiefelhagen, R.3
  • 5
    • 84859089502
    • Collecting highly parallel data for paraphrase evaluation
    • D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
    • (2011) ACL
    • Chen, D.L.1    Dolan, W.B.2
  • 7
    • 70450145539
    • Movie/script: Alignment and parsing of video and text transcription
    • T. Cour, C. Jordan, E. Miltsakaki, and B. Taskar. Movie/Script: Alignment and Parsing of Video and Text Transcription. In ECCV, 2008.
    • (2008) ECCV
    • Cour, T.1    Jordan, C.2    Miltsakaki, E.3    Taskar, B.4
  • 8
    • 84887345951
    • A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching
    • P. Das, C. Xu, R. F. Doell, and J. J. Corso. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching. CVPR, 2013.
    • (2013) CVPR
    • Das, P.1    Xu, C.2    Doell, R.F.3    Corso, J.J.4
  • 9
    • 84887345951
    • A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
    • P. Das, C. Xu, R. F. Doell, and J. J. Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR, 2013.
    • (2013) CVPR
    • Das, P.1    Xu, C.2    Doell, R.F.3    Corso, J.J.4
  • 13
    • 84946734827
    • Deep visual-semantic alignments for generating image descriptions
    • A. Karpathy and L. Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. In CVPR, 2015.
    • (2015) CVPR
    • Karpathy, A.1    Fei-Fei, L.2
  • 15
    • 84952349298
    • Unifying visual-semantic embeddings with multimodal neural language models
    • R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying Visual-Semantic Embeddings with Multimodal Neural Language Models. TACL, 2015.
    • (2015) TACL
    • Kiros, R.1    Salakhutdinov, R.2    Zemel, R.S.3
  • 17
    • 84911370987
    • What are you talking about? Text-to-image coreference
    • C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? Text-to-Image Coreference. In CVPR, 2014.
    • (2014) CVPR
    • Kong, C.1    Lin, D.2    Bansal, M.3    Urtasun, R.4    Fidler, S.5
  • 21
    • 84911442106
    • Visual semantic search: Retrieving videos via complex textual queries
    • D. Lin, S. Fidler, C. Kong, and R. Urtasun. Visual Semantic Search: Retrieving Videos via Complex Textual Queries. CVPR, 2014.
    • (2014) CVPR
    • Lin, D.1    Fidler, S.2    Kong, C.3    Urtasun, R.4
  • 23
    • 84937822746
    • A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
    • M. Malinowski and M. Fritz. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In NIPS, 2014.
    • (2014) NIPS
    • Malinowski, M.1    Fritz, M.2
  • 24
    • 84973896625
    • Ask your neurons: A neural-based approach to answering questions about images
    • M. Malinowski, M. Rohrbach, and M. Fritz. Ask Your Neurons: A Neural-based Approach to Answering Questions about Images. In ICCV, 2015.
    • (2015) ICCV
    • Malinowski, M.1    Rohrbach, M.2    Fritz, M.3
  • 26
    • 85162522202
    • Im2Text: Describing images using 1 million captioned photographs
    • V. Ordonez, G. Kulkarni, and T. Berg. Im2Text: Describing Images Using 1 Million Captioned Photographs. In NIPS, 2011.
    • (2011) NIPS
    • Ordonez, V.1    Kulkarni, G.2    Berg, T.3
  • 28
    • 84943782750
    • Linking People in Videos with "Their" Names Using Coreference Resolution
    • V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking People in Videos with "Their" Names Using Coreference Resolution. In ECCV, 2014.
    • (2014) ECCV
    • Ramanathan, V.1    Joulin, A.2    Liang, P.3    Fei-Fei, L.4
  • 29
    • 84898775557
    • Video Event Understanding using Natural Language Descriptions
    • V. Ramanathan, P. Liang, and L. Fei-Fei. Video Event Understanding using Natural Language Descriptions. In ICCV, 2013.
    • (2013) ICCV
    • Ramanathan, V.1    Liang, P.2    Fei-Fei, L.3
  • 31
    • 84926345282
    • MCTest: A challenge dataset for the open-domain machine comprehension of text
    • M. Richardson, C. J. Burges, and E. Renshaw. MCTest: A challenge dataset for the open-domain machine comprehension of text. In EMNLP, 2013.
    • (2013) EMNLP
    • Richardson, M.1    Burges, C.J.2    Renshaw, E.3
  • 34
  • 35
    • 70450202706
    • "Who are you?"-Learning person specific classifiers from video
    • J. Sivic, M. Everingham, and A. Zisserman. "Who are you?"-Learning person specific classifiers from video. CVPR, pages 1145-1152, 2009.
    • (2009) CVPR , pp. 1145-1152
    • Sivic, J.1    Everingham, M.2    Zisserman, A.3
  • 38
    • 84959255361
    • Book2Movie: Aligning video scenes with book chapters
    • M. Tapaswi, M. Bauml, and R. Stiefelhagen. Book2Movie: Aligning Video scenes with Book chapters. In CVPR, 2015.
    • (2015) CVPR
    • Tapaswi, M.1    Bauml, M.2    Stiefelhagen, R.3
  • 39
    • 84977834021
    • Aligning plot synopses to videos for story-based retrieval
    • M. Tapaswi, M. Bäuml, and R. Stiefelhagen. Aligning Plot Synopses to Videos for Story-based Retrieval. IJMIR, 4:3-16, 2015.
    • (2015) IJMIR , vol.4 , pp. 3-16
    • Tapaswi, M.1    Bäuml, M.2    Stiefelhagen, R.3
  • 41
    • 84944069490
    • Translating videos to natural language using deep recurrent neural networks
    • abs/1312.6229, cs.CV
    • S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating Videos to Natural Language Using Deep Recurrent Neural Networks. CoRR abs/1312.6229, cs.CV, 2014.
    • (2014) CoRR
    • Venugopalan, S.1    Xu, H.2    Donahue, J.3    Rohrbach, M.4    Mooney, R.J.5    Saenko, K.6
  • 43
    • 84944062514
    • Machine comprehension with syntax, frames, and semantics
    • H. Wang, M. Bansal, K. Gimpel, and D. McAllester. Machine Comprehension with Syntax, Frames, and Semantics. In ACL, 2015.
    • (2015) ACL
    • Wang, H.1    Bansal, M.2    Gimpel, K.3    McAllester, D.4
  • 45
    • 80053258778
    • Corpus-guided sentence generation of natural images
    • Y. Yang, C. L. Teo, H. Daumé, III, and Y. Aloimonos. Corpus-guided Sentence Generation of Natural Images. In EMNLP, pages 444-454, 2011.
    • (2011) EMNLP , pp. 444-454
    • Yang, Y.1    Teo, C.L.2    Daumé, H.3    Aloimonos, Y.4
  • 46
    • 84906494296
    • From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
    • P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In TACL, 2014.
    • (2014) TACL
    • Young, P.1    Lai, A.2    Hodosh, M.3    Hockenmaier, J.4
  • 47
    • 84959862697
    • Visual madlibs: Fill in the blank image generation and question answering
    • L. Yu, E. Park, A. C. Berg, and T. L. Berg. Visual Madlibs: Fill in the blank Image Generation and Question Answering. In ICCV, 2015.
    • (2015) ICCV
    • Yu, L.1    Park, E.2    Berg, A.C.3    Berg, T.L.4
  • 48
    • 84937964578
    • Learning deep features for scene recognition using places database
    • B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva. Learning Deep Features for Scene Recognition using Places Database. In NIPS, 2014.
    • (2014) NIPS
    • Zhou, B.1    Lapedriza, A.2    Xiao, J.3    Torralba, A.4    Oliva, A.5
  • 49
    • 84973911532
    • Aligning books and movies: Towards story-like visual explanations by watching movies and reading books
    • Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler. Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books. In ICCV, 2015.
    • (2015) ICCV
    • Zhu, Y.1    Kiros, R.2    Zemel, R.3    Salakhutdinov, R.4    Urtasun, R.5    Torralba, A.6    Fidler, S.7
  • 50
    • 84959182108
    • Adopting abstract images for semantic scene understanding
    • C. Zitnick, R. Vedantam, and D. Parikh. Adopting abstract images for semantic scene understanding. PAMI, PP, 2014.
    • (2014) PAMI
    • Zitnick, C.1    Vedantam, R.2    Parikh, D.3


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.