SCOPUS Information Retrieval Platform

Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017

Volume 2017-January, 2017, Pages 1988-1997

CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning

(6) Johnson, Justin a,b; Fei-Fei, Li a; Hariharan, Bharath b; Zitnick, C. Lawrence b; Van Der Maaten, Laurens b; Girshick, Ross b

a STANFORD UNIVERSITY (United States)

b FACEBOOK AI RESEARCH (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; STATISTICAL TESTS;

ARTIFICIAL INTELLIGENCE SYSTEMS; DIAGNOSTIC TESTS; MULTIPLE SOURCE; QUESTION ANSWERING; VISUAL DATA; VISUAL REASONING;

VISUAL LANGUAGES;

EID: 85041904911 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2017.215 Document Type: Conference Paper

Times cited : (1888)

References (50)

1
- 85072842417
- Analyzing the behavior of visual question answering models
- A. Agrawal, D. Batra, and D. Parikh. Analyzing the behavior of visual question answering models. In EMNLP, 2016.
- (2016) EMNLP
- Agrawal, A.¹ Batra, D.² Parikh, D.³

2
- 84993660571
- Learning to compose neural networks for question answering
- J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Learning to compose neural networks for question answering. In NAACL, 2016.
- (2016) NAACL
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

3
- 84986272553
- Neural module networks
- J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Neural module networks. In CVPR, 2016.
- (2016) CVPR
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

4
- 84973890960
- VQA: Visual question answering
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015.
- (2015) ICCV
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Zitnick, C.⁶ Parikh, D.⁷

5
- 84879854889
- Representation learning: A review and new perspectives
- Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. TPAMI, 35(8):1798-1828, 2014.
- (2014) TPAMI , vol.35 , Issue.8 , pp. 1798-1828
- Bengio, Y.¹ Courville, A.² Vincent, P.³

6
- 84992615443
- Blender Foundation, Blender Institute, Amsterdam
- Blender Online Community. Blender - a 3D modelling and rendering package. Blender Foundation, Blender Institute, Amsterdam, 2016.
- (2016) Blender - A 3D Modelling and Rendering Package

7
- 84959908834
- Deja image-captions: A corpus of expressive image descriptions in repetition
- J. Chen, P. Kuznetsova, D. Warren, and Y. Choi. Deja image-captions: A corpus of expressive image descriptions in repetition. In NAACL, 2015.
- (2015) NAACL
- Chen, J.¹ Kuznetsova, P.² Warren, D.³ Choi, Y.⁴

8
- 80051961229
- Every picture tells a story: Generating sentences for images
- A. Farhadi, M. Hejrati, A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences for images. In ECCV, 2010.
- (2010) ECCV
- Farhadi, A.¹ Hejrati, M.² Sadeghi, A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

9
- 84990060711
- A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. In arXiv:1606.01847, 2016.
- (2016) Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
- Fukui, A.¹ Park, D.H.² Yang, D.³ Rohrbach, A.⁴ Darrell, T.⁵ Rohrbach, M.⁶

10
- 84965148420
- Are you talking to a Machine? Dataset and methods for multilingual image question answering
- H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS, 2015.
- (2015) NIPS
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

11
- 84986266770
- Compact bilinear pooling
- Y. Gao, O. Beijbom, N. Zhang, and T. Darrell. Compact bilinear pooling. In CVPR, 2016.
- (2016) CVPR
- Gao, Y.¹ Beijbom, O.² Zhang, N.³ Darrell, T.⁴

12
- 84925422907
- Visual turing test for computer vision systems
- D. Geman, S. Geman, N. Hallonquist, and L. Younes. Visual Turing test for computer vision systems. Proceedings of the National Academy of Sciences, 112(12):3618-3623, 2015.
- (2015) Proceedings of the National Academy of Sciences , vol.112 , Issue.12 , pp. 3618-3623
- Geman, D.¹ Geman, S.² Hallonquist, N.³ Younes, L.⁴

13
- 84993949467
- Hybrid computing using a neural network with dynamic external memory
- A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, A. Badia, K. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, and D. Hassabis. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.
- (2016) Nature
- Graves, A.¹ Wayne, G.² Reynolds, M.³ Harley, T.⁴ Danihelka, I.⁵ Grabska-Barwinska, A.⁶ Colmenarejo, S.⁷ Grefenstette, E.⁸ Ramalho, T.⁹ Agapiou, J.¹⁰ Badia, A.¹¹ Hermann, K.¹² Zwols, Y.¹³ Ostrovski, G.¹⁴ Cain, A.¹⁵ King, H.¹⁶ Summerfield, C.¹⁷ Blunsom, P.¹⁸ Kavukcuoglu, K.¹⁹ Hassabis, D.²⁰

14
- 84986274465
- Deep residual learning for image recognition
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
- (2016) CVPR
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

15
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

16
- 85041926703
- Revisiting visual question answering baselines
- A. Jabri, A. Joulin, and L. van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
- (2016) ECCV
- Jabri, A.¹ Joulin, A.² Van Der Maaten, L.³

17
- 84959233256
- Image retrieval using scene graphs
- J. Johnson, R. Krishna, M. Stark, L.-J. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei. Image retrieval using scene graphs. In CVPR, 2015.
- (2015) CVPR
- Johnson, J.¹ Krishna, R.² Stark, M.³ Li, L.-J.⁴ Shamma, D.A.⁵ Bernstein, M.S.⁶ Fei-Fei, L.⁷

18
- 84965117324
- Inferring algorithmic patterns with stack-augmented recurrent nets
- A. Joulin and T. Mikolov. Inferring algorithmic patterns with stack-augmented recurrent nets. In NIPS, 2015.
- (2015) NIPS
- Joulin, A.¹ Mikolov, T.²

19
- 84943540775
- Referitgame: Referring to objects in photographs of natural scenes
- S. Kazemzadeh, V. Ordonez, M. Matten, and T. Berg. Referitgame: Referring to objects in photographs of natural scenes. In EMNLP, 2014.
- (2014) EMNLP
- Kazemzadeh, S.¹ Ordonez, V.² Matten, M.³ Berg, T.⁴

20
- 85083951076
- Adam: A method for stochastic optimization
- D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
- (2015) ICLR
- Kingma, D.¹ Ba, J.²

21
- 84990070438
- Visual genome: Connecting language and vision using crowdsourced dense image annotations
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Jia-Li, D. Shamma, M. Bernstein, and L. Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 2016.
- (2016) IJCV
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Jia-Li, L.⁹ Shamma, D.¹⁰ Bernstein, M.¹¹ Fei-Fei, L.¹²

22
- 85011954581
- The winograd schema challenge
- H. J. Levesque, E. Davis, and L. Morgenstern. The Winograd schema challenge. In AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning, Volume 46, page 47, 2011.
- (2011) AAAI Spring Symposium: Logical Formalizations of Commonsense Reasoning , vol.46 , pp. 47
- Levesque, H.J.¹ Davis, E.² Morgenstern, L.³

23
- 84937834115
- Microsoft COCO: Common objects in context
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollar, and C. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollar, P.⁷ Zitnick, C.⁸

24
- 85018917850
- Hierarchical question-image co-attention for visual question answering
- J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention for visual question answering. In NIPS, 2016.
- (2016) NIPS
- Lu, J.¹ Yang, J.² Batra, D.³ Parikh, D.⁴

25
- 85007153677
- Learning to answer questions from image using convolutional neural network
- L. Ma, Z. Lu, and H. Li. Learning to answer questions from image using convolutional neural network. In AAAI, 2016.
- (2016) AAAI
- Ma, L.¹ Lu, Z.² Li, H.³

26
- 84937822746
- A multi-world approach to question answering about real-world scenes based on uncertain input
- M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS, 2014.
- (2014) NIPS
- Malinowski, M.¹ Fritz, M.²

27
- 84951975735
- Towards a visual turing challenge
- M. Malinowski and M. Fritz. Towards a visual Turing challenge. In NIPS 2014 Workshop on Learning Semantics, 2014.
- (2014) NIPS 2014 Workshop on Learning Semantics
- Malinowski, M.¹ Fritz, M.²

28
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
- (2015) ICCV
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

29
- 85083951332
- T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. In arXiv 1301.3781, 2013.
- (2013) Efficient Estimation of Word Representations in Vector Space
- Mikolov, T.¹ Chen, K.² Corrado, G.³ Dean, J.⁴

30
- 0011591628
- Henry Holt, New York
- O. Pfungst. Clever Hans (The horse of Mr. von Osten): A contribution to experimental animal and human psychology. Henry Holt, New York, 1911.
- (1911) Clever Hans (The Horse of Mr. von Osten): A Contribution to Experimental Animal and Human Psychology
- Pfungst, O.¹

31
- 85072826753
- Question relevance in VQA: Identifying non-visual and false-premise questions
- A. Ray, G. Christie, M. Bansal, D. Batra, and D. Parikh. Question relevance in VQA: Identifying non-visual and false-premise questions. In EMNLP, 2016.
- (2016) EMNLP
- Ray, A.¹ Christie, G.² Bansal, M.³ Batra, D.⁴ Parikh, D.⁵

32
- 84965170394
- Exploring models and data for image question answering
- M. Ren, R. Kiros, and R. Zemel. Exploring models and data for image question answering. In NIPS, 2015.
- (2015) NIPS
- Ren, M.¹ Kiros, R.² Zemel, R.³

33
- 84986327457
- Where to look: Focus regions for visual question answering
- K. Shih, S. Singh, and D. Hoiem. Where to look: Focus regions for visual question answering. In CVPR, 2016.
- (2016) CVPR
- Shih, K.¹ Singh, S.² Hoiem, D.³

34
- 84904163933
- Dropout: A simple way to prevent neural networks from overfitting
- N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. JMLR, 15(1):1929-1958, 2014.
- (2014) JMLR , vol.15 , Issue.1 , pp. 1929-1958
- Srivastava, N.¹ Hinton, G.E.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.⁵

35
- 84907449171
- A simple method to determine if a music information retrieval system is a horse
- B. Sturm. A simple method to determine if a music information retrieval system is a horse. IEEE Transactions on Multimedia, 16(6):1636-1644, 2014.
- (2014) IEEE Transactions on Multimedia , vol.16 , Issue.6 , pp. 1636-1644
- Sturm, B.¹

36
- 85044256987
- HORSE2016
- B. Sturm. Horse taxonomy and taxidermy. HORSE2016, 2016.
- (2016) Horse Taxonomy and Taxidermy
- Sturm, B.¹

37
- 84986296727
- Movieqa: Understanding stories in movies through question-answering
- M. Tapaswi, Y. Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, and S. Fidler. Movieqa: Understanding stories in movies through question-answering. In CVPR, 2016.
- (2016) CVPR
- Tapaswi, M.¹ Zhu, Y.² Stiefelhagen, R.³ Torralba, A.⁴ Urtasun, R.⁵ Fidler, S.⁶

38
- 85083951707
- Towards AI-complete question answering: A set of prerequisite toy tasks
- J. Weston, A. Bordes, S. Chopra, A. Rush, B. van Merriënboer, A. Joulin, and T. Mikolov. Towards AI-complete question answering: A set of prerequisite toy tasks. In ICLR, 2016.
- (2016) ICLR
- Weston, J.¹ Bordes, A.² Chopra, S.³ Rush, A.⁴ Van Merriënboer, B.⁵ Joulin, A.⁶ Mikolov, T.⁷

39
- 85083951616
- Memory networks
- J. Weston, S. Chopra, and A. Bordes. Memory networks. In ICLR, 2015.
- (2015) ICLR
- Weston, J.¹ Chopra, S.² Bordes, A.³

40
- 0004057837
- Academic Press
- T. Winograd. Understanding Natural Language. Academic Press, 1972.
- (1972) Understanding Natural Language
- Winograd, T.¹

41
- 84990062072
- Q. Wu, C. Shen, A. van den Hengel, P. Wang, and A. Dick. Image captioning and visual question answering based on attributes and their related external knowledge. In arXiv 1603.02814, 2016.
- (2016) Image Captioning and Visual Question Answering Based on Attributes and their Related External Knowledge
- Wu, Q.¹ Shen, C.² Van Den Hengel, A.³ Wang, P.⁴ Dick, A.⁵

42
- 84999008900
- Dynamic memory networks for visual and textual question answering
- C. Xiong, S. Merity, and R. Socher. Dynamic memory networks for visual and textual question answering. ICML, 2016.
- (2016) ICML
- Xiong, C.¹ Merity, S.² Socher, R.³

43
- 85035008367
- Ask, attend, and answer: Exploring question-guided spatial attention for visual question answering
- H. Xu and K. Saenko. Ask, attend, and answer: Exploring question-guided spatial attention for visual question answering. In ECCV, 2016.
- (2016) ECCV
- Xu, H.¹ Saenko, K.²

44
- 84986334021
- Stacked attention networks for image question answering
- Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
- (2016) CVPR
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.⁵

45
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In TACL, pages 67-78, 2014.
- (2014) TACL , pp. 67-78
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

46
- 84959862697
- Visual madlibs: Fill in the blank image generation and question answering
- L. Yu, E. Park, A. Berg, and T. Berg. Visual madlibs: Fill in the blank image generation and question answering. In ICCV, 2015.
- (2015) ICCV
- Yu, L.¹ Park, E.² Berg, A.³ Berg, T.⁴

47
- 84986278354
- Yin and yang: Balancing and answering binary visual questions
- P. Zhang, Y. Goyal, D. Summers-Stay, D. Batra, and D. Parikh. Yin and yang: Balancing and answering binary visual questions. In CVPR, 2016.
- (2016) CVPR
- Zhang, P.¹ Goyal, Y.² Summers-Stay, D.³ Batra, D.⁴ Parikh, D.⁵

48
- 84986301525
- B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. In arXiv:1512.02167, 2015.
- (2015) Simple Baseline for Visual Question Answering
- Zhou, B.¹ Tian, Y.² Sukhbaatar, S.³ Szlam, A.⁴ Fergus, R.⁵

49
- 84986275767
- Visual7w: Grounded question answering in images
- Y. Zhu, O. Groth, M. Bernstein, and L. Fei-Fei. Visual7w: Grounded question answering in images. In CVPR, 2016.
- (2016) CVPR
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

50
- 84887338442
- Bringing semantics into focus using visual abstraction
- C. Zitnick and D. Parikh. Bringing semantics into focus using visual abstraction. In CVPR, 2013.
- (2013) CVPR
- Zitnick, C.¹ Parikh, D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.