SCOPUS Information Retrieval Platform

Proceedings of the IEEE International Conference on Computer Vision

Volume 2017-October, 2017, Pages 3008-3017

Inferring and Executing Programs for Visual Reasoning

(7) Johnson, Justin (a); Hariharan, Bharath (b); van der Maaten, Laurens (b); Hoffman, Judy (a); Fei-Fei, Li (a); Zitnick, C. Lawrence (b); Girshick, Ross (b)

a STANFORD UNIVERSITY (United States)

b FACEBOOK AI RESEARCH (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ENGINES;

BLACK BOXES; BLACK-BOX MODEL; EXECUTION ENGINE; EXPLICIT REPRESENTATION; MODULE NETWORKS; REASONING PROCESS; VISUAL REASONING;

COMPUTER VISION;

EID: 85041924656 PISSN: 15505499 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCV.2017.325 Document Type: Conference Paper

Times cited : (603)

References (50)

1
- 84993660571
- Learning to compose neural networks for question answering
- J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Learning to compose neural networks for question answering. In NAACL, 2016.
- (2016) NAACL
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

2
- 84986272553
- Neural module networks
- J. Andreas, M. Rohrbach, T. Darrell, and D. Klein. Neural module networks. In CVPR, 2016.
- (2016) CVPR
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

3
- 84973890960
- VQA: Visual question answering
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015.
- (2015) ICCV
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Zitnick, C.⁶ Parikh, D.⁷

4
- 85088225001
- DeepCoder: Learning to write programs
- M. Balog, A. Gaunt, M. Brockschmidt, S. Nowozin, and D. Tarlow. DeepCoder: Learning to write programs. In ICLR, 2017.
- (2017) ICLR
- Balog, M.¹ Gaunt, A.² Brockschmidt, M.³ Nowozin, S.⁴ Tarlow, D.⁵

5
- 85063566337
- Making neural programming architectures generalize via recursion
- J. Cai, R. Shin, and D. Song. Making neural programming architectures generalize via recursion. In ICLR, 2017.
- (2017) ICLR
- Cai, J.¹ Shin, R.² Song, D.³

6
- 85041927710
- Visual dialog
- A. Das, S. Kottur, K. Gupta, A. Singh, D. Yadav, J. Moura, D. Parikh, and D. Batra. Visual dialog. In CVPR, 2017.
- (2017) CVPR
- Das, A.¹ Kottur, S.² Gupta, K.³ Singh, A.⁴ Yadav, D.⁵ Moura, J.⁶ Parikh, D.⁷ Batra, D.⁸

7
- 84965102873
- arXiv preprint arXiv:1505.04467
- J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
- (2015) Exploring Nearest Neighbor Approaches for Image Captioning
- Devlin, J.¹ Gupta, S.² Girshick, R.³ Mitchell, M.⁴ Zitnick, C.L.⁵

8
- 84990060711
- arXiv:1606.01847
- A. Fukui, D. H. Park, D. Yang, A. Rohrbach, T. Darrell, and M. Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. In arXiv:1606.01847, 2016.
- (2016) Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding
- Fukui, A.¹ Park, D.H.² Yang, D.³ Rohrbach, A.⁴ Darrell, T.⁵ Rohrbach, M.⁶

9
- 84986266770
- Compact bilinear pooling
- Y. Gao, O. Beijbom, N. Zhang, and T. Darrell. Compact bilinear pooling. In CVPR, 2016.
- (2016) CVPR
- Gao, Y.¹ Beijbom, O.² Zhang, N.³ Darrell, T.⁴

10
- 85029359197
- Fast R-CNN
- R. Girshick. Fast R-CNN. In ICCV, 2015.
- (2015) ICCV
- Girshick, R.¹

11
- 85041900002
- Making the v in VQA matter: Elevating the role of image understanding in visual question answering
- Y. Goyal, T. Khot, D. Summers-Stay, D. Batra, and D. Parikh. Making the V in VQA matter: Elevating the role of image understanding in visual question answering. In CVPR, 2017.
- (2017) CVPR
- Goyal, Y.¹ Khot, T.² Summers-Stay, D.³ Batra, D.⁴ Parikh, D.⁵

12
- 84930616355
- arXiv preprint arXiv:1410.5401
- A. Graves, G. Wayne, and I. Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
- (2014) Neural Turing Machines
- Graves, A.¹ Wayne, G.² Danihelka, I.³

13
- 84993949467
- Hybrid computing using a neural network with dynamic external memory
- A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwinska, S. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, A. Badia, K. Hermann, Y. Zwols, G. Ostrovski, A. Cain, H. King, C. Summerfield, P. Blunsom, K. Kavukcuoglu, and D. Hassabis. Hybrid computing using a neural network with dynamic external memory. Nature, 2016.
- (2016) Nature
- Graves, A.¹ Wayne, G.² Reynolds, M.³ Harley, T.⁴ Danihelka, I.⁵ Grabska-Barwinska, A.⁶ Colmenarejo, S.⁷ Grefenstette, E.⁸ Ramalho, T.⁹ Agapiou, J.¹⁰ Badia, A.¹¹ Hermann, K.¹² Zwols, Y.¹³ Ostrovski, G.¹⁴ Cain, A.¹⁵ King, H.¹⁶ Summerfield, C.¹⁷ Blunsom, P.¹⁸ Kavukcuoglu, K.¹⁹ Hassabis, D.²⁰

14
- 84986274465
- Deep residual learning for image recognition
- K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
- (2016) CVPR
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

15
- 0031573117
- Long short-term memory
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9 (8):1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

16
- 85041904328
- Learning to reason: End-to-end module networks for visual question answering
- R. Hu, J. Andreas, M. Rohrbach, T. Darrell, and K. Saenko. Learning to reason: End-to-end module networks for visual question answering. In ICCV, 2017.
- (2017) ICCV
- Hu, R.¹ Andreas, J.² Rohrbach, M.³ Darrell, T.⁴ Saenko, K.⁵

17
- 85041929043
- Modeling relationships in referential expressions with compositional modular networks
- R. Hu, M. Rohrbach, J. Andreas, T. Darrell, and K. Saenko. Modeling relationships in referential expressions with compositional modular networks. In CVPR, 2017.
- (2017) CVPR
- Hu, R.¹ Rohrbach, M.² Andreas, J.³ Darrell, T.⁴ Saenko, K.⁵

18
- 85041926703
- Revisiting visual question answering baselines
- A. Jabri, A. Joulin, and L. van der Maaten. Revisiting visual question answering baselines. In ECCV, 2016.
- (2016) ECCV
- Jabri, A.¹ Joulin, A.² Van der Maaten, L.³

19
- 85041904911
- CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning
- J. Johnson, B. Hariharan, L. van der Maaten, L. Fei-Fei, C. L. Zitnick, and R. Girshick. CLEVR: A diagnostic dataset for compositional language and elementary visual reasoning. In CVPR, 2017.
- (2017) CVPR
- Johnson, J.¹ Hariharan, B.² Van der Maaten, L.³ Fei-Fei, L.⁴ Zitnick, C.L.⁵ Girshick, R.⁶

20
- 84965117324
- Inferring algorithmic patterns with stack-augmented recurrent nets
- A. Joulin and T. Mikolov. Inferring algorithmic patterns with stack-augmented recurrent nets. In NIPS, 2015.
- (2015) NIPS
- Joulin, A.¹ Mikolov, T.²

21
- 85040923687
- arXiv preprint arXiv:1610.01465
- K. Kafle and C. Kanan. Visual question answering: Datasets, algorithms, and future challenges. In arXiv preprint arXiv:1610.01465, 2016.
- (2016) Visual Question Answering: Datasets, Algorithms, and Future Challenges
- Kafle, K.¹ Kanan, C.²

22
- 85083953090
- Neural GPUs learn algorithms
- L. Kaiser and I. Sutskever. Neural GPUs learn algorithms. In ICLR, 2016.
- (2016) ICLR
- Kaiser, L.¹ Sutskever, I.²

23
- 85146417759
- Accurate unlexicalized parsing
- D. Klein and C. D. Manning. Accurate unlexicalized parsing. In ACL, pages 423-430, 2003.
- (2003) ACL , pp. 423-430
- Klein, D.¹ Manning, C.D.²

24
- 85011596790
- Visual genome: Connecting language and vision using crowdsourced dense image annotations
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. IJCV, 2017.
- (2017) IJCV
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰

25
- 84876231242
- ImageNet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

26
- 85083953578
- Neural random-access machines
- K. Kurach, M. Andrychowicz, and I. Sutskever. Neural random-access machines. In ICLR, 2016.
- (2016) ICLR
- Kurach, K.¹ Andrychowicz, M.² Sutskever, I.³

27
- 85056690547
- Building machines that learn and think like people
- B. M. Lake, T. D. Ullman, J. B. Tenenbaum, and S. J. Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 2016.
- (2016) Behavioral and Brain Sciences
- Lake, B.M.¹ Ullman, T.D.² Tenenbaum, J.B.³ Gershman, S.J.⁴

28
- 85040923863
- Neural symbolic machines: Learning semantic parsers on freebase with weak supervision
- C. Liang, J. Berant, Q. Le, K. D. Forbus, and N. Lao. Neural symbolic machines: Learning semantic parsers on freebase with weak supervision. ACL, 2017.
- (2017) ACL
- Liang, C.¹ Berant, J.² Le, Q.³ Forbus, K.D.⁴ Lao, N.⁵

29
- 84859072748
- Learning dependency-based compositional semantics
- P. Liang, M. I. Jordan, and D. Klein. Learning dependency-based compositional semantics. In ACL, 2011.
- (2011) ACL
- Liang, P.¹ Jordan, M.I.² Klein, D.³

30
- 85018917850
- Hierarchical question-image co-attention for visual question answering
- J. Lu, J. Yang, D. Batra, and D. Parikh. Hierarchical question-image co-attention for visual question answering. In NIPS, 2016.
- (2016) NIPS
- Lu, J.¹ Yang, J.² Batra, D.³ Parikh, D.⁴

31
- 84937822746
- A multi-world approach to question answering about real-world scenes based on uncertain input
- M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS, 2014.
- (2014) NIPS
- Malinowski, M.¹ Fritz, M.²

32
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
- (2015) ICCV
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

33
- 85041909637
- Learning models for actions and person-object interactions with transfer to question answering
- A. Mallya and S. Lazebnik. Learning models for actions and person-object interactions with transfer to question answering. In ECCV, 2016.
- (2016) ECCV
- Mallya, A.¹ Lazebnik, S.²

34
- 85083953004
- Neural programmer: Inducing latent programs with gradient descent
- A. Neelakantan, Q. V. Le, and I. Sutskever. Neural programmer: Inducing latent programs with gradient descent. In ICLR, 2016.
- (2016) ICLR
- Neelakantan, A.¹ Le, Q.V.² Sutskever, I.³

35
- 85083952850
- Neural programmer-interpreters
- S. Reed and N. De Freitas. Neural programmer-interpreters. In ICLR, 2016.
- (2016) ICLR
- Reed, S.¹ De Freitas, N.²

36
- 84947041871
- ImageNet large scale visual recognition challenge
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. IJCV, 2015.
- (2015) IJCV
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰

37
- 84965143740
- End-to-end memory networks
- S. Sukhbaatar, A. Szlam, J. Weston, and R. Fergus. End-to-end memory networks. In NIPS, 2015.
- (2015) NIPS
- Sukhbaatar, S.¹ Szlam, A.² Weston, J.³ Fergus, R.⁴

38
- 84928547704
- Sequence to sequence learning with neural networks
- I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
- (2014) NIPS
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

39
- 84986296727
- MovieQA: Understanding stories in movies through question-answering
- M. Tapaswi, Y. Zhu, R. Stiefelhagen, A. Torralba, R. Urtasun, and S. Fidler. MovieQA: Understanding stories in movies through question-answering. In CVPR, 2016.
- (2016) CVPR
- Tapaswi, M.¹ Zhu, Y.² Stiefelhagen, R.³ Torralba, A.⁴ Urtasun, R.⁵ Fidler, S.⁶

40
- 85083951616
- Memory networks
- J. Weston, S. Chopra, and A. Bordes. Memory networks. In ICLR, 2015.
- (2015) ICLR
- Weston, J.¹ Chopra, S.² Bordes, A.³

41
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8 (3-4), 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4
- Williams, R.J.¹

42
- 0004057837
- Academic Press
- T. Winograd. Understanding Natural Language. Academic Press, 1972.
- (1972) Understanding Natural Language
- Winograd, T.¹

43
- 85035003427
- arXiv preprint arXiv:1607.05910
- Q. Wu, D. Teney, P. Wang, C. Shen, A. Dick, and A. van den Hengel. Visual question answering: A survey of methods and datasets. In arXiv preprint arXiv:1607.05910, 2016.
- (2016) Visual Question Answering: A Survey of Methods and Datasets
- Wu, Q.¹ Teney, D.² Wang, P.³ Shen, C.⁴ Dick, A.⁵ Van den Hengel, A.⁶

44
- 84999008900
- Dynamic memory networks for visual and textual question answering
- C. Xiong, S. Merity, and R. Socher. Dynamic memory networks for visual and textual question answering. ICML, 2016.
- (2016) ICML
- Xiong, C.¹ Merity, S.² Socher, R.³

45
- 84986334021
- Stacked attention networks for image question answering
- Z. Yang, X. He, J. Gao, L. Deng, and A. Smola. Stacked attention networks for image question answering. In CVPR, 2016.
- (2016) CVPR
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.⁵

46
- 84997831765
- Learning simple algorithms from examples
- W. Zaremba, T. Mikolov, A. Joulin, and R. Fergus. Learning simple algorithms from examples. In ICML, 2016.
- (2016) ICML
- Zaremba, W.¹ Mikolov, T.² Joulin, A.³ Fergus, R.⁴

47
- 84958234084
- arXiv preprint arXiv:1410.4615
- W. Zaremba and I. Sutskever. Learning to execute. arXiv preprint arXiv:1410.4615, 2014.
- (2014) Learning to Execute
- Zaremba, W.¹ Sutskever, I.²

48
- 84943750581
- arXiv preprint arXiv:1505.00521
- W. Zaremba and I. Sutskever. Reinforcement learning neural turing machines. arXiv preprint arXiv:1505.00521, 2015.
- (2015) Reinforcement Learning Neural Turing Machines
- Zaremba, W.¹ Sutskever, I.²

49
- 84986278354
- Yin and yang: Balancing and answering binary visual questions
- P. Zhang, Y. Goyal, D. Summers-Stay, D. Batra, and D. Parikh. Yin and yang: Balancing and answering binary visual questions. In CVPR, 2016.
- (2016) CVPR
- Zhang, P.¹ Goyal, Y.² Summers-Stay, D.³ Batra, D.⁴ Parikh, D.⁵

50
- 84986275767
- Visual7W: Grounded question answering in images
- Y. Zhu, O. Groth, M. Bernstein, and L. Fei-Fei. Visual7W: Grounded question answering in images. In CVPR, 2016.
- (2016) CVPR
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.