SCOPUS Information Retrieval Platform

6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings

Volume , Issue , 2018, Pages

Interpretable counting for visual question answering

(3) Trott, Alexander (a); Xiong, Caiming (a); Socher, Richard (a)

(a) Salesforce.com Inc (United States)

Author keywords

[No Author keywords available]

Indexed keywords

DISCRETE CHOICE; QUESTION ANSWERING; SEQUENTIAL DECISION PROCESS; STATE OF THE ART; OBJECT DETECTION;

EID: 85083952592 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited : (67)

References (48)

1
- 84973890960
- VQA: Visual question answering
- Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchell, C. Lawrence Zitnick, Devi Parikh, and Dhruv Batra. VQA: Visual Question Answering. International Journal of Computer Vision, 2015.
- (2015) International Journal of Computer Vision
- Agrawal, A.¹ Lu, J.² Antol, S.³ Mitchell, M.⁴ Zitnick, C.L.⁵ Parikh, D.⁶ Batra, D.⁷

2
- 85040308578
- Bottom-Up and Top-Down Attention for Image Captioning and VQA
- Peter Anderson, Xiaodong He, Chris Buehler, Damien Teney, Mark Johnson, Stephen Gould, and Lei Zhang. Bottom-Up and Top-Down Attention for Image Captioning and VQA. arXiv, 2017.
- (2017) arXiv
- Anderson, P.¹ He, X.² Buehler, C.³ Teney, D.⁴ Johnson, M.⁵ Gould, S.⁶ Zhang, L.⁷

3
- 84986272553
- Neural module networks
- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Neural module networks. In CVPR, 2016a.
- (2016) CVPR
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

4
- 84993660571
- Learning to compose neural networks for question answering
- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Learning to Compose Neural Networks for Question Answering. In NAACL, 2016b.
- (2016) NAACL
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

5
- 85050692557
- It Takes Two to Tango: Towards Theory of AI’s Mind
- Arjun Chandrasekaran, Deshraj Yadav, Prithvijit Chattopadhyay, Viraj Prabhu, and Devi Parikh. It Takes Two to Tango: Towards Theory of AI’s Mind. arXiv, 2017.
- (2017) arXiv
- Chandrasekaran, A.¹ Yadav, D.² Chattopadhyay, P.³ Prabhu, V.⁴ Parikh, D.⁵

6
- 85043790150
- Counting everyday objects in everyday scenes
- Prithvijit Chattopadhyay, Ramakrishna Vedantam, Ramprasaath R. Selvaraju, Dhruv Batra, and Devi Parikh. Counting Everyday Objects in Everyday Scenes. In CVPR, 2017.
- (2017) CVPR
- Chattopadhyay, P.¹ Vedantam, R.² Selvaraju, R.R.³ Batra, D.⁴ Parikh, D.⁵

7
- 85018938177
- R-FCN: Object Detection via Region-based Fully Convolutional Networks
- Jifeng Dai, Yi Li, Kaiming He, and Jian Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks. In NIPS, 2016.
- (2016) NIPS
- Dai, J.¹ Li, Y.² He, K.³ Sun, J.⁴

8
- 85044506279
- Multimodal compact bilinear pooling for visual question answering and visual grounding
- Akira Fukui, Dong Huk Park, Daylen Yang, Anna Rohrbach, Trevor Darrell, and Marcus Rohrbach. Multimodal compact bilinear pooling for visual question answering and visual grounding. In EMNLP, 2016.
- (2016) EMNLP
- Fukui, A.¹ Park, D.H.² Yang, D.³ Rohrbach, A.⁴ Darrell, T.⁵ Rohrbach, M.⁶

9
- 85029359197
- Fast R-CNN
- Ross Girshick. Fast R-CNN. In ICCV, 2015.
- (2015) ICCV
- Girshick, R.¹

10
- 84906343066
- Rich feature hierarchies for accurate object detection and semantic segmentation
- Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2015.
- (2015) CVPR
- Girshick, R.¹ Donahue, J.² Darrell, T.³ Malik, J.⁴

11
- 85041900002
- Making the V in VQA matter: Elevating the role of image understanding in visual question answering
- Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In CVPR, 2017.
- (2017) CVPR
- Goyal, Y.¹ Khot, T.² Summers-Stay, D.³ Batra, D.⁴ Parikh, D.⁵

12
- 0031573117
- Long short-term memory
- Sepp Hochreiter and Jürgen Schmidhuber. Long Short-Term Memory. Neural Computation, 9(8): 1735–1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

13
- 85041904328
- Learning to reason: End-to-end module networks for visual question answering
- Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Kate Saenko. Learning to Reason: End-to-End Module Networks for Visual Question Answering. In ICCV, 2017.
- (2017) ICCV
- Hu, R.¹ Andreas, J.² Rohrbach, M.³ Darrell, T.⁴ Saenko, K.⁵

14
- 85018925213
- A Focused Dynamic Attention Model for Visual Question Answering
- Ilija Ilievski, Shuicheng Yan, and Jiashi Feng. A Focused Dynamic Attention Model for Visual Question Answering. arXiv, 2016.
- (2016) arXiv
- Ilievski, I.¹ Yan, S.² Feng, J.³

15
- 85087529518
- Hadamard Product for Low-rank Bilinear Pooling
- Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, and Byoung-Tak Zhang. Hadamard Product for Low-rank Bilinear Pooling. In ICLR, 2017.
- (2017) ICLR
- Kim, J.-H.¹ On, K.-W.² Lim, W.³ Kim, J.⁴ Ha, J.-W.⁵ Zhang, B.-T.⁶

16
- 84941620184
- Adam: A Method for Stochastic Optimization
- Diederik P. Kingma and Jimmy Ba. Adam: A Method for Stochastic Optimization. arXiv, 2014.
- (2014) arXiv
- Kingma, D.P.¹ Ba, J.²

17
- 84990070438
- Visual genome: Connecting language and vision using crowdsourced dense image annotations
- Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, Michael S. Bernstein, and Fei-Fei Li. Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. International Journal of Computer Vision, 2016.
- (2016) International Journal of Computer Vision
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰ Bernstein, M.S.¹¹ Li, F.-F.¹²

18
- 85162384490
- Learning to count objects in images
- Victor Lempitsky and Andrew Zisserman. Learning To Count Objects in Images. NIPS, 2010.
- (2010) NIPS
- Lempitsky, V.¹ Zisserman, A.²

19
- 84937834115
- Microsoft COCO: Common objects in context
- Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. Microsoft COCO: Common Objects in Context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Bourdev, L.⁴ Girshick, R.⁵ Hays, J.⁶ Perona, P.⁷ Ramanan, D.⁸ Zitnick, C.L.⁹ Dollár, P.¹⁰

20
- 85041916350
- Focal loss for dense object detection
- Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. Focal Loss for Dense Object Detection. ICCV, 2017.
- (2017) ICCV
- Lin, T.-Y.¹ Goyal, P.² Girshick, R.³ He, K.⁴ Dollár, P.⁵

21
- 85031922514
- Knowing when to look: Adaptive attention via a visual sentinel for image captioning
- Jiasen Lu, Caiming Xiong, Devi Parikh, and Richard Socher. Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. In CVPR, 2016a.
- (2016) CVPR
- Lu, J.¹ Xiong, C.² Parikh, D.³ Socher, R.⁴

22
- 85018917850
- Hierarchical question-image co-attention for visual question answering
- Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. Hierarchical Question-Image Co-Attention for Visual Question Answering. In NIPS, 2016b.
- (2016) NIPS
- Lu, J.¹ Yang, J.² Batra, D.³ Parikh, D.⁴

23
- 85023777762
- Learning online alignments with continuous rewards policy gradient
- Yuping Luo, Chung-cheng Chiu, Navdeep Jaitly, and Ilya Sutskever. Learning Online Alignments with Continuous Rewards Policy Gradient. In ICASSP, 2017.
- (2017) ICASSP
- Luo, Y.¹ Chiu, C.-C.² Jaitly, N.³ Sutskever, I.⁴

24
- 84937822746
- A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input
- Mateusz Malinowski and Mario Fritz. A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input. In NIPS, 2014.
- (2014) NIPS
- Malinowski, M.¹ Fritz, M.²

25
- 84999036937
- Asynchronous methods for deep reinforcement learning
- Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy Lillicrap, David Silver, and Koray Kavukcuoglu. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016.
- (2016) ICML
- Mnih, V.¹ Badia, A.P.² Mirza, M.³ Graves, A.⁴ Harley, T.⁵ Lillicrap, T.⁶ Silver, D.⁷ Kavukcuoglu, K.⁸

26
- 85021624882
- Towards perspective-free object counting with deep learning
- Daniel Oñoro-Rubio and Roberto J. López-Sastre. Towards perspective-free object counting with deep learning. In ECCV, 2016.
- (2016) ECCV
- Oñoro-Rubio, D.¹ López-Sastre, R.J.²

27
- 85021677756
- Attentive Explanations: Justifying Decisions and Pointing to the Evidence
- Dong Huk Park, Lisa Anne Hendricks, Zeynep Akata, Bernt Schiele, Trevor Darrell, and Marcus Rohrbach. Attentive Explanations: Justifying Decisions and Pointing to the Evidence. arXiv, 2016.
- (2016) arXiv
- Park, D.H.¹ Hendricks, L.A.² Akata, Z.³ Schiele, B.⁴ Darrell, T.⁵ Rohrbach, M.⁶

28
- 85041099521
- A Deep Reinforced Model for Abstractive Summarization
- Romain Paulus, Caiming Xiong, and Richard Socher. A Deep Reinforced Model for Abstractive Summarization. arXiv, 2017.
- (2017) arXiv
- Paulus, R.¹ Xiong, C.² Socher, R.³

29
- 84961289992
- Glove: Global vectors for word representation
- Jeffrey Pennington, Richard Socher, and Christopher Manning. Glove: Global Vectors for Word Representation. EMNLP, 2014.
- (2014) EMNLP
- Pennington, J.¹ Socher, R.² Manning, C.³

30
- 85044274041
- End-to-end instance segmentation with recurrent attention
- Mengye Ren and Richard S. Zemel. End-to-End Instance Segmentation with Recurrent Attention. In CVPR, 2017.
- (2017) CVPR
- Ren, M.¹ Zemel, R.S.²

31
- 84965170394
- Exploring models and data for image question answering
- Mengye Ren, Ryan Kiros, and Richard Zemel. Exploring Models and Data for Image Question Answering. In NIPS, 2015a.
- (2015) NIPS
- Ren, M.¹ Kiros, R.² Zemel, R.³

32
- 84960980241
- Faster R-CNN: Towards real-time object detection with region proposal networks
- Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In NIPS, 2015b.
- (2015) NIPS
- Ren, S.¹ He, K.² Girshick, R.³ Sun, J.⁴

33
- 85041911392
- Self-critical Sequence Training for Image Captioning
- Steven J. Rennie, Etienne Marcheret, Youssef Mroueh, Jarret Ross, and Vaibhava Goel. Self-critical Sequence Training for Image Captioning. In CVPR, 2017.
- (2017) CVPR
- Rennie, S.J.¹ Marcheret, E.² Mroueh, Y.³ Ross, J.⁴ Goel, V.⁵

34
- 84951956461
- Learning to count with deep object features
- Santi Segui, Oriol Pujol, and Jordi Vitria. Learning to count with deep object features. In CVPRW, 2015.
- (2015) CVPRW
- Segui, S.¹ Pujol, O.² Vitria, J.³

35
- 85047741987
- Where to look: Focus regions for visual question answering
- Kevin J. Shih, Saurabh Singh, and Derek Hoiem. Where To Look: Focus Regions for Visual Question Answering. In CVPR, 2015.
- (2015) CVPR
- Shih, K.J.¹ Singh, S.² Hoiem, D.³

36
- 84906925854
- Grounded compositional semantics for finding and describing images with sentences
- Richard Socher, Andrej Karpathy, Quoc V Le, Christopher D Manning, and Andrew Y Ng. Grounded Compositional Semantics for Finding and Describing Images with Sentences. In TACL, 2014.
- (2014) TACL
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

37
- 85044342917
- Graph-Structured Representations for Visual Question Answering
- Damien Teney, Lingqiao Liu, and Anton van den Hengel. Graph-Structured Representations for Visual Question Answering. arXiv, 2016.
- (2016) arXiv
- Teney, D.¹ Liu, L.² Van den Hengel, A.³

38
- 85071147167
- Tips and tricks for visual question answering: Learnings from the 2017 challenge
- Damien Teney, Peter Anderson, Xiaodong He, and Anton van den Hengel. Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge. In CVPR, 2017.
- (2017) CVPR
- Teney, D.¹ Anderson, P.² He, X.³ Van den Hengel, A.⁴

39
- 85018873682
- Conditional image generation with PixelCNN decoders
- Aaron van den Oord, Nal Kalchbrenner, Oriol Vinyals, Lasse Espeholt, Alex Graves, and Koray Kavukcuoglu. Conditional Image Generation with PixelCNN Decoders. In NIPS, 2016.
- (2016) NIPS
- Van den Oord, A.¹ Kalchbrenner, N.² Vinyals, O.³ Espeholt, L.⁴ Graves, A.⁵ Kavukcuoglu, K.⁶

40
- 0000337576
- Simple statistical gradient-following methods for connectionist reinforcement learning
- R J Williams. Simple statistical gradient-following methods for connectionist reinforcement learning. Machine Learning, 8:229–256, 1992.
- (1992) Machine Learning , vol.8 , pp. 229-256
- Williams, R.J.¹

41
- 0041154467
- Function Optimization using Connectionist Reinforcement Learning Algorithms
- Ronald J. Williams and Jing Peng. Function Optimization using Connectionist Reinforcement Learning Algorithms. Connection Science, 3(3):241–268, 1991.
- (1991) Connection Science , vol.3 , Issue.3 , pp. 241-268
- Williams, R.J.¹ Peng, J.²

42
- 84999008900
- Dynamic memory networks for visual and textual question answering
- Caiming Xiong, Stephen Merity, and Richard Socher. Dynamic Memory Networks for Visual and Textual Question Answering. In ICML, 2016.
- (2016) ICML
- Xiong, C.¹ Merity, S.² Socher, R.³

43
- 85035008367
- Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
- Huijuan Xu and Kate Saenko. Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering. In ECCV, 2015.
- (2015) ECCV
- Xu, H.¹ Saenko, K.²

44
- 85067831524
- Stacked attention networks for image question answering
- Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. Stacked Attention Networks for Image Question Answering. In CVPR, 2015.
- (2015) CVPR
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.⁵

45
- 84959214343
- Cross-scene crowd counting via deep convolutional neural networks
- Cong Zhang, Hongsheng Li, Xiaogang Wang, and Xiaokang Yang. Cross-scene crowd counting via deep convolutional neural networks. In CVPR, 2015.
- (2015) CVPR
- Zhang, C.¹ Li, H.² Wang, X.³ Yang, X.⁴

46
- 85017461468
- Salient object subitizing
- Jianming Zhang, Shugao Ma, Mehrnoosh Sameki, Stan Sclaroff, Margrit Betke, Zhe Lin, Xiao-hui Shen, Brian Price, and Radomír Měch. Salient Object Subitizing. International Journal of Computer Vision, 2017.
- (2017) International Journal of Computer Vision
- Zhang, J.¹ Ma, S.² Sameki, M.³ Sclaroff, S.⁴ Betke, M.⁵ Lin, Z.⁶ Shen, X.-H.⁷ Price, B.⁸ Měch, R.⁹

47
- 84986301525
- Simple Baseline for Visual Question Answering
- Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, and Rob Fergus. Simple Baseline for Visual Question Answering. arXiv, 2015.
- (2015) arXiv
- Zhou, B.¹ Tian, Y.² Sukhbaatar, S.³ Szlam, A.⁴ Fergus, R.⁵

48
- 84990052104
- Visual7W: Grounded question answering in images
- Yuke Zhu, Oliver Groth, Michael Bernstein, and Li Fei-Fei. Visual7W: Grounded Question Answering in Images. In CVPR, 2015.
- (2015) CVPR
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.