SCOPUS Information Search Platform

Advances in Neural Information Processing Systems

Volume , Issue , 2016, Pages 289-297

Hierarchical question-image co-attention for visual question answering

(4) Lu, Jiasen (a); Yang, Jianwei (a); Batra, Dhruv (a, b); Parikh, Devi (a, b)

(a) VIRGINIA POLYTECHNIC INSTITUTE AND STATE UNIVERSITY (United States)

(b) GEORGIA INSTITUTE OF TECHNOLOGY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ATTENTION MECHANISMS; ATTENTION MODEL; CONVOLUTION NEURAL NETWORK; IMAGE REGIONS; QUESTION ANSWERING; SPATIAL MAPS; STATE OF THE ART; VISUAL ATTENTION;

BEHAVIORAL RESEARCH;

EID: 85018917850 PISSN: 10495258 EISSN: None Source Type: Conference Proceeding
DOI: None Document Type: Conference Paper

Times cited: 1752

References (27)

1
- 84985013144
- Deep compositional question answering with neural module networks
- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. Deep compositional question answering with neural module networks. In CVPR, 2016.
- (2016) CVPR
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

2
- 84973890960
- VQA: Visual question answering
- Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. VQA: Visual question answering. In ICCV, 2015.
- (2015) ICCV
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Zitnick, C.L.⁶ Parikh, D.⁷

3
- 85083953689
- Neural machine translation by jointly learning to align and translate
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015.
- (2015) ICLR
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

4
- 84888340666
- Torch7: A Matlab-like environment for machine learning
- R. Collobert, K. Kavukcuoglu, and C. Farabet. Torch7: A Matlab-like environment for machine learning. In BigLearn, NIPS Workshop, 2011.
- (2011) BigLearn, NIPS Workshop
- Collobert, R.¹ Kavukcuoglu, K.² Farabet, C.³

5
- 84990044140
- Human attention in visual question answering: Do humans and deep networks look at the same regions?
- Abhishek Das, Harsh Agrawal, C Lawrence Zitnick, Devi Parikh, and Dhruv Batra. Human attention in visual question answering: Do humans and deep networks look at the same regions? arXiv preprint arXiv:1606.03556, 2016.
- (2016) arXiv preprint arXiv:1606.03556
- Das, A.¹ Agrawal, H.² Zitnick, C.L.³ Parikh, D.⁴ Batra, D.⁵

6
- 84965148420
- Are you talking to a machine? Dataset and methods for multilingual image question answering
- Haoyuan Gao, Junhua Mao, Jie Zhou, Zhiheng Huang, Lei Wang, and Wei Xu. Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS, 2015.
- (2015) NIPS
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

7
- 84986274465
- Deep residual learning for image recognition
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In CVPR, 2016.
- (2016) CVPR
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

8
- 84965139942
- Teaching machines to read and comprehend
- Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. In NIPS, 2015.
- (2015) NIPS
- Hermann, K.M.¹ Kocisky, T.² Grefenstette, E.³ Espeholt, L.⁴ Kay, W.⁵ Suleyman, M.⁶ Blunsom, P.⁷

9
- 84937936034
- Convolutional neural network architectures for matching natural language sentences
- Baotian Hu, Zhengdong Lu, Hang Li, and Qingcai Chen. Convolutional neural network architectures for matching natural language sentences. In NIPS, 2014.
- (2014) NIPS
- Hu, B.¹ Lu, Z.² Li, H.³ Chen, Q.⁴

10
- 85018925213
- A focused dynamic attention model for visual question answering
- Ilija Ilievski, Shuicheng Yan, and Jiashi Feng. A focused dynamic attention model for visual question answering. arXiv:1604.01485, 2016.
- (2016) arXiv:1604.01485
- Ilievski, I.¹ Yan, S.² Feng, J.³

11
- 84978730111
- Visual genome: Connecting language and vision using crowdsourced dense image annotations
- Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016.
- (2016) arXiv preprint arXiv:1602.07332
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰

12
- 84937834115
- Microsoft COCO: Common objects in context
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
- (2014) ECCV
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

13
- 85007153677
- Learning to answer questions from image using convolutional neural network
- Lin Ma, Zhengdong Lu, and Hang Li. Learning to answer questions from image using convolutional neural network. In AAAI, 2016.
- (2016) AAAI
- Ma, L.¹ Lu, Z.² Li, H.³

14
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- Mateusz Malinowski, Marcus Rohrbach, and Mario Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
- (2015) ICCV
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

15
- 84965170394
- Exploring models and data for image question answering
- Mengye Ren, Ryan Kiros, and Richard Zemel. Exploring models and data for image question answering. In NIPS, 2015.
- (2015) NIPS
- Ren, M.¹ Kiros, R.² Zemel, R.³

16
- 85083950860
- Reasoning about entailment with neural attention
- Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, and Phil Blunsom. Reasoning about entailment with neural attention. In ICLR, 2016.
- (2016) ICLR
- Rocktäschel, T.¹ Grefenstette, E.² Hermann, K.M.³ Kočiský, T.⁴ Blunsom, P.⁵

17
- 85011921848
- Attentive pooling networks
- Cicero dos Santos, Ming Tan, Bing Xiang, and Bowen Zhou. Attentive pooling networks. arXiv preprint arXiv:1602.03609, 2016.
- (2016) arXiv preprint arXiv:1602.03609
- Dos Santos, C.¹ Tan, M.² Xiang, B.³ Zhou, B.⁴

18
- 84986327457
- Where to look: Focus regions for visual question answering
- Kevin J Shih, Saurabh Singh, and Derek Hoiem. Where to look: Focus regions for visual question answering. In CVPR, 2016.
- (2016) CVPR
- Shih, K.J.¹ Singh, S.² Hoiem, D.³

19
- 84933585162
- Very deep convolutional networks for large-scale image recognition
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
- (2014) CoRR
- Simonyan, K.¹ Zisserman, A.²

20
- 84937522268
- Going deeper with convolutions
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. Going deeper with convolutions. In CVPR, 2015.
- (2015) CVPR
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

21
- 84999008900
- Dynamic memory networks for visual and textual question answering
- Caiming Xiong, Stephen Merity, and Richard Socher. Dynamic memory networks for visual and textual question answering. In ICML, 2016.
- (2016) ICML
- Xiong, C.¹ Merity, S.² Socher, R.³

22
- 84990044633
- Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
- Huijuan Xu and Kate Saenko. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. arXiv preprint arXiv:1511.05234, 2015.
- (2015) arXiv preprint arXiv:1511.05234
- Xu, H.¹ Saenko, K.²

23
- 84986334021
- Stacked attention networks for image question answering
- Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. Stacked attention networks for image question answering. In CVPR, 2016.
- (2016) CVPR
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.⁵

24
- 85015342918
- ABCNN: Attention-based convolutional neural network for modeling sentence pairs
- Wenpeng Yin, Hinrich Schütze, Bing Xiang, and Bowen Zhou. ABCNN: Attention-based convolutional neural network for modeling sentence pairs. In ACL, 2016.
- (2016) ACL
- Yin, W.¹ Schütze, H.² Xiang, B.³ Zhou, B.⁴

25
- 84990069011
- Yin and yang: Balancing and answering binary visual questions
- Peng Zhang, Yash Goyal, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. Yin and yang: Balancing and answering binary visual questions. arXiv preprint arXiv:1511.05099, 2015.
- (2015) arXiv preprint arXiv:1511.05099
- Zhang, P.¹ Goyal, Y.² Summers-Stay, D.³ Batra, D.⁴ Parikh, D.⁵

26
- 84986275767
- Visual7W: Grounded question answering in images
- Yuke Zhu, Oliver Groth, Michael Bernstein, and Li Fei-Fei. Visual7W: Grounded question answering in images. In CVPR, 2016.
- (2016) CVPR
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

27
- 85018934522
- Measuring machine intelligence through visual question answering
- C Lawrence Zitnick, Aishwarya Agrawal, Stanislaw Antol, Margaret Mitchell, Dhruv Batra, and Devi Parikh. Measuring machine intelligence through visual question answering. AI Magazine, 37(1), 2016.
- (2016) AI Magazine , vol.37 , Issue.1
- Zitnick, C.L.¹ Agrawal, A.² Antol, S.³ Mitchell, M.⁴ Batra, D.⁵ Parikh, D.⁶

* This information was extracted by KISTI through analysis of Elsevier's SCOPUS database.