SCOPUS Information Retrieval Platform

EMNLP 2016 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Volume , Issue , 2016, Pages 457-468

Multimodal compact bilinear pooling for visual question answering and visual grounding

(6) Fukui, Akira a,b; Park, Dong Huk a; Yang, Daylen a; Rohrbach, Anna a,c; Darrell, Trevor a; Rohrbach, Marcus a

a UNIVERSITY OF CALIFORNIA (United States)

b SONY CORPORATION (Japan)

c MAX PLANCK INSTITUTE FOR INFORMATICS (Germany)

Author keywords

[No Author keywords available]

Indexed keywords

LARGE DATASET; MODELING LANGUAGES; NATURAL LANGUAGE PROCESSING SYSTEMS;

HIGH DIMENSIONALITY; MULTIMODAL FEATURES; QUESTION ANSWERING; SPATIAL FEATURES; STATE OF THE ART; TEXTUAL REPRESENTATION; VECTOR REPRESENTATIONS; VISUAL INFORMATION;

VISUAL LANGUAGES;

EID: 85044506279 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.18653/v1/d16-1044 Document Type: Conference Paper

Times cited : (1048)

References (55)

1
- 84993660571
- Learning to compose neural networks for question answering
- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016a. Learning to compose neural networks for question answering. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL).
- (2016) Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics (NAACL)
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

2
- 84986272553
- Neural module networks
- Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Dan Klein. 2016b. Neural module networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

3
- 84973890960
- VQA: Visual question answering
- Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. Vqa: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
- (2015) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Lawrence Zitnick, C.⁶ Parikh, D.⁷

4
- 84869158135
- Finding frequent items in data streams
- Springer
- Moses Charikar, Kevin Chen, and Martin Farach-Colton. 2002. Finding frequent items in data streams. In Automata, languages and programming, pages 693-703. Springer.
- (2002) Automata, Languages and Programming , pp. 693-703
- Charikar, M.¹ Chen, K.² Farach-Colton, M.³

5
- 85198028989
- ImageNet: A large-scale hierarchical image database
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2009) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

6
- 84904482223
- DeCAF: A deep convolutional activation feature for generic visual recognition
- Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2013. Decaf: A deep convolutional activation feature for generic visual recognition. In Proceedings of the International Conference on Machine Learning (ICML).
- (2013) Proceedings of the International Conference on Machine Learning (ICML)
- Donahue, J.¹ Jia, Y.² Vinyals, O.³ Hoffman, J.⁴ Zhang, N.⁵ Tzeng, E.⁶ Darrell, T.⁷

7
- 77649188328
- The segmented and annotated iapr tc-12 benchmark
- Hugo Jair Escalante, Carlos A Hernández, Jesus A Gonzalez, Aurelio López-López, Manuel Montes, Eduardo F Morales, L Enrique Sucar, Luis Villaseñor, and Michael Grubinger. 2010. The segmented and annotated iapr tc-12 benchmark. Computer Vision and Image Understanding, 114(4):419-428.
- (2010) Computer Vision and Image Understanding , vol.114 , Issue.4 , pp. 419-428
- Escalante, H.J.¹ Hernández, C.A.² Gonzalez, J.A.³ López-López, A.⁴ Montes, M.⁵ Morales, E.F.⁶ Enrique Sucar, L.⁷ Villaseñor, L.⁸ Grubinger, M.⁹

8
- 84898958665
- Devise: A deep visual-semantic embedding model
- Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Tomas Mikolov, et al. 2013. Devise: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems (NIPS).
- (2013) Advances in Neural Information Processing Systems (NIPS)
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Mikolov, T.⁶

9
- 84986266770
- Compact bilinear pooling
- Yang Gao, Oscar Beijbom, Ning Zhang, and Trevor Darrell. 2016. Compact bilinear pooling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Gao, Y.¹ Beijbom, O.² Zhang, N.³ Darrell, T.⁴

10
- 84986264311
- Fast R-CNN
- Ross Girshick. 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
- (2015) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Girshick, R.¹

11
- 84959243872
- Improving image-sentence embeddings using large weakly annotated photo collections
- Yunchao Gong, Liwei Wang, Micah Hodosh, Julia Hockenmaier, and Svetlana Lazebnik. 2014. Improving image-sentence embeddings using large weakly annotated photo collections. In Proceedings of the European Conference on Computer Vision (ECCV).
- (2014) Proceedings of the European Conference on Computer Vision (ECCV)
- Gong, Y.¹ Wang, L.² Hodosh, M.³ Hockenmaier, J.⁴ Lazebnik, S.⁵

12
- 38049183286
- The iapr tc-12 benchmark: A new evaluation resource for visual information systems
- Michael Grubinger, Paul Clough, Henning Müller, and Thomas Deselaers. 2006. The iapr tc-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, volume 5, page 10.
- (2006) International Workshop OntoImage , vol.5 , pp. 10
- Grubinger, M.¹ Clough, P.² Müller, H.³ Deselaers, T.⁴

13
- 10044285992
- Canonical correlation analysis: An overview with application to learning methods
- David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. 2004. Canonical correlation analysis: An overview with application to learning methods. Neural computation, 16(12):2639-2664.
- (2004) Neural Computation , vol.16 , Issue.12 , pp. 2639-2664
- Hardoon, D.R.¹ Szedmak, S.² Shawe-Taylor, J.³

14
- 84958589374
- Deep residual learning for image recognition
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

15
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In Transactions of the Association for Computational Linguistics (TACL).
- (2014) Transactions of the Association for Computational Linguistics (TACL)
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

16
- 85030448950
- Segmentation from natural language expressions
- Ronghang Hu, Marcus Rohrbach, and Trevor Darrell. 2016a. Segmentation from natural language expressions. In Proceedings of the European Conference on Computer Vision (ECCV).
- (2016) Proceedings of the European Conference on Computer Vision (ECCV)
- Hu, R.¹ Rohrbach, M.² Darrell, T.³

17
- 84986305787
- Natural language object retrieval
- Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell. 2016b. Natural language object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Hu, R.¹ Xu, H.² Rohrbach, M.³ Feng, J.⁴ Saenko, K.⁵ Darrell, T.⁶

18
- 85018925213
- Ilija Ilievski, Shuicheng Yan, and Jiashi Feng. 2016. A focused dynamic attention model for visual question answering. arXiv:1604.01485.
- (2016) A Focused Dynamic Attention Model for Visual Question Answering
- Ilievski, I.¹ Yan, S.² Feng, J.³

19
- 84969584486
- Batch normalization: Accelerating deep network training by reducing internal covariate shift
- Sergey Ioffe and Christian Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning (ICML).
- (2015) Proceedings of the International Conference on Machine Learning (ICML)
- Ioffe, S.¹ Szegedy, C.²

20
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Karpathy, A.¹ Fei-Fei, L.²

21
- 84943540775
- Referit game: Referring to objects in photographs of natural scenes
- Sahar Kazemzadeh, Vicente Ordonez, Mark Matten, and Tamara L. Berg. 2014. Referit game: Referring to objects in photographs of natural scenes. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
- (2014) Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Kazemzadeh, S.¹ Ordonez, V.² Matten, M.³ Berg, T.L.⁴

22
- 84941620184
- Adam: A method for stochastic optimization
- Diederik Kingma and Jimmy Ba. 2014. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2014) Proceedings of the International Conference on Learning Representations (ICLR)
- Kingma, D.¹ Ba, J.²

23
- 84919921461
- Multimodal neural language models
- Ryan Kiros, Ruslan Salakhutdinov, and Rich Zemel. 2014. Multimodal neural language models. In Proceedings of the International Conference on Machine Learning (ICML), pages 595-603.
- (2014) Proceedings of the International Conference on Machine Learning (ICML) , pp. 595-603
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.³

24
- 84965153327
- Skip-thought vectors
- Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems (NIPS).
- (2015) Advances in Neural Information Processing Systems (NIPS)
- Kiros, R.¹ Zhu, Y.² Salakhutdinov, R.³ Zemel, R.S.⁴ Torralba, A.⁵ Urtasun, R.⁶ Fidler, S.⁷

25
- 84965125568
- Fisher vectors derived from hybrid Gaussian-Laplacian mixture models for image annotation
- Benjamin Klein, Guy Lev, Gil Sadeh, and Lior Wolf. 2015. Fisher vectors derived from hybrid gaussian-laplacian mixture models for image annotation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Klein, B.¹ Lev, G.² Sadeh, G.³ Wolf, L.⁴

26
- 84978730111
- Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, Michael Bernstein, and Li Fei-Fei. 2016. Visual genome: Connecting language and vision using crowdsourced dense image annotations. arXiv:1602.07332.
- (2016) Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰ Bernstein, M.¹¹ Fei-Fei, L.¹²

27
- 84998698731
- Ask me Anything: Dynamic memory networks for natural language processing
- Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Ishaan Gulrajani, and Richard Socher. 2016. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of the International Conference on Machine Learning (ICML).
- (2016) Proceedings of the International Conference on Machine Learning (ICML)
- Kumar, A.¹ Irsoy, O.² Su, J.³ Bradbury, J.⁴ English, R.⁵ Pierce, B.⁶ Ondruska, P.⁷ Gulrajani, I.⁸ Socher, R.⁹

28
- 84937834115
- Microsoft coco: Common objects in context
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV).
- (2014) Proceedings of the European Conference on Computer Vision (ECCV)
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Lawrence Zitnick, C.⁸

29
- 84973863234
- Bilinear cnn models for fine-grained visual recognition
- Tsung-Yu Lin, Aruni RoyChowdhury, and Subhransu Maji. 2015. Bilinear cnn models for fine-grained visual recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
- (2015) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Lin, T.-Y.¹ RoyChowdhury, A.² Maji, S.³

30
- 84990020800
- Hierarchical co-attention for visual question answering
- Jiasen Lu, Jianwei Yang, Dhruv Batra, and Devi Parikh. 2016. Hierarchical Co-Attention for Visual Question Answering. In Advances in Neural Information Processing Systems (NIPS).
- (2016) Advances in Neural Information Processing Systems (NIPS)
- Lu, J.¹ Yang, J.² Batra, D.³ Parikh, D.⁴

31
- 85030255039
- Mateusz Malinowski, Marcus Rohrbach, and Mario Fritz. 2016. Ask Your Neurons: A Deep Learning Approach to Visual Question Answering. arXiv:1605.02697.
- (2016) Ask Your Neurons: A Deep Learning Approach to Visual Question Answering
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

32
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-rnn)
- Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep captioning with multimodal recurrent neural networks (m-rnn). In Proceedings of the International Conference on Learning Representations (ICLR).
- (2015) Proceedings of the International Conference on Learning Representations (ICLR)
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

33
- 80053437179
- Multimodal deep learning
- Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y Ng. 2011. Multimodal deep learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 689-696.
- (2011) Proceedings of the International Conference on Machine Learning (ICML) , pp. 689-696
- Ngiam, J.¹ Khosla, A.² Kim, M.³ Nam, J.⁴ Lee, H.⁵ Ng, A.Y.⁶

34
- 85139592140
- Image question answering using convolutional neural network with dynamic parameter prediction
- Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han. 2015. Image question answering using convolutional neural network with dynamic parameter prediction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Noh, H.¹ Seo, P.H.² Han, B.³

35
- 84961289992
- Glove: Global vectors for word representation
- Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. Glove: Global vectors for word representation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP).
- (2014) Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Pennington, J.¹ Socher, R.² Manning, C.D.³

36
- 85023199520
- Fast and scalable polynomial kernels via explicit feature maps
- New York, NY, USA. ACM
- Ninh Pham and Rasmus Pagh. 2013. Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'13, pages 239-247, New York, NY, USA. ACM.
- (2013) Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'13 , pp. 239-247
- Pham, N.¹ Pagh, R.²

37
- 84973856017
- Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
- Bryan Plummer, Liwei Wang, Chris Cervantes, Juan Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2015. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
- (2015) Proceedings of the IEEE International Conference on Computer Vision (ICCV)
- Plummer, B.¹ Wang, L.² Cervantes, C.³ Caicedo, J.⁴ Hockenmaier, J.⁵ Lazebnik, S.⁶

38
- 84990043973
- Bryan Plummer, Liwei Wang, Chris Cervantes, Juan Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2016. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. arXiv:1505.04870v3.
- (2016) Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
- Plummer, B.¹ Wang, L.² Cervantes, C.³ Caicedo, J.⁴ Hockenmaier, J.⁵ Lazebnik, S.⁶

39
- 84990024294
- Grounding of textual phrases in images by reconstruction
- Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele. 2016. Grounding of textual phrases in images by reconstruction. In Proceedings of the European Conference on Computer Vision (ECCV).
- (2016) Proceedings of the European Conference on Computer Vision (ECCV)
- Rohrbach, A.¹ Rohrbach, M.² Hu, R.³ Darrell, T.⁴ Schiele, B.⁵

40
- 84925410541
- Very deep convolutional networks for large-scale image recognition
- Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR).
- (2014) Proceedings of the International Conference on Learning Representations (ICLR)
- Simonyan, K.¹ Zisserman, A.²

41
- 84906925854
- Grounded compositional semantics for finding and describing images with sentences
- Richard Socher, Andrej Karpathy, Quoc V Le, Christopher D Manning, and Andrew Y Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics, 2:207-218.
- (2014) Transactions of the Association for Computational Linguistics , vol.2 , pp. 207-218
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

42
- 84928547704
- Sequence to sequence learning with neural networks
- Ilya Sutskever, Oriol Vinyals, and Quoc V. Le. 2014. Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NIPS).
- (2014) Advances in Neural Information Processing Systems (NIPS)
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

43
- 0034202338
- Separating style and content with bilinear models
- Joshua B Tenenbaum and William T Freeman. 2000. Separating style and content with bilinear models. Neural computation, 12(6):1247-1283.
- (2000) Neural Computation , vol.12 , Issue.6 , pp. 1247-1283
- Tenenbaum, J.B.¹ Freeman, W.T.²

44
- 84949572890
- CoRR, abs/1503.01817
- Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2015. The new data and new challenges in multimedia research. CoRR, abs/1503.01817.
- (2015) The New Data and New Challenges in Multimedia Research
- Thomee, B.¹ Shamma, D.A.² Friedland, G.³ Elizalde, B.⁴ Ni, K.⁵ Poland, D.⁶ Borth, D.⁷ Li, L.-J.⁸

45
- 84881160857
- Selective search for object recognition
- Jasper RR Uijlings, Koen EA van de Sande, Theo Gevers, and Arnold WM Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision (IJCV), 104(2).
- (2013) International Journal of Computer Vision (IJCV) , vol.104 , Issue.2
- Uijlings, J.R.R.¹ Van De Sande, K.E.A.² Gevers, T.³ Smeulders, A.W.M.⁴

46
- 84986271102
- Learning deep structure-preserving image-text embeddings
- Liwei Wang, Yin Li, and Svetlana Lazebnik. 2016. Learning deep structure-preserving image-text embeddings. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Wang, L.¹ Li, Y.² Lazebnik, S.³

47
- 84867117593
- Wsabie: Scaling up to large vocabulary image annotation
- Jason Weston, Samy Bengio, and Nicolas Usunier. 2011. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
- (2011) Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI)
- Weston, J.¹ Bengio, S.² Usunier, N.³

48
- 84986320870
- Ask me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources
- Qi Wu, Peng Wang, Chunhua Shen, Anton van den Hengel, and Anthony Dick. 2016. Ask Me Anything: Free-form Visual Question Answering Based on Knowledge from External Sources. In Proc. IEEE Conf. Computer Vision Pattern Recognition.
- (2016) Proc. IEEE Conf. Computer Vision Pattern Recognition.
- Wu, Q.¹ Wang, P.² Shen, C.³ Van Den Hengel, A.⁴ Dick, A.⁵

49
- 84999008900
- Dynamic memory networks for visual and textual question answering
- Caiming Xiong, Stephen Merity, and Richard Socher. 2016. Dynamic memory networks for visual and textual question answering. In Proceedings of the International Conference on Machine Learning (ICML).
- (2016) Proceedings of the International Conference on Machine Learning (ICML)
- Xiong, C.¹ Merity, S.² Socher, R.³

50
- 85035008367
- Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
- Huijuan Xu and Kate Saenko. 2016. Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. In Proceedings of the European Conference on Computer Vision (ECCV).
- (2016) Proceedings of the European Conference on Computer Vision (ECCV)
- Xu, H.¹ Saenko, K.²

51
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- Kelvin Xu, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. Proceedings of the International Conference on Machine Learning (ICML).
- (2015) Proceedings of the International Conference on Machine Learning (ICML)
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

52
- 84998809208
- Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. 2015. Stacked attention networks for image question answering. arXiv:1511.02274.
- (2015) Stacked Attention Networks for Image Question Answering
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.⁵

53
- 84986301525
- Bolei Zhou, Yuandong Tian, Sainbayar Sukhbaatar, Arthur Szlam, and Rob Fergus. 2015. Simple baseline for visual question answering. arXiv:1512.02167.
- (2015) Simple Baseline for Visual Question Answering
- Zhou, B.¹ Tian, Y.² Sukhbaatar, S.³ Szlam, A.⁴ Fergus, R.⁵

54
- 84986275767
- Visual7W: Grounded question answering in images
- Yuke Zhu, Oliver Groth, Michael Bernstein, and Li Fei-Fei. 2016. Visual7W: Grounded Question Answering in Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

55
- 84906489617
- Edge boxes: Locating object proposals from edges
- Springer
- C Lawrence Zitnick and Piotr Dollár. 2014. Edge boxes: Locating object proposals from edges. In Proceedings of the European Conference on Computer Vision (ECCV), pages 391-405. Springer.
- (2014) Proceedings of the European Conference on Computer Vision (ECCV) , pp. 391-405
- Lawrence Zitnick, C.¹ Dollár, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.