SCOPUS information retrieval platform

Computer Vision and Image Understanding

Volume 163, 2017, Pages 3-20

Visual question answering: Datasets, algorithms, and future challenges

(2) Kafle, Kushal (a); Kanan, Christopher (a)

(a) Rochester Institute of Technology (United States)

Author keywords

Image understanding; Natural language processing; Vision and language

Indexed keywords

COMPUTER VISION; DEEP LEARNING; IMAGE UNDERSTANDING; VISUAL LANGUAGES; EVALUATION METRICS; FUTURE CHALLENGES; LARGE AMOUNTS; POSSIBLE FUTURES; PROBLEM FORMULATION; QUESTION ANSWERING; NATURAL LANGUAGE PROCESSING SYSTEMS

EID: 85020874517; PISSN: 1077-3142; EISSN: 1090-235X; Source type: Journal
DOI: 10.1016/j.cviu.2017.06.005; Document type: Article

Times cited: 232

References (75)

1
- 85072842417
- Analyzing the behavior of visual question answering models
- Agrawal, A., Batra, D., Parikh, D., Analyzing the behavior of visual question answering models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
- (2016) Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Agrawal, A.¹ Batra, D.² Parikh, D.³

2
- 84985013144
- Deep compositional question answering with neural module networks
- Andreas, J., Rohrbach, M., Darrell, T., Klein, D., Deep compositional question answering with neural module networks. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

3
- 84993660571
- Learning to compose neural networks for question answering
- Andreas, J., Rohrbach, M., Darrell, T., Klein, D., Learning to compose neural networks for question answering. Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016.
- (2016) Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)
- Andreas, J.¹ Rohrbach, M.² Darrell, T.³ Klein, D.⁴

4
- 84973890960
- VQA: Visual question answering
- Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., Parikh, D., VQA: Visual question answering. The IEEE International Conference on Computer Vision (ICCV), 2015.
- (2015) The IEEE International Conference on Computer Vision (ICCV)
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Zitnick, C.L.⁶ Parikh, D.⁷

5
- 84959209132
- Zero-shot learning via visual abstraction
- Antol, S., Zitnick, C.L., Parikh, D., Zero-shot learning via visual abstraction. European Conference on Computer Vision (ECCV), 2014.
- (2014) European Conference on Computer Vision (ECCV)
- Antol, S.¹ Zitnick, C.L.² Parikh, D.³

6
- 85083951423
- Multiple object recognition with visual attention
- Ba, J., Mnih, V., Kavukcuoglu, K., Multiple object recognition with visual attention. International Conference on Learning Representations (ICLR), 2015.
- (2015) International Conference on Learning Representations (ICLR)
- Ba, J.¹ Mnih, V.² Kavukcuoglu, K.³

7
- 85083953689
- Neural machine translation by jointly learning to align and translate
- Bahdanau, D., Cho, K., Bengio, Y., Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (ICLR), 2015.
- (2015) International Conference on Learning Representations (ICLR)
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

8
- 85116156579
- METEOR: An automatic metric for MT evaluation with improved correlation with human judgments
- Banerjee, S., Lavie, A., METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, Vol. 29, 2005, 65–72.
- (2005) Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization , vol.29 , pp. 65-72
- Banerjee, S.¹ Lavie, A.²

9
- 84960130911
- Automatic description generation from images: A survey of models, datasets, and evaluation measures
- Bernardi, R., Cakici, R., Elliott, D., Erdem, A., Erdem, E., Ikizler-Cinbis, N., Keller, F., Muscat, A., Plank, B., Automatic description generation from images: A survey of models, datasets, and evaluation measures. Journal of Artificial Intelligence Research 55 (2016), 409–442.
- (2016) Journal of Artificial Intelligence Research , vol.55 , pp. 409-442
- Bernardi, R.¹ Cakici, R.² Elliott, D.³ Erdem, A.⁴ Erdem, E.⁵ Ikizler-Cinbis, N.⁶ Keller, F.⁷ Muscat, A.⁸ Plank, B.⁹

10
- 48249105214
- Re-evaluating the role of BLEU in machine translation research
- Callison-Burch, C., Osborne, M., Koehn, P., Re-evaluating the role of BLEU in machine translation research. 2006.
- (2006)
- Callison-Burch, C.¹ Osborne, M.² Koehn, P.³

11
- 84961291190
- Learning phrase representations using RNN encoder-decoder for statistical machine translation
- Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., Learning phrase representations using RNN encoder-decoder for statistical machine translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
- (2014) Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Cho, K.¹ Van Merriënboer, B.² Gulcehre, C.³ Bahdanau, D.⁴ Bougares, F.⁵ Schwenk, H.⁶ Bengio, Y.⁷

12
- 85072846928
- Human attention in visual question answering: Do humans and deep networks look at the same regions?
- Das, A., Agrawal, H., Zitnick, C.L., Parikh, D., Batra, D., Human attention in visual question answering: Do humans and deep networks look at the same regions? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
- (2016) Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Das, A.¹ Agrawal, H.² Zitnick, C.L.³ Parikh, D.⁴ Batra, D.⁵

13
- 84965102873
- Exploring nearest neighbor approaches for image captioning
- Devlin, J., Gupta, S., Girshick, R., Mitchell, M., Zitnick, C.L., Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
- (2015) arXiv preprint arXiv:1505.04467
- Devlin, J.¹ Gupta, S.² Girshick, R.³ Mitchell, M.⁴ Zitnick, C.L.⁵

14
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T., Long-term recurrent convolutional networks for visual recognition and description. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

15
- 84959250180
- From captions to visual concepts and back
- Fang, H., Gupta, S., Iandola, F., Srivastava, R., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., Platt, J., et al. From captions to visual concepts and back. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Fang, H.¹ Gupta, S.² Iandola, F.³ Srivastava, R.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.¹⁰

16
- 85044506279
- Multimodal compact bilinear pooling for visual question answering and visual grounding
- Fukui, A., Park, D.H., Yang, D., Rohrbach, A., Darrell, T., Rohrbach, M., Multimodal compact bilinear pooling for visual question answering and visual grounding. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
- (2016) Conference on Empirical Methods in Natural Language Processing (EMNLP)
- Fukui, A.¹ Park, D.H.² Yang, D.³ Rohrbach, A.⁴ Darrell, T.⁵ Rohrbach, M.⁶

17
- 84965148420
- Are you talking to a machine? Dataset and methods for multilingual image question answering
- Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., Xu, W., Are you talking to a machine? Dataset and methods for multilingual image question answering. Advances in Neural Information Processing Systems (NIPS), 2015.
- (2015) Advances in Neural Information Processing Systems (NIPS)
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

18
- 84925422907
- Visual Turing test for computer vision systems
- Geman, D., Geman, S., Hallonquist, N., Younes, L., Visual Turing test for computer vision systems. Proceedings of the National Academy of Sciences, 112(12), 2015.
- (2015) Proceedings of the National Academy of Sciences , vol.112 , Issue.12
- Geman, D.¹ Geman, S.² Hallonquist, N.³ Younes, L.⁴

19
- 84986274465
- Deep residual learning for image recognition
- He, K., Zhang, X., Ren, S., Sun, J., Deep residual learning for image recognition. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- He, K.¹ Zhang, X.² Ren, S.³ Sun, J.⁴

20
- 0031573117
- Long short-term memory
- Hochreiter, S., Schmidhuber, J., Long short-term memory. Neural Computation 9:8 (1997), 1735–1780.
- (1997) Neural Computation, vol.9, Issue.8, pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

21
- 85018925213
- A focused dynamic attention model for visual question answering
- Ilievski, I., Yan, S., Feng, J., A focused dynamic attention model for visual question answering. arXiv preprint arXiv:1604.01485, 2016.
- (2016) arXiv preprint arXiv:1604.01485
- Ilievski, I.¹ Yan, S.² Feng, J.³

22
- 85041926703
- Revisiting visual question answering baselines
- Jabri, A., Joulin, A., van der Maaten, L., Revisiting visual question answering baselines. European Conference on Computer Vision (ECCV), 2016.
- (2016) European Conference on Computer Vision (ECCV)
- Jabri, A.¹ Joulin, A.² van der Maaten, L.³

23
- 84986245786
- DenseCap: Fully convolutional localization networks for dense captioning
- Johnson, J., Karpathy, A., Fei-Fei, L., DenseCap: Fully convolutional localization networks for dense captioning. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Johnson, J.¹ Karpathy, A.² Fei-Fei, L.³

24
- 84986300506
- Answer-type prediction for visual question answering
- Kafle, K., Kanan, C., Answer-type prediction for visual question answering. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Kafle, K.¹ Kanan, C.²

25
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- Karpathy, A., Fei-Fei, L., Deep visual-semantic alignments for generating image descriptions. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Karpathy, A.¹ Fei-Fei, L.²

26
- 84911364368
- Large-scale video classification with convolutional neural networks
- Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L., Large-scale video classification with convolutional neural networks. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
- (2014) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Karpathy, A.¹ Toderici, G.² Shetty, S.³ Leung, T.⁴ Sukthankar, R.⁵ Fei-Fei, L.⁶

27
- 85018868398
- Multimodal residual learning for visual QA
- Kim, J.-H., Lee, S.-W., Kwak, D.-H., Heo, M.-O., Kim, J., Ha, J.-W., Zhang, B.-T., Multimodal residual learning for visual QA. Advances in Neural Information Processing Systems (NIPS), 2016.
- (2016) Advances in Neural Information Processing Systems (NIPS)
- Kim, J.-H.¹ Lee, S.-W.² Kwak, D.-H.³ Heo, M.-O.⁴ Kim, J.⁵ Ha, J.-W.⁶ Zhang, B.-T.⁷

28
- 85034816080
- Hadamard product for low-rank bilinear pooling
- Kim, J.-H., On, K.-W., Kim, J., Ha, J.-W., Zhang, B.-T., Hadamard product for low-rank bilinear pooling. arXiv preprint arXiv:1610.04325, 2016.
- (2016) arXiv preprint arXiv:1610.04325
- Kim, J.-H.¹ On, K.-W.² Kim, J.³ Ha, J.-W.⁴ Zhang, B.-T.⁵

29
- 84965153327
- Skip-thought vectors
- Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R.S., Torralba, A., Urtasun, R., Fidler, S., Skip-thought vectors. Advances in Neural Information Processing Systems (NIPS), 2015.
- (2015) Advances in Neural Information Processing Systems (NIPS)
- Kiros, R.¹ Zhu, Y.² Salakhutdinov, R.³ Zemel, R.S.⁴ Torralba, A.⁵ Urtasun, R.⁶ Fidler, S.⁷

30
- 85011596790
- Visual Genome: Connecting language and vision using crowdsourced dense image annotations
- Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D.A., et al. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision 123:1 (2017), 32–73.
- (2017) International Journal of Computer Vision , vol.123 , Issue.1 , pp. 32-73
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰

31
- 84998698731
- Ask me anything: Dynamic memory networks for natural language processing
- Kumar, A., Irsoy, O., Su, J., Bradbury, J., English, R., Pierce, B., Ondruska, P., Gulrajani, I., Socher, R., Ask me anything: Dynamic memory networks for natural language processing. International Conference on Machine Learning (ICML), 2016.
- (2016) International Conference on Machine Learning (ICML)
- Kumar, A.¹ Irsoy, O.² Su, J.³ Bradbury, J.⁴ English, R.⁵ Pierce, B.⁶ Ondruska, P.⁷ Gulrajani, I.⁸ Socher, R.⁹

32
- 84922756960
- DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia
- Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., et al. DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6:2 (2015), 167–195.
- (2015) Semantic Web , vol.6 , Issue.2 , pp. 167-195
- Lehmann, J.¹ Isele, R.² Jakob, M.³ Jentzsch, A.⁴ Kontokostas, D.⁵ Mendes, P.N.⁶ Hellmann, S.⁷ Morsey, M.⁸ van Kleef, P.⁹ Auer, S.¹⁰

33
- 26944501715
- ROUGE: A package for automatic evaluation of summaries
- Lin, C.-Y., ROUGE: A package for automatic evaluation of summaries. Text summarization branches out: Proceedings of the ACL-04 workshop, Vol. 8, 2004, Barcelona, Spain.
- (2004) Text summarization branches out: Proceedings of the ACL-04 workshop , vol.8
- Lin, C.-Y.¹

34
- 84937834115
- Microsoft COCO: Common objects in context
- Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., Microsoft COCO: Common objects in context. European Conference on Computer Vision (ECCV), 2014.
- (2014) European Conference on Computer Vision (ECCV)
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

35
- 84973863234
- Bilinear CNN models for fine-grained visual recognition
- Lin, T.-Y., RoyChowdhury, A., Maji, S., Bilinear CNN models for fine-grained visual recognition. The IEEE International Conference on Computer Vision (ICCV), 2015.
- (2015) The IEEE International Conference on Computer Vision (ICCV)
- Lin, T.-Y.¹ RoyChowdhury, A.² Maji, S.³

36
- 84959205572
- Fully convolutional networks for semantic segmentation
- Long, J., Shelhamer, E., Darrell, T., Fully convolutional networks for semantic segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Long, J.¹ Shelhamer, E.² Darrell, T.³

37
- 85018917850
- Hierarchical question-image co-attention for visual question answering
- Lu, J., Yang, J., Batra, D., Parikh, D., Hierarchical question-image co-attention for visual question answering. Advances in Neural Information Processing Systems (NIPS), 2016.
- (2016) Advances in Neural Information Processing Systems (NIPS)
- Lu, J.¹ Yang, J.² Batra, D.³ Parikh, D.⁴

38
- 85034970930
- Effective approaches to attention-based neural machine translation
- Luong, M.-T., Pham, H., Manning, C.D., Effective approaches to attention-based neural machine translation. 2015.
- (2015)
- Luong, M.-T.¹ Pham, H.² Manning, C.D.³

39
- 84937822746
- A multi-world approach to question answering about real-world scenes based on uncertain input
- Malinowski, M., Fritz, M., A multi-world approach to question answering about real-world scenes based on uncertain input. Advances in Neural Information Processing Systems (NIPS), 2014.
- (2014) Advances in Neural Information Processing Systems (NIPS)
- Malinowski, M.¹ Fritz, M.²

40
- 84951975735
- Towards a visual Turing challenge
- Malinowski, M., Fritz, M., Towards a visual Turing challenge. arXiv preprint arXiv:1410.8027, 2014.
- (2014) arXiv preprint arXiv:1410.8027
- Malinowski, M.¹ Fritz, M.²

41
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- Malinowski, M., Rohrbach, M., Fritz, M., Ask your neurons: A neural-based approach to answering questions about images. The IEEE International Conference on Computer Vision (ICCV), 2015.
- (2015) The IEEE International Conference on Computer Vision (ICCV)
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

42
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-RNN)
- Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A., Deep captioning with multimodal recurrent neural networks (m-RNN). International Conference on Learning Representations (ICLR), 2015.
- (2015) International Conference on Learning Representations (ICLR)
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

43
- 84898956512
- Distributed representations of words and phrases and their compositionality
- Mikolov, T., Dean, J., Distributed representations of words and phrases and their compositionality. Advances in Neural Information Processing Systems (NIPS), 2013.
- (2013) Advances in Neural Information Processing Systems (NIPS)
- Mikolov, T.¹ Dean, J.²

44
- 85021185316
- Dual attention networks for multimodal reasoning and matching
- Nam, H., Ha, J.-W., Kim, J., Dual attention networks for multimodal reasoning and matching. arXiv preprint arXiv:1611.00471, 2016.
- (2016) arXiv preprint arXiv:1611.00471
- Nam, H.¹ Ha, J.-W.² Kim, J.³

45
- 85030462424
- Training recurrent answering units with joint loss minimization for VQA
- Noh, H., Han, B., Training recurrent answering units with joint loss minimization for VQA. arXiv preprint arXiv:1606.03647, 2016.
- (2016) arXiv preprint arXiv:1606.03647
- Noh, H.¹ Han, B.²

46
- 84973879016
- Learning deconvolution network for semantic segmentation
- Noh, H., Hong, S., Han, B., Learning deconvolution network for semantic segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Noh, H.¹ Hong, S.² Han, B.³

47
- 84986261711
- Image question answering using convolutional neural network with dynamic parameter prediction
- Noh, H., Seo, P.H., Han, B., Image question answering using convolutional neural network with dynamic parameter prediction. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Noh, H.¹ Seo, P.H.² Han, B.³

48
- 85133336275
- BLEU: a method for automatic evaluation of machine translation
- Papineni, K., Roukos, S., Ward, T., Zhu, W.-J., BLEU: a method for automatic evaluation of machine translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2002.
- (2002) Annual Meeting of the Association for Computational Linguistics (ACL)
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

49
- 84986308404
- You only look once: Unified, real-time object detection
- Redmon, J., Divvala, S., Girshick, R., Farhadi, A., You only look once: Unified, real-time object detection. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Redmon, J.¹ Divvala, S.² Girshick, R.³ Farhadi, A.⁴

50
- 84965170394
- Exploring models and data for image question answering
- Ren, M., Kiros, R., Zemel, R., Exploring models and data for image question answering. Advances in Neural Information Processing Systems (NIPS), 2015.
- (2015) Advances in Neural Information Processing Systems (NIPS)
- Ren, M.¹ Kiros, R.² Zemel, R.³

51
- 84960980241
- Faster R-CNN: Towards real-time object detection with region proposal networks
- Ren, S., He, K., Girshick, R., Sun, J., Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NIPS), 2015.
- (2015) Advances in Neural Information Processing Systems (NIPS)
- Ren, S.¹ He, K.² Girshick, R.³ Sun, J.⁴

52
- 85031713628
- DualNet: Domain-invariant network for visual question answering
- Saito, K., Shin, A., Ushiku, Y., Harada, T., DualNet: Domain-invariant network for visual question answering. arXiv preprint arXiv:1606.06108, 2016.
- (2016) arXiv preprint arXiv:1606.06108
- Saito, K.¹ Shin, A.² Ushiku, Y.³ Harada, T.⁴

53
- 84986327457
- Where to look: Focus regions for visual question answering
- Shih, K.J., Singh, S., Hoiem, D., Where to look: Focus regions for visual question answering. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Shih, K.J.¹ Singh, S.² Hoiem, D.³

54
- 84881536861
- Indoor segmentation and support inference from RGBD images
- Silberman, N., Hoiem, D., Kohli, P., Fergus, R., Indoor segmentation and support inference from RGBD images. European Conference on Computer Vision (ECCV), 2012.
- (2012) European Conference on Computer Vision (ECCV)
- Silberman, N.¹ Hoiem, D.² Kohli, P.³ Fergus, R.⁴

55
- 84959205514
- Instance segmentation of indoor scenes using a coverage loss
- Silberman, N., Sontag, D., Fergus, R., Instance segmentation of indoor scenes using a coverage loss. European Conference on Computer Vision (ECCV), 2014.
- (2014) European Conference on Computer Vision (ECCV)
- Silberman, N.¹ Sontag, D.² Fergus, R.³

56
- 84937862424
- Two-stream convolutional networks for action recognition in videos
- Simonyan, K., Zisserman, A., Two-stream convolutional networks for action recognition in videos. Advances in Neural Information Processing Systems (NIPS), 2014.
- (2014) Advances in Neural Information Processing Systems (NIPS)
- Simonyan, K.¹ Zisserman, A.²

57
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- Simonyan, K., Zisserman, A., Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICLR), 2015.
- (2015) International Conference on Learning Representations (ICLR)
- Simonyan, K.¹ Zisserman, A.²

58
- 84937522268
- Going deeper with convolutions
- Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., Going deeper with convolutions. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

59
- 84898989329
- Deep neural networks for object detection
- Szegedy, C., Toshev, A., Erhan, D., Deep neural networks for object detection. Advances in Neural Information Processing Systems (NIPS), 2013.
- (2013) Advances in Neural Information Processing Systems (NIPS)
- Szegedy, C.¹ Toshev, A.² Erhan, D.³

60
- 84957922397
- YFCC100M: The new data in multimedia research
- Thomee, B., Shamma, D.A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., Li, L.-J., YFCC100M: The new data in multimedia research. Communications of the ACM 59:2 (2016), 64–73.
- (2016) Communications of the ACM , vol.59 , Issue.2 , pp. 64-73
- Thomee, B.¹ Shamma, D.A.² Friedland, G.³ Elizalde, B.⁴ Ni, K.⁵ Poland, D.⁶ Borth, D.⁷ Li, L.-J.⁸

61
- 80052908300
- Unbiased look at dataset bias
- Torralba, A., Efros, A., Unbiased look at dataset bias. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011.
- (2011) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Torralba, A.¹ Efros, A.²

62
- 84956980995
- CIDEr: Consensus-based image description evaluation
- Vedantam, R., Lawrence Zitnick, C., Parikh, D., CIDEr: Consensus-based image description evaluation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Vedantam, R.¹ Lawrence Zitnick, C.² Parikh, D.³

63
- 84946747440
- Show and tell: A neural image caption generator
- Vinyals, O., Toshev, A., Bengio, S., Erhan, D., Show and tell: A neural image caption generator. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

64
- 84986320870
- Ask me anything: Free-form visual question answering based on knowledge from external sources
- Wu, Q., Wang, P., Shen, C., van den Hengel, A., Dick, A.R., Ask me anything: Free-form visual question answering based on knowledge from external sources. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Wu, Q.¹ Wang, P.² Shen, C.³ van den Hengel, A.⁴ Dick, A.R.⁵

65
- 85146676791
- Verb semantics and lexical selection
- Wu, Z., Palmer, M., Verb semantics and lexical selection. Annual Meeting of the Association for Computational Linguistics (ACL), 1994.
- (1994) Annual Meeting of the Association for Computational Linguistics (ACL)
- Wu, Z.¹ Palmer, M.²

66
- 84999008900
- Dynamic memory networks for visual and textual question answering
- Xiong, C., Merity, S., Socher, R., Dynamic memory networks for visual and textual question answering. International Conference on Machine Learning (ICML), 2016.
- (2016) International Conference on Machine Learning (ICML)
- Xiong, C.¹ Merity, S.² Socher, R.³

67
- 85035076512
- Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
- Xu, H., Saenko, K., Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. European Conference on Computer Vision (ECCV), 2016.
- (2016) European Conference on Computer Vision (ECCV)
- Xu, H.¹ Saenko, K.²

68
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y., Show, attend and tell: Neural image caption generation with visual attention. International Conference on Machine Learning (ICML), 2015.
- (2015) International Conference on Machine Learning (ICML)
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

69
- 84986334021
- Stacked attention networks for image question answering
- Yang, Z., He, X., Gao, J., Deng, L., Smola, A.J., Stacked attention networks for image question answering. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Yang, Z.¹ He, X.² Gao, J.³ Deng, L.⁴ Smola, A.J.⁵

70
- 84986278354
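- Yin and yang: Balancing and answering binary visual questions
- Zhang, P., Goyal, Y., Summers-Stay, D., Batra, D., Parikh, D., Yin and yang: Balancing and answering binary visual questions. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)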
- Zhang, P.¹ Goyal, Y.² Summers-Stay, D.³ Batra, D.⁴ Parikh, D.⁵

71
- 85060481824
- Instance-level segmentation with deep densely connected MRFs
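- Zhang, Z., Fidler, S., Urtasun, R., Instance-level segmentation with deep densely connected MRFs. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)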
- Zhang, Z.¹ Fidler, S.² Urtasun, R.³

72
- 84973891613
- Monocular object instance segmentation and depth ordering with CNNs
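- Zhang, Z., Schwing, A.G., Fidler, S., Urtasun, R., Monocular object instance segmentation and depth ordering with CNNs. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015, 2614–2622.
- (2015) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2614-2622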
- Zhang, Z.¹ Schwing, A.G.² Fidler, S.³ Urtasun, R.⁴

73
- 84986301525
- Simple baseline for visual question answering
- Zhou, B., Tian, Y., Sukhbaatar, S., Szlam, A., Fergus, R., Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015.
- (2015) arXiv preprint arXiv:1512.02167
- Zhou, B.¹ Tian, Y.² Sukhbaatar, S.³ Szlam, A.⁴ Fergus, R.⁵

74
- 84986275767
- Visual7W: Grounded question answering in images
- Zhu, Y., Groth, O., Bernstein, M., Fei-Fei, L., Visual7W: Grounded question answering in images. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
- (2016) The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
- Zhu, Y.¹ Groth, O.² Bernstein, M.³ Fei-Fei, L.⁴

75
- 84906489617
- Edge boxes: Locating object proposals from edges
- Zitnick, C.L., Dollár, P., Edge boxes: Locating object proposals from edges. European Conference on Computer Vision (ECCV), Springer, 2014, 391–405.
- (2014) European Conference on Computer Vision (ECCV), pp. 391-405
- Zitnick, C.L.¹ Dollár, P.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.