-
1
-
-
85072842417
-
Analyzing the behavior of visual question answering models
-
Agrawal, A., Batra, D., Parikh, D., Analyzing the behavior of visual question answering models. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
-
(2016)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
-
-
Agrawal, A.1
Batra, D.2
Parikh, D.3
-
2
-
-
84985013144
-
Deep compositional question answering with neural module networks
-
Andreas, J., Rohrbach, M., Darrell, T., Klein, D., Deep compositional question answering with neural module networks. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Andreas, J.1
Rohrbach, M.2
Darrell, T.3
Klein, D.4
-
3
-
-
84993660571
-
Learning to compose neural networks for question answering
-
Andreas, J., Rohrbach, M., Darrell, T., Klein, D., Learning to compose neural networks for question answering. Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), 2016.
-
(2016)
Annual Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL)
-
-
Andreas, J.1
Rohrbach, M.2
Darrell, T.3
Klein, D.4
-
4
-
-
84973890960
-
VQA: Visual question answering
-
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Zitnick, C.L., Parikh, D., VQA: Visual question answering. The IEEE International Conference on Computer Vision (ICCV), 2015.
-
(2015)
The IEEE International Conference on Computer Vision (ICCV)
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
5
-
-
84959209132
-
Zero-shot learning via visual abstraction
-
Antol, S., Zitnick, C.L., Parikh, D., Zero-shot learning via visual abstraction. European Conference on Computer Vision (ECCV), 2014.
-
(2014)
European Conference on Computer Vision (ECCV)
-
-
Antol, S.1
Zitnick, C.L.2
Parikh, D.3
-
6
-
-
85083951423
-
Multiple object recognition with visual attention
-
Ba, J., Mnih, V., Kavukcuoglu, K., Multiple object recognition with visual attention. International Conference on Learning Representations (ICLR), 2015.
-
(2015)
International Conference on Learning Representations (ICLR)
-
-
Ba, J.1
Mnih, V.2
Kavukcuoglu, K.3
-
7
-
-
85083953689
-
Neural machine translation by jointly learning to align and translate
-
Bahdanau, D., Cho, K., Bengio, Y., Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations (ICLR), 2015.
-
(2015)
International Conference on Learning Representations (ICLR)
-
-
Bahdanau, D.1
Cho, K.2
Bengio, Y.3
-
8
-
-
85116156579
-
METEOR: An automatic metric for MT evaluation with improved correlation with human judgments
-
Banerjee, S., Lavie, A., METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, Vol. 29, 2005, 65–72.
-
(2005)
Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization
, vol.29
, pp. 65-72
-
-
Banerjee, S.1
Lavie, A.2
-
9
-
-
84960130911
-
Automatic description generation from images: A survey of models, datasets, and evaluation measures
-
Bernardi, R., Cakici, R., Elliott, D., Erdem, A., Erdem, E., Ikizler-Cinbis, N., Keller, F., Muscat, A., Plank, B., Automatic description generation from images: A survey of models, datasets, and evaluation measures. Journal of Artificial Intelligence Research 55 (2016), 409–442.
-
(2016)
Journal of Artificial Intelligence Research
, vol.55
, pp. 409-442
-
-
Bernardi, R.1
Cakici, R.2
Elliott, D.3
Erdem, A.4
Erdem, E.5
Ikizler-Cinbis, N.6
Keller, F.7
Muscat, A.8
Plank, B.9
-
10
-
-
48249105214
-
Re-evaluating the role of BLEU in machine translation research.
-
Callison-Burch, C., Osborne, M., Koehn, P., Re-evaluating the role of BLEU in machine translation research. 2006.
-
(2006)
-
-
Callison-Burch, C.1
Osborne, M.2
Koehn, P.3
-
11
-
-
84961291190
-
Learning phrase representations using RNN encoder-decoder for statistical machine translation
-
Cho, K., Van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y., Learning phrase representations using RNN encoder-decoder for statistical machine translation. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2014.
-
(2014)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
-
-
Cho, K.1
Van Merriënboer, B.2
Gulcehre, C.3
Bahdanau, D.4
Bougares, F.5
Schwenk, H.6
Bengio, Y.7
-
12
-
-
85072846928
-
Human attention in visual question answering: Do humans and deep networks look at the same regions?
-
Das, A., Agrawal, H., Zitnick, C.L., Parikh, D., Batra, D., Human attention in visual question answering: Do humans and deep networks look at the same regions? Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
-
(2016)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
-
-
Das, A.1
Agrawal, H.2
Zitnick, C.L.3
Parikh, D.4
Batra, D.5
-
13
-
-
84965102873
-
-
arXiv preprint arXiv:1505.04467
-
Devlin, J., Gupta, S., Girshick, R., Mitchell, M., Zitnick, C.L., Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
-
(2015)
Exploring nearest neighbor approaches for image captioning.
-
-
Devlin, J.1
Gupta, S.2
Girshick, R.3
Mitchell, M.4
Zitnick, C.L.5
-
14
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T., Long-term recurrent convolutional networks for visual recognition and description. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-
(2015)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
15
-
-
84959250180
-
From captions to visual concepts and back
-
Fang, H., Gupta, S., Iandola, F., Srivastava, R., Deng, L., Dollár, P., Gao, J., He, X., Mitchell, M., Platt, J., et al. From captions to visual concepts and back. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-
(2015)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.10
-
16
-
-
85044506279
-
Multimodal compact bilinear pooling for visual question answering and visual grounding
-
Fukui, A., Park, D.H., Yang, D., Rohrbach, A., Darrell, T., Rohrbach, M., Multimodal compact bilinear pooling for visual question answering and visual grounding. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2016.
-
(2016)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
-
-
Fukui, A.1
Park, D.H.2
Yang, D.3
Rohrbach, A.4
Darrell, T.5
Rohrbach, M.6
-
17
-
-
84965148420
-
Are you talking to a machine? Dataset and methods for multilingual image question answering
-
Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., Xu, W., Are you talking to a machine? Dataset and methods for multilingual image question answering. Advances in Neural Information Processing Systems (NIPS), 2015.
-
(2015)
Advances in Neural Information Processing Systems (NIPS)
-
-
Gao, H.1
Mao, J.2
Zhou, J.3
Huang, Z.4
Wang, L.5
Xu, W.6
-
18
-
-
84925422907
-
Visual Turing test for computer vision systems
-
Geman, D., Geman, S., Hallonquist, N., Younes, L., Visual Turing test for computer vision systems. Proceedings of the National Academy of Sciences, 112(12), 2015.
-
(2015)
Proceedings of the National Academy of Sciences
, vol.112
, Issue.12
-
-
Geman, D.1
Geman, S.2
Hallonquist, N.3
Younes, L.4
-
19
-
-
84986274465
-
Deep residual learning for image recognition
-
He, K., Zhang, X., Ren, S., Sun, J., Deep residual learning for image recognition. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
He, K.1
Zhang, X.2
Ren, S.3
Sun, J.4
-
20
-
-
0031573117
-
Long short-term memory
-
Hochreiter, S., Schmidhuber, J., Long short-term memory. Neural computation 9:8 (1997), 1735–1780.
-
(1997)
Neural computation
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
21
-
-
85018925213
-
-
arXiv preprint arXiv:1604.01485
-
Ilievski, I., Yan, S., Feng, J., A focused dynamic attention model for visual question answering. arXiv preprint arXiv:1604.01485, 2016.
-
(2016)
A focused dynamic attention model for visual question answering.
-
-
Ilievski, I.1
Yan, S.2
Feng, J.3
-
22
-
-
85041926703
-
Revisiting visual question answering baselines
-
Jabri, A., Joulin, A., van der Maaten, L., Revisiting visual question answering baselines. European Conference on Computer Vision (ECCV), 2016.
-
(2016)
European Conference on Computer Vision (ECCV)
-
-
Jabri, A.1
Joulin, A.2
van der Maaten, L.3
-
23
-
-
84986245786
-
DenseCap: Fully convolutional localization networks for dense captioning
-
Johnson, J., Karpathy, A., Fei-Fei, L., DenseCap: Fully convolutional localization networks for dense captioning. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Johnson, J.1
Karpathy, A.2
Fei-Fei, L.3
-
26
-
-
84911364368
-
Large-scale video classification with convolutional neural networks
-
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Fei-Fei, L., Large-scale video classification with convolutional neural networks. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
-
(2014)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Karpathy, A.1
Toderici, G.2
Shetty, S.3
Leung, T.4
Sukthankar, R.5
Fei-Fei, L.6
-
27
-
-
85018868398
-
Multimodal residual learning for visual QA
-
Kim, J.-H., Lee, S.-W., Kwak, D.-H., Heo, M.-O., Kim, J., Ha, J.-W., Zhang, B.-T., Multimodal residual learning for visual QA. Advances in Neural Information Processing Systems (NIPS), 2016.
-
(2016)
Advances in Neural Information Processing Systems (NIPS)
-
-
Kim, J.-H.1
Lee, S.-W.2
Kwak, D.-H.3
Heo, M.-O.4
Kim, J.5
Ha, J.-W.6
Zhang, B.-T.7
-
28
-
-
85034816080
-
-
arXiv preprint arXiv:1610.04325
-
Kim, J.-H., On, K.-W., Kim, J., Ha, J.-W., Zhang, B.-T., Hadamard product for low-rank bilinear pooling. arXiv preprint arXiv:1610.04325, 2016.
-
(2016)
Hadamard product for low-rank bilinear pooling.
-
-
Kim, J.-H.1
On, K.-W.2
Kim, J.3
Ha, J.-W.4
Zhang, B.-T.5
-
29
-
-
84965153327
-
Skip-thought vectors
-
Kiros, R., Zhu, Y., Salakhutdinov, R., Zemel, R.S., Torralba, A., Urtasun, R., Fidler, S., Skip-thought vectors. Advances in Neural Information Processing Systems (NIPS), 2015.
-
(2015)
Advances in Neural Information Processing Systems (NIPS)
-
-
Kiros, R.1
Zhu, Y.2
Salakhutdinov, R.3
Zemel, R.S.4
Torralba, A.5
Urtasun, R.6
Fidler, S.7
-
30
-
-
85011596790
-
Visual genome: Connecting language and vision using crowdsourced dense image annotations
-
Krishna, R., Zhu, Y., Groth, O., Johnson, J., Hata, K., Kravitz, J., Chen, S., Kalantidis, Y., Li, L.-J., Shamma, D.A., et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision 123:1 (2017), 32–73.
-
(2017)
International Journal of Computer Vision
, vol.123
, Issue.1
, pp. 32-73
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
-
31
-
-
84998698731
-
Ask me anything: Dynamic memory networks for natural language processing
-
Kumar, A., Irsoy, O., Su, J., Bradbury, J., English, R., Pierce, B., Ondruska, P., Gulrajani, I., Socher, R., Ask me anything: Dynamic memory networks for natural language processing. International Conference on Machine Learning (ICML), 2016.
-
(2016)
International Conference on Machine Learning (ICML)
-
-
Kumar, A.1
Irsoy, O.2
Su, J.3
Bradbury, J.4
English, R.5
Pierce, B.6
Ondruska, P.7
Gulrajani, I.8
Socher, R.9
-
32
-
-
84922756960
-
DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia
-
Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., et al. DBpedia–a large-scale, multilingual knowledge base extracted from Wikipedia. Semantic Web 6:2 (2015), 167–195.
-
(2015)
Semantic Web
, vol.6
, Issue.2
, pp. 167-195
-
-
Lehmann, J.1
Isele, R.2
Jakob, M.3
Jentzsch, A.4
Kontokostas, D.5
Mendes, P.N.6
Hellmann, S.7
Morsey, M.8
van Kleef, P.9
Auer, S.10
-
34
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
Lin, T.-Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L., Microsoft COCO: Common objects in context. European Conference on Computer Vision (ECCV), 2014.
-
(2014)
European Conference on Computer Vision (ECCV)
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
35
-
-
84973863234
-
Bilinear CNN models for fine-grained visual recognition
-
Lin, T.-Y., RoyChowdhury, A., Maji, S., Bilinear CNN models for fine-grained visual recognition. The IEEE International Conference on Computer Vision (ICCV), 2015.
-
(2015)
The IEEE International Conference on Computer Vision (ICCV)
-
-
Lin, T.-Y.1
RoyChowdhury, A.2
Maji, S.3
-
36
-
-
84959205572
-
Fully convolutional networks for semantic segmentation
-
Long, J., Shelhamer, E., Darrell, T., Fully convolutional networks for semantic segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-
(2015)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Long, J.1
Shelhamer, E.2
Darrell, T.3
-
37
-
-
85018917850
-
Hierarchical question-image co-attention for visual question answering
-
Lu, J., Yang, J., Batra, D., Parikh, D., Hierarchical question-image co-attention for visual question answering. Advances in Neural Information Processing Systems (NIPS), 2016.
-
(2016)
Advances in Neural Information Processing Systems (NIPS)
-
-
Lu, J.1
Yang, J.2
Batra, D.3
Parikh, D.4
-
38
-
-
85034970930
-
-
Effective approaches to attention-based neural machine translation.
-
Luong, M.-T., Pham, H., Manning, C.D., 2015. Effective approaches to attention-based neural machine translation.
-
(2015)
-
-
Luong, M.-T.1
Pham, H.2
Manning, C.D.3
-
41
-
-
84973896625
-
Ask your neurons: A neural-based approach to answering questions about images
-
Malinowski, M., Rohrbach, M., Fritz, M., Ask your neurons: A neural-based approach to answering questions about images. The IEEE International Conference on Computer Vision (ICCV), 2015.
-
(2015)
The IEEE International Conference on Computer Vision (ICCV)
-
-
Malinowski, M.1
Rohrbach, M.2
Fritz, M.3
-
42
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-rnn)
-
Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A., Deep captioning with multimodal recurrent neural networks (m-rnn). International Conference on Learning Representations (ICLR), 2015.
-
(2015)
International Conference on Learning Representations (ICLR)
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.6
-
44
-
-
85021185316
-
-
arXiv preprint arXiv:1611.00471
-
Nam, H., Ha, J.-W., Kim, J., Dual attention networks for multimodal reasoning and matching. arXiv preprint arXiv:1611.00471, 2016.
-
(2016)
Dual attention networks for multimodal reasoning and matching.
-
-
Nam, H.1
Ha, J.-W.2
Kim, J.3
-
46
-
-
84973879016
-
Learning deconvolution network for semantic segmentation
-
Noh, H., Hong, S., Han, B., Learning deconvolution network for semantic segmentation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-
(2015)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Noh, H.1
Hong, S.2
Han, B.3
-
47
-
-
84986261711
-
Image question answering using convolutional neural network with dynamic parameter prediction
-
Noh, H., Seo, P.H., Han, B., Image question answering using convolutional neural network with dynamic parameter prediction. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Noh, H.1
Seo, P.H.2
Han, B.3
-
48
-
-
85133336275
-
BLEU: a method for automatic evaluation of machine translation
-
Papineni, K., Roukos, S., Ward, T., Zhu, W.-J., BLEU: a method for automatic evaluation of machine translation. Annual Meeting of the Association for Computational Linguistics (ACL), 2002.
-
(2002)
Annual Meeting of the Association for Computational Linguistics (ACL)
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.-J.4
-
49
-
-
84986308404
-
You only look once: Unified, real-time object detection
-
Redmon, J., Divvala, S., Girshick, R., Farhadi, A., You only look once: Unified, real-time object detection. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Redmon, J.1
Divvala, S.2
Girshick, R.3
Farhadi, A.4
-
50
-
-
84965170394
-
Exploring models and data for image question answering
-
Ren, M., Kiros, R., Zemel, R., Exploring models and data for image question answering. Advances in Neural Information Processing Systems (NIPS), 2015.
-
(2015)
Advances in Neural Information Processing Systems (NIPS)
-
-
Ren, M.1
Kiros, R.2
Zemel, R.3
-
51
-
-
84960980241
-
Faster R-CNN: Towards real-time object detection with region proposal networks
-
Ren, S., He, K., Girshick, R., Sun, J., Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems (NIPS), 2015.
-
(2015)
Advances in Neural Information Processing Systems (NIPS)
-
-
Ren, S.1
He, K.2
Girshick, R.3
Sun, J.4
-
52
-
-
85031713628
-
-
arXiv preprint arXiv:1606.06108
-
Saito, K., Shin, A., Ushiku, Y., Harada, T., Dualnet: Domain-invariant network for visual question answering. arXiv preprint arXiv:1606.06108, 2016.
-
(2016)
Dualnet: Domain-invariant network for visual question answering.
-
-
Saito, K.1
Shin, A.2
Ushiku, Y.3
Harada, T.4
-
53
-
-
84986327457
-
Where to look: Focus regions for visual question answering
-
Shih, K.J., Singh, S., Hoiem, D., Where to look: Focus regions for visual question answering. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Shih, K.J.1
Singh, S.2
Hoiem, D.3
-
54
-
-
84881536861
-
Indoor segmentation and support inference from RGBD images
-
Silberman, N., Hoiem, D., Kohli, P., Fergus, R., Indoor segmentation and support inference from RGBD images. European Conference on Computer Vision (ECCV), 2012.
-
(2012)
European Conference on Computer Vision (ECCV)
-
-
Silberman, N.1
Hoiem, D.2
Kohli, P.3
Fergus, R.4
-
55
-
-
84959205514
-
Instance segmentation of indoor scenes using a coverage loss
-
Silberman, N., Sontag, D., Fergus, R., Instance segmentation of indoor scenes using a coverage loss. European Conference on Computer Vision (ECCV), 2014.
-
(2014)
European Conference on Computer Vision (ECCV)
-
-
Silberman, N.1
Sontag, D.2
Fergus, R.3
-
58
-
-
84937522268
-
Going deeper with convolutions
-
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A., Going deeper with convolutions. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-
(2015)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
59
-
-
84898989329
-
Deep neural networks for object detection
-
Szegedy, C., Toshev, A., Erhan, D., Deep neural networks for object detection. Advances in Neural Information Processing Systems (NIPS), 2013.
-
(2013)
Advances in Neural Information Processing Systems (NIPS)
-
-
Szegedy, C.1
Toshev, A.2
Erhan, D.3
-
60
-
-
84957922397
-
YFCC100M: The new data in multimedia research
-
Thomee, B., Shamma, D.A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., Li, L.-J., YFCC100M: The new data in multimedia research. Communications of the ACM 59:2 (2016), 64–73.
-
(2016)
Communications of the ACM
, vol.59
, Issue.2
, pp. 64-73
-
-
Thomee, B.1
Shamma, D.A.2
Friedland, G.3
Elizalde, B.4
Ni, K.5
Poland, D.6
Borth, D.7
Li, L.-J.8
-
62
-
-
84956980995
-
CIDEr: Consensus-based image description evaluation
-
Vedantam, R., Zitnick, C.L., Parikh, D., CIDEr: Consensus-based image description evaluation. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-
(2015)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Vedantam, R.1
Zitnick, C.L.2
Parikh, D.3
-
63
-
-
84946747440
-
Show and tell: A neural image caption generator
-
Vinyals, O., Toshev, A., Bengio, S., Erhan, D., Show and tell: A neural image caption generator. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015.
-
(2015)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
64
-
-
84986320870
-
Ask me anything: Free-form visual question answering based on knowledge from external sources
-
Wu, Q., Wang, P., Shen, C., van den Hengel, A., Dick, A.R., Ask me anything: Free-form visual question answering based on knowledge from external sources. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Wu, Q.1
Wang, P.2
Shen, C.3
van den Hengel, A.4
Dick, A.R.5
-
66
-
-
84999008900
-
Dynamic memory networks for visual and textual question answering
-
Xiong, C., Merity, S., Socher, R., Dynamic memory networks for visual and textual question answering. International Conference on Machine Learning (ICML), 2016.
-
(2016)
International Conference on Machine Learning (ICML)
-
-
Xiong, C.1
Merity, S.2
Socher, R.3
-
67
-
-
85035076512
-
Ask, attend and answer: Exploring question-guided spatial attention for visual question answering
-
Xu, H., Saenko, K., Ask, attend and answer: Exploring question-guided spatial attention for visual question answering. European Conference on Computer Vision (ECCV), 2016.
-
(2016)
European Conference on Computer Vision (ECCV)
-
-
Xu, H.1
Saenko, K.2
-
68
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
Xu, K., Ba, J., Kiros, R., Courville, A., Salakhutdinov, R., Zemel, R., Bengio, Y., Show, attend and tell: Neural image caption generation with visual attention. International Conference on Machine Learning (ICML), 2015.
-
(2015)
International Conference on Machine Learning (ICML)
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Courville, A.4
Salakhutdinov, R.5
Zemel, R.6
Bengio, Y.7
-
69
-
-
84986334021
-
Stacked attention networks for image question answering
-
Yang, Z., He, X., Gao, J., Deng, L., Smola, A.J., Stacked attention networks for image question answering. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Yang, Z.1
He, X.2
Gao, J.3
Deng, L.4
Smola, A.J.5
-
70
-
-
84986278354
-
Yin and yang: balancing and answering binary visual questions
-
Zhang, P., Goyal, Y., Summers-Stay, D., Batra, D., Parikh, D., Yin and yang: balancing and answering binary visual questions. CVPR, 2016.
-
(2016)
CVPR
-
-
Zhang, P.1
Goyal, Y.2
Summers-Stay, D.3
Batra, D.4
Parikh, D.5
-
71
-
-
85060481824
-
Instance-level segmentation with deep densely connected MRFs
-
Zhang, Z., Fidler, S., Urtasun, R., Instance-level segmentation with deep densely connected MRFs. CVPR, 2016.
-
(2016)
CVPR
-
-
Zhang, Z.1
Fidler, S.2
Urtasun, R.3
-
72
-
-
84973891613
-
Monocular object instance segmentation and depth ordering with CNNs
-
Zhang, Z., Schwing, A.G., Fidler, S., Urtasun, R., Monocular object instance segmentation and depth ordering with CNNs. CVPR, 2015, 2614–2622.
-
(2015)
CVPR
, pp. 2614-2622
-
-
Zhang, Z.1
Schwing, A.G.2
Fidler, S.3
Urtasun, R.4
-
73
-
-
84986301525
-
-
arXiv preprint arXiv:1512.02167
-
Zhou, B., Tian, Y., Sukhbaatar, S., Szlam, A., Fergus, R., Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015.
-
(2015)
Simple baseline for visual question answering.
-
-
Zhou, B.1
Tian, Y.2
Sukhbaatar, S.3
Szlam, A.4
Fergus, R.5
-
74
-
-
84986275767
-
Visual7w: Grounded question answering in images
-
Zhu, Y., Groth, O., Bernstein, M., Fei-Fei, L., Visual7w: Grounded question answering in images. The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
-
(2016)
The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Zhu, Y.1
Groth, O.2
Bernstein, M.3
Fei-Fei, L.4
-
75
-
-
84906489617
-
Edge boxes: Locating object proposals from edges
-
Springer
-
Zitnick, C.L., Dollár, P., Edge boxes: Locating object proposals from edges. European Conference on Computer Vision, 2014, Springer, 391–405.
-
(2014)
European Conference on Computer Vision
, pp. 391-405
-
-
Zitnick, C.L.1
Dollár, P.2