-
1
-
-
84973890960
-
-
Antol, S., Agrawal, A., Lu, J., Mitchell, M., Batra, D., Lawrence Zitnick, C., & Parikh, D. (2015). VQA: Visual question answering. In International conference on computer vision (ICCV).
-
(2015)
VQA: Visual question answering. In International conference on computer vision (ICCV)
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Lawrence Zitnick, C.6
Parikh, D.7
-
2
-
-
49949092526
-
-
Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., & Ives, Z. (2007). Dbpedia: A nucleus for a web of open data. In The semantic web (pp. 722–735). Springer.
-
(2007)
Dbpedia: A nucleus for a web of open data. In The semantic web (pp. 722–735). Springer
-
-
Auer, S.1
Bizer, C.2
Kobilarov, G.3
Lehmann, J.4
Cyganiak, R.5
Ives, Z.6
-
4
-
-
85027982212
-
-
Cho, K., Van Merrienboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2015). Learning phrase representations using RNN encoder—decoder for statistical machine translation. In Proceedings of the conference on empirical methods in natural language processing (EMNLP).
-
(2015)
Learning phrase representations using RNN encoder—decoder for statistical machine translation. In Proceedings of the conference on empirical methods in natural language processing (EMNLP)
-
-
Cho, K.1
Van Merrienboer, B.2
Gulcehre, C.3
Bahdanau, D.4
Bougares, F.5
Schwenk, H.6
Bengio, Y.7
-
5
-
-
84939821078
-
-
Chung, J., Gulcehre, C., Cho, K., & Bengio, Y. (2014). Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
-
(2014)
Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555
-
-
Chung, J.1
Gulcehre, C.2
Cho, K.3
Bengio, Y.4
-
7
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., & Darrell, T. (2015). Long-term recurrent convolutional networks for visual recognition and description. In Conference on computer vision and pattern recognition (CVPR).
-
(2015)
In Conference on computer vision and pattern recognition (CVPR)
-
-
Donahue, J.1
Anne Hendricks, L.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
9
-
-
84898958665
-
DeViSE: A deep visual-semantic embedding model
-
Frome, A., Corrado, G. S., Shlens, J., Bengio, S., Dean, J., & Mikolov, T. (2013). DeViSE: A deep visual-semantic embedding model. In Conference on neural information processing systems (NIPS).
-
(2013)
In Conference on neural information processing systems (NIPS)
-
-
Frome, A.1
Corrado, G.S.2
Shlens, J.3
Bengio, S.4
Dean, J.5
Mikolov, T.6
-
10
-
-
84959507605
-
Recognizing an action using its name: A knowledge-based approach
-
Gan, C., Yang, Y., Zhu, L., Zhao, D., & Zhuang, Y. (2016). Recognizing an action using its name: A knowledge-based approach. International Journal of Computer Vision (IJCV), 120, 61–77.
-
(2016)
International Journal of Computer Vision (IJCV)
, vol.120
, pp. 61-77
-
-
Gan, C.1
Yang, Y.2
Zhu, L.3
Zhao, D.4
Zhuang, Y.5
-
11
-
-
84965148420
-
Are you talking to a machine?
-
Gao, H., Mao, J., Zhou, J., Huang, Z., Wang, L., & Xu, W. (2015). Are you talking to a machine? Dataset and methods for multilingual image question answering. In Conference on neural information processing systems (NIPS).
-
(2015)
Dataset and methods for multilingual image question answering. In Conference on neural information processing systems (NIPS)
-
-
Gao, H.1
Mao, J.2
Zhou, J.3
Huang, Z.4
Wang, L.5
Xu, W.6
-
12
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In Conference on computer vision and pattern recognition (CVPR).
-
(2014)
In Conference on computer vision and pattern recognition (CVPR)
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
13
-
-
84894905366
-
A multi-view embedding space for modeling internet images, tags, and their semantics
-
Gong, Y., Ke, Q., Isard, M., & Lazebnik, S. (2014). A multi-view embedding space for modeling internet images, tags, and their semantics. International Journal of Computer Vision (IJCV), 106(2), 210–233.
-
(2014)
International Journal of Computer Vision (IJCV)
, vol.106
, Issue.2
, pp. 210-233
-
-
Gong, Y.1
Ke, Q.2
Isard, M.3
Lazebnik, S.4
-
14
-
-
0031573117
-
Long short-term memory
-
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.
-
(1997)
Neural Computation
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
15
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
Hodosh, M., Young, P., & Hockenmaier, J. (2013). Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research (JAIR), 47, 853–899.
-
(2013)
Journal of Artificial Intelligence Research (JAIR)
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
19
-
-
84965153327
-
Skip-thought vectors
-
Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Skip-thought vectors. In Conference on neural information processing systems (NIPS).
-
(2015)
In Conference on neural information processing systems (NIPS)
-
-
Kiros, R.1
Zhu, Y.2
Salakhutdinov, R.R.3
Zemel, R.4
Urtasun, R.5
Torralba, A.6
Fidler, S.7
-
22
-
-
80052901011
-
Baby talk: Understanding and generating image descriptions
-
Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A. C., & Berg, T. L. (2011). Baby talk: Understanding and generating image descriptions. In Conference on computer vision and pattern recognition (CVPR).
-
(2011)
In Conference on computer vision and pattern recognition (CVPR)
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
24
-
-
85027963396
-
-
Lin, T.-Y., Maire, M., Belongie, S., Perona, P., Ramanan, D., Hays, J., et al
-
Lin, T.-Y., Maire, M., Belongie, S., Perona, P., Ramanan, D., Hays, J., et al. (2014). Microsoft COCO: Common objects in context. In European conference on computer vision (ECCV).
-
(2014)
Microsoft COCO: Common objects in context. In European conference on computer vision (ECCV).
-
-
-
28
-
-
85011853174
-
Generation and comprehension of unambiguous object descriptions
-
Mao, J., Huang, J., Toshev, A., Camburu, O., Yuille, A. L., & Murphy, K. (2015). Generation and comprehension of unambiguous object descriptions. In Conference on computer vision and pattern recognition (CVPR).
-
(2015)
In Conference on computer vision and pattern recognition (CVPR)
-
-
Mao, J.1
Huang, J.2
Toshev, A.3
Camburu, O.4
Yuille, A.L.5
Murphy, K.6
-
29
-
-
84986312889
-
-
MED. (2014). TRECVID MED 14. http://nist.gov/itl/iad/mig/med14.cfm.
-
(2014)
TRECVID MED 14. http://nist.gov/itl/iad/mig/med14.cfm
-
-
-
30
-
-
84898956512
-
-
Mikolov, T., Sutskever, I., Chen, K., Corrado, G. S., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Conference on neural information processing systems (NIPS).
-
(2013)
Distributed representations of words and phrases and their compositionality. In Conference on neural information processing systems (NIPS)
-
-
Mikolov, T.1
Sutskever, I.2
Chen, K.3
Corrado, G.S.4
Dean, J.5
-
31
-
-
84936796885
-
Large scale retrieval and generation of image descriptions
-
Ordonez, V., Han, X., Kuznetsova, P., Kulkarni, G., Mitchell, M., Yamaguchi, K., et al. (2015). Large scale retrieval and generation of image descriptions. International Journal of Computer Vision (IJCV), 119, 46–59.
-
(2015)
International Journal of Computer Vision (IJCV)
, vol.119
, pp. 46-59
-
-
Ordonez, V.1
Han, X.2
Kuznetsova, P.3
Kulkarni, G.4
Mitchell, M.5
Yamaguchi, K.6
-
32
-
-
84986290372
-
Hierarchical recurrent neural encoder for video representation with application to captioning
-
Pan, P., Xu, Z., Yang, Y., Wu, F., & Zhuang, Y. (2016). Hierarchical recurrent neural encoder for video representation with application to captioning. In Conference on computer vision and pattern recognition (CVPR).
-
(2016)
In Conference on computer vision and pattern recognition (CVPR)
-
-
Pan, P.1
Xu, Z.2
Yang, Y.3
Wu, F.4
Zhuang, Y.5
-
33
-
-
0013363097
-
-
Papineni, K., Roukos, S., Ward, T., & Zhu, W. J. (2002). BLEU: a method for automatic evaluation of machine translation. In Proceedings of the annual meeting of the Association for Computational Linguistics (ACL).
-
(2002)
BLEU: a method for automatic evaluation of machine translation. In Proceedings of the annual meeting of the Association for Computational Linguistics (ACL)
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.J.4
-
34
-
-
84898785648
-
Grounding action descriptions in videos
-
Regneri, M., Rohrbach, M., Wetzel, D., Thater, S., Schiele, B., & Pinkal, M. (2013). Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics (TACL), 1, 25–36.
-
(2013)
Transactions of the Association for Computational Linguistics (TACL)
, vol.1
, pp. 25-36
-
-
Regneri, M.1
Rohrbach, M.2
Wetzel, D.3
Thater, S.4
Schiele, B.5
Pinkal, M.6
-
36
-
-
84959211977
-
A dataset for movie description
-
Rohrbach, A., Rohrbach, M., Tandon, N., & Schiele, B. (2015). A dataset for movie description. In Conference on computer vision and pattern recognition (CVPR).
-
(2015)
In Conference on computer vision and pattern recognition (CVPR)
-
-
Rohrbach, A.1
Rohrbach, M.2
Tandon, N.3
Schiele, B.4
-
37
-
-
84898775239
-
-
Rohrbach, M., Qiu, W., Titov, I., Thater, S., Pinkal, M., & Schiele, B. (2013). Translating video content to natural language descriptions. In International conference on computer vision (ICCV).
-
(2013)
Translating video content to natural language descriptions. In International conference on computer vision (ICCV)
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
38
-
-
84947041871
-
ImageNet large scale visual recognition challenge
-
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.
-
(2015)
International Journal of Computer Vision (IJCV)
, vol.115
, Issue.3
, pp. 211-252
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
-
42
-
-
84937522268
-
-
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al
-
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Conference on computer vision and pattern recognition (CVPR).
-
(2015)
Going deeper with convolutions. In Conference on computer vision and pattern recognition (CVPR).
-
-
-
43
-
-
84986296727
-
-
preprint arXiv:1512.02902
-
Tapaswi, M., Zhu, Y., Stiefelhagen, R., Torralba, A., Urtasun, R., & Fidler, S. (2016). Movieqa: Understanding stories in movies through question-answering. In Conference on computer vision and pattern recognition (CVPR). arXiv preprint arXiv:1512.02902.
-
(2016)
Movieqa: Understanding stories in movies through question-answering. In Conference on computer vision and pattern recognition (CVPR). arXiv preprint arXiv:1512.02902
-
-
Tapaswi, M.1
Zhu, Y.2
Stiefelhagen, R.3
Torralba, A.4
Urtasun, R.5
Fidler, S.6
-
45
-
-
84973865953
-
-
Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In International conference on computer vision (ICCV).
-
(2015)
Learning spatiotemporal features with 3D convolutional networks. In International conference on computer vision (ICCV)
-
-
Tran, D.1
Bourdev, L.2
Fergus, R.3
Torresani, L.4
Paluri, M.5
-
46
-
-
84901405262
-
Joint video and text parsing for understanding events and answering queries
-
Tu, K., Meng, M., Lee, M. W., Choe, T. E., & Zhu, S. C. (2014). Joint video and text parsing for understanding events and answering queries. IEEE MultiMedia, 21(2), 42–70.
-
(2014)
IEEE MultiMedia
, vol.21
, Issue.2
, pp. 42-70
-
-
Tu, K.1
Meng, M.2
Lee, M.W.3
Choe, T.E.4
Zhu, S.C.5
-
49
-
-
84973882730
-
-
Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., & Saenko, K. (2015). Sequence to sequence—video to text. In International conference on computer vision (ICCV).
-
(2015)
Sequence to sequence—video to text. In International conference on computer vision (ICCV)
-
-
Venugopalan, S.1
Rohrbach, M.2
Donahue, J.3
Mooney, R.4
Darrell, T.5
Saenko, K.6
-
50
-
-
84946747440
-
Show and tell: A neural image caption generator
-
Vinyals, O., Toshev, A., Bengio, S., & Erhan, D. (2015). Show and tell: A neural image caption generator. In Conference on computer vision and pattern recognition (CVPR).
-
(2015)
In Conference on computer vision and pattern recognition (CVPR)
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
52
-
-
84876945537
-
Dense trajectories and motion boundary descriptors for action recognition
-
Wang, H., Kläser, A., Schmid, C., & Liu, C. L. (2013). Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision (IJCV), 103(1), 60–79.
-
(2013)
International Journal of Computer Vision (IJCV)
, vol.103
, Issue.1
, pp. 60-79
-
-
Wang, H.1
Kläser, A.2
Schmid, C.3
Liu, C.L.4
-
53
-
-
84986320870
-
-
Wu, Q., Wang, P., Shen, C., Dick, A., & van den Hengel, A. (2016). Ask me anything: Free-form visual question answering based on knowledge from external sources. In Conference on computer vision and pattern recognition (CVPR).
-
(2016)
Ask me anything: Free-form visual question answering based on knowledge from external sources. In Conference on computer vision and pattern recognition (CVPR)
-
-
Wu, Q.1
Wang, P.2
Shen, C.3
Dick, A.4
van den Hengel, A.5
-
54
-
-
84970002232
-
-
Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., & Bengio, Y. (2015a). Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (ICML).
-
(2015)
Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning (ICML)
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhudinov, R.6
Bengio, Y.7
-
56
-
-
84999792442
-
Image classification by cross-media active learning with privileged information
-
Yan, Y., Nie, F., Li, W., Gao, C., Yang, Y., & Xu, D. (2016). Image classification by cross-media active learning with privileged information. IEEE Transactions on Multimedia, 18(12), 2494–2502.
-
(2016)
IEEE Transactions on Multimedia
, vol.18
, Issue.12
, pp. 2494-2502
-
-
Yan, Y.1
Nie, F.2
Li, W.3
Gao, C.4
Yang, Y.5
Xu, D.6
-
57
-
-
72449143147
-
-
Yang, Y., Xu, D., Nie, F., Luo, J., & Zhuang, Y. (2009). Ranking with local regression and global alignment for cross media retrieval. In Proceedings of the 17th ACM international conference on multimedia (pp. 175–184). ACM.
-
(2009)
Ranking with local regression and global alignment for cross media retrieval. In Proceedings of the 17th ACM international conference on multimedia (pp. 175–184). ACM
-
-
Yang, Y.1
Xu, D.2
Nie, F.3
Luo, J.4
Zhuang, Y.5
-
58
-
-
84973884896
-
-
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., & Courville, A. (2015). Describing videos by exploiting temporal structure. In International conference on computer vision (ICCV).
-
(2015)
Describing videos by exploiting temporal structure. In International conference on computer vision (ICCV)
-
-
Yao, L.1
Torabi, A.2
Cho, K.3
Ballas, N.4
Pal, C.5
Larochelle, H.6
Courville, A.7
-
59
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
Young, P., Lai, A., Hodosh, M., & Hockenmaier, J. (2014). From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics (TACL), 2, 67–78.
-
(2014)
Transactions of the Association for Computational Linguistics (TACL)
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
61
-
-
84973892583
-
-
Yu, L., Park, E., Berg, A. C., & Berg, T. L. (2015). Visual Madlibs: Fill in the blank image generation and question answering. In International conference on computer vision (ICCV).
-
(2015)
Visual Madlibs: Fill in the blank image generation and question answering. In International conference on computer vision (ICCV)
-
-
Yu, L.1
Park, E.2
Berg, A.C.3
Berg, T.L.4
-
63
-
-
84986275767
-
-
Zhu, Y., Groth, O., Bernstein, M., & Fei-Fei, L. (2016). Visual7w: Grounded question answering in images. In Conference on computer vision and pattern recognition (CVPR).
-
(2016)
Visual7w: Grounded question answering in images. In Conference on computer vision and pattern recognition (CVPR)
-
-
Zhu, Y.1
Groth, O.2
Bernstein, M.3
Fei-Fei, L.4
-
64
-
-
84973911532
-
-
Zhu, Y., Kiros, R., Zemel, R., Salakhutdinov, R., Urtasun, R., Torralba, A., & Fidler, S. (2015). Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In International conference on computer vision (ICCV).
-
(2015)
Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In International conference on computer vision (ICCV)
-
-
Zhu, Y.1
Kiros, R.2
Zemel, R.3
Salakhutdinov, R.4
Urtasun, R.5
Torralba, A.6
Fidler, S.7