-
3
-
-
84973890960
-
VQA: Visual question answering
-
Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual question answering. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision (ICCV)
-
-
Antol, S.1
Agrawal, A.2
Lu, J.3
Mitchell, M.4
Batra, D.5
Zitnick, C.L.6
Parikh, D.7
-
5
-
-
85198028989
-
ImageNet: A large-scale hierarchical image database
-
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A Large-Scale Hierarchical Image Database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
-
(2009)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Deng, J.1
Dong, W.2
Socher, R.3
Li, L.-J.4
Li, K.5
Fei-Fei, L.6
-
6
-
-
84904482223
-
DeCAF: A deep convolutional activation feature for generic visual recognition
-
Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2013. DeCAF: A deep convolutional activation feature for generic visual recognition. In Proceedings of the International Conference on Machine Learning (ICML).
-
(2013)
Proceedings of the International Conference on Machine Learning (ICML)
-
-
Donahue, J.1
Jia, Y.2
Vinyals, O.3
Hoffman, J.4
Zhang, N.5
Tzeng, E.6
Darrell, T.7
-
7
-
-
77649188328
-
The segmented and annotated IAPR TC-12 benchmark
-
Hugo Jair Escalante, Carlos A Hernández, Jesus A Gonzalez, Aurelio López-López, Manuel Montes, Eduardo F Morales, L Enrique Sucar, Luis Villaseñor, and Michael Grubinger. 2010. The segmented and annotated IAPR TC-12 benchmark. Computer Vision and Image Understanding, 114(4):419-428.
-
(2010)
Computer Vision and Image Understanding
, vol.114
, Issue.4
, pp. 419-428
-
-
Escalante, H.J.1
Hernández, C.A.2
Gonzalez, J.A.3
López-López, A.4
Montes, M.5
Morales, E.F.6
Sucar, L.E.7
Villaseñor, L.8
Grubinger, M.9
-
8
-
-
84898958665
-
DeViSE: A deep visual-semantic embedding model
-
Andrea Frome, Greg S Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Tomas Mikolov, et al. 2013. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems (NIPS).
-
(2013)
Advances in Neural Information Processing Systems (NIPS)
-
-
Frome, A.1
Corrado, G.S.2
Shlens, J.3
Bengio, S.4
Dean, J.5
Mikolov, T.6
-
12
-
-
38049183286
-
The IAPR TC-12 benchmark: A new evaluation resource for visual information systems
-
Michael Grubinger, Paul Clough, Henning Müller, and Thomas Deselaers. 2006. The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, volume 5, page 10.
-
(2006)
International Workshop OntoImage
, vol.5
, pp. 10
-
-
Grubinger, M.1
Clough, P.2
Müller, H.3
Deselaers, T.4
-
13
-
-
10044285992
-
Canonical correlation analysis: An overview with application to learning methods
-
David R Hardoon, Sandor Szedmak, and John Shawe-Taylor. 2004. Canonical correlation analysis: An overview with application to learning methods. Neural Computation, 16(12):2639-2664.
-
(2004)
Neural Computation
, vol.16
, Issue.12
, pp. 2639-2664
-
-
Hardoon, D.R.1
Szedmak, S.2
Shawe-Taylor, J.3
-
17
-
-
84986305787
-
Natural language object retrieval
-
Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell. 2016b. Natural language object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Hu, R.1
Xu, H.2
Rohrbach, M.3
Feng, J.4
Saenko, K.5
Darrell, T.6
-
24
-
-
84965153327
-
Skip-thought vectors
-
Ryan Kiros, Yukun Zhu, Ruslan Salakhutdinov, Richard S Zemel, Antonio Torralba, Raquel Urtasun, and Sanja Fidler. 2015. Skip-thought vectors. In Advances in Neural Information Processing Systems (NIPS).
-
(2015)
Advances in Neural Information Processing Systems (NIPS)
-
-
Kiros, R.1
Zhu, Y.2
Salakhutdinov, R.3
Zemel, R.S.4
Torralba, A.5
Urtasun, R.6
Fidler, S.7
-
26
-
-
84978730111
-
-
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, Michael Bernstein, and Li Fei-Fei. 2016. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. arXiv:1602.07332.
-
(2016)
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
Bernstein, M.11
Fei-Fei, L.12
-
27
-
-
84998698731
-
Ask me Anything: Dynamic memory networks for natural language processing
-
Ankit Kumar, Ozan Irsoy, Jonathan Su, James Bradbury, Robert English, Brian Pierce, Peter Ondruska, Ishaan Gulrajani, and Richard Socher. 2016. Ask me anything: Dynamic memory networks for natural language processing. In Proceedings of the International Conference on Machine Learning (ICML).
-
(2016)
Proceedings of the International Conference on Machine Learning (ICML)
-
-
Kumar, A.1
Irsoy, O.2
Su, J.3
Bradbury, J.4
English, R.5
Pierce, B.6
Ondruska, P.7
Gulrajani, I.8
Socher, R.9
-
28
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In Proceedings of the European Conference on Computer Vision (ECCV).
-
(2014)
Proceedings of the European Conference on Computer Vision (ECCV)
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
32
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-RNN)
-
Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep captioning with multimodal recurrent neural networks (m-RNN). In Proceedings of the International Conference on Learning Representations (ICLR).
-
(2015)
Proceedings of the International Conference on Learning Representations (ICLR)
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.6
-
33
-
-
80053437179
-
Multimodal deep learning
-
Jiquan Ngiam, Aditya Khosla, Mingyu Kim, Juhan Nam, Honglak Lee, and Andrew Y Ng. 2011. Multimodal deep learning. In Proceedings of the International Conference on Machine Learning (ICML), pages 689-696.
-
(2011)
Proceedings of the International Conference on Machine Learning (ICML)
, pp. 689-696
-
-
Ngiam, J.1
Khosla, A.2
Kim, M.3
Nam, J.4
Lee, H.5
Ng, A.Y.6
-
36
-
-
85023199520
-
Fast and scalable polynomial kernels via explicit feature maps
-
New York, NY, USA. ACM
-
Ninh Pham and Rasmus Pagh. 2013. Fast and scalable polynomial kernels via explicit feature maps. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'13, pages 239-247, New York, NY, USA. ACM.
-
(2013)
Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD'13
, pp. 239-247
-
-
Pham, N.1
Pagh, R.2
-
37
-
-
84973856017
-
Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
-
Bryan Plummer, Liwei Wang, Chris Cervantes, Juan Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2015. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision (ICCV).
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision (ICCV)
-
-
Plummer, B.1
Wang, L.2
Cervantes, C.3
Caicedo, J.4
Hockenmaier, J.5
Lazebnik, S.6
-
38
-
-
84990043973
-
-
Bryan Plummer, Liwei Wang, Chris Cervantes, Juan Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2016. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. arXiv:1505.04870v3.
-
(2016)
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models
-
-
Plummer, B.1
Wang, L.2
Cervantes, C.3
Caicedo, J.4
Hockenmaier, J.5
Lazebnik, S.6
-
41
-
-
84906925854
-
Grounded compositional semantics for finding and describing images with sentences
-
Richard Socher, Andrej Karpathy, Quoc V Le, Christopher D Manning, and Andrew Y Ng. 2014. Grounded compositional semantics for finding and describing images with sentences. Transactions of the Association for Computational Linguistics, 2:207-218.
-
(2014)
Transactions of the Association for Computational Linguistics
, vol.2
, pp. 207-218
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.D.4
Ng, A.Y.5
-
43
-
-
0034202338
-
Separating style and content with bilinear models
-
Joshua B Tenenbaum and William T Freeman. 2000. Separating style and content with bilinear models. Neural Computation, 12(6):1247-1283.
-
(2000)
Neural Computation
, vol.12
, Issue.6
, pp. 1247-1283
-
-
Tenenbaum, J.B.1
Freeman, W.T.2
-
44
-
-
84949572890
-
-
CoRR, abs/1503.01817
-
Bart Thomee, David A. Shamma, Gerald Friedland, Benjamin Elizalde, Karl Ni, Douglas Poland, Damian Borth, and Li-Jia Li. 2015. The new data and new challenges in multimedia research. CoRR, abs/1503.01817.
-
(2015)
The New Data and New Challenges in Multimedia Research
-
-
Thomee, B.1
Shamma, D.A.2
Friedland, G.3
Elizalde, B.4
Ni, K.5
Poland, D.6
Borth, D.7
Li, L.-J.8
-
45
-
-
84881160857
-
Selective search for object recognition
-
Jasper RR Uijlings, Koen EA van de Sande, Theo Gevers, and Arnold WM Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision (IJCV), 104(2).
-
(2013)
International Journal of Computer Vision (IJCV)
, vol.104
, Issue.2
-
-
Uijlings, J.R.R.1
Van De Sande, K.E.A.2
Gevers, T.3
Smeulders, A.W.M.4
-
51
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
Kelvin Xu, Jimmy Ba, Ryan Kiros, Aaron Courville, Ruslan Salakhutdinov, Richard Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning (ICML).
-
(2015)
Proceedings of the International Conference on Machine Learning (ICML)
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Courville, A.4
Salakhutdinov, R.5
Zemel, R.6
Bengio, Y.7