[1] Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, and Trevor Darrell. 2016. Deep Compositional Captioning: Describing Novel Object Categories without Paired Training Data. In CVPR.
[2] Stanislaw Antol, Aishwarya Agrawal, Jiasen Lu, Margaret Mitchell, Dhruv Batra, C Lawrence Zitnick, and Devi Parikh. 2015. VQA: Visual Question Answering. In ICCV.
[4] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. In EMNLP.
[5] Xuanyi Dong, Junshi Huang, Yi Yang, and Shuicheng Yan. 2017. More Is Less: A More Complicated Network With Less Inference Complexity. In CVPR.
[6] Xuanyi Dong, Deyu Meng, Fan Ma, and Yi Yang. 2017. A dual-network progressive approach to weakly supervised object detection. In ACM Multimedia.
[7] Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, and Yaser Sheikh. 2018. Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors. In CVPR.
[9] Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every Picture Tells a Story: Generating Sentences from Images. In ECCV.
[10] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML.
[11] Yash Goyal, Tejas Khot, Douglas Summers-Stay, Dhruv Batra, and Devi Parikh. 2017. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering. In CVPR.
[12] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV.
[13] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR.
[15] Ronghang Hu, Jacob Andreas, Marcus Rohrbach, Trevor Darrell, and Kate Saenko. 2017. Learning to Reason: End-To-End Module Networks for Visual Question Answering. In ICCV.
[17] Xun Huang and Serge Belongie. 2017. Arbitrary Style Transfer in Real-Time With Adaptive Instance Normalization. In ICCV.
[18] Diederik P Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In ICLR.
[19] Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. 2017. Visual Genome: Connecting Language and Vision using Crowdsourced Dense Image Annotations. International Journal of Computer Vision (2017).
[20] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. In NIPS.
[21] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, and Tamara L Berg. 2013. BabyTalk: Understanding and Generating Simple Image Descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence (2013). https://doi.org/10.1109/TPAMI.2012.162
[22] Angeliki Lazaridou, Marco Marelli, and Marco Baroni. 2017. Multimodal Word Meaning Induction from Minimal Exposure to Natural Text. Cognitive Science (2017).
[23] Colin Lea, Michael D. Flynn, Rene Vidal, Austin Reiter, and Gregory D. Hager. 2017. Temporal Convolutional Networks for Action Segmentation and Detection. In CVPR.
[24] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common Objects in Context. In ECCV.
[25] Yu Liu, Fangyin Wei, Jing Shao, Lu Sheng, Junjie Yan, and Xiaogang Wang. 2018. Exploring Disentangled Feature Representation beyond Face Identification. In CVPR.
[27] Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Learning Like a Child: Fast Novel Visual Concept Learning from Sentence Descriptions of Images. In ICCV.
[28] Margaret Mitchell, Xufeng Han, Jesse Dodge, Alyssa Mensch, Amit Goyal, Alex Berg, Kota Yamaguchi, Tamara Berg, Karl Stratos, and Hal Daumé III. 2012. Midge: Generating Image Descriptions from Computer Vision Detections. In EACL.
[30] Hyeonwoo Noh, Paul Hongsuck Seo, and Bohyung Han. 2016. Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction. In CVPR.
[32] Ning Qian. 1999. On the momentum term in gradient descent learning algorithms. Neural Networks (1999).
[33] Santhosh K Ramakrishnan, Ambar Pal, Gaurav Sharma, and Anurag Mittal. 2017. An Empirical Evaluation of Visual Question Answering for Novel Objects. In CVPR.
[34] Sachin Ravi and Hugo Larochelle. 2017. Optimization as a Model for Few-shot Learning. In ICLR.
[35] Mengye Ren, Ryan Kiros, and Richard Zemel. 2015. Exploring Models and Data for Image Question Answering. In NIPS.
[36] Kevin J Shih, Saurabh Singh, and Derek Hoiem. 2016. Where to Look: Focus Regions for Visual Question Answering. In CVPR.
[37] Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical Networks for Few-shot Learning. In NIPS.
[39] Damien Teney, Lingqiao Liu, and Anton van den Hengel. 2017. Graph-Structured Representations for Visual Question Answering. In CVPR.
[41] Su Wang, Stephen Roller, and Katrin Erk. 2017. Distributional Modeling on a Diet: One-shot Word Learning from Text Only. In IJCNLP.
[42] Yu Wu, Yutian Lin, Xuanyi Dong, Yan Yan, Wanli Ouyang, and Yi Yang. 2018. Exploit the Unknown Gradually: One-Shot Video-Based Person Re-Identification by Stepwise Learning. In CVPR.
[44] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio. 2015. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In ICML.
[45] Zhongwen Xu, Linchao Zhu, and Yi Yang. 2017. Few-shot object recognition from machine-labeled web images. In CVPR.
[46] Zichao Yang, Xiaodong He, Jianfeng Gao, Li Deng, and Alex Smola. 2016. Stacked Attention Networks for Image Question Answering. In CVPR.
[47] Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei. 2017. Incorporating Copying Mechanism in Image Captioning for Learning Novel Objects. In CVPR.
[48] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image Captioning with Semantic Attention. In CVPR.
[49] Licheng Yu, Eunbyung Park, Alexander C Berg, and Tamara L Berg. 2015. Visual Madlibs: Fill in the Blank Description Generation and Question Answering. In ICCV.
[50] Yan Zhang, Jonathon Hare, and Adam Prügel-Bennett. 2018. Learning to Count Objects in Natural Images for Visual Question Answering. In ICLR.
[51] Zhun Zhong, Liang Zheng, Zhedong Zheng, Shaozi Li, and Yi Yang. 2018. Camera Style Adaptation for Person Re-Identification. In CVPR.
[52] Yuke Zhu, Oliver Groth, Michael Bernstein, and Li Fei-Fei. 2016. Visual7W: Grounded Question Answering in Images. In CVPR.