[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI, Vol. 16. 265-283.
[2] Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2017. Guided open vocabulary image captioning with constrained beam search. In EMNLP.
[3] Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell, Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, et al. 2016. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR.
[4] Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL-W. 65-72.
[5] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS. 1171-1179.
[6] Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In CVPR. 2625-2634.
[7] Xuanyi Dong, Linchao Zhu, De Zhang, Yi Yang, and Fei Wu. 2018. Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering. In ACM on Multimedia.
[8] Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In ECCV. 15-29.
[9] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML. 1126-1135.
[11] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR.
[12] Lu Jiang, Shoou-I Yu, Deyu Meng, Teruko Mitamura, and Alexander G Hauptmann. 2015. Bridging the ultimate semantic gap: A semantic search engine for internet videos. In ICMR. 27-34.
[13] Lu Jiang, Shoou-I Yu, Deyu Meng, Yi Yang, Teruko Mitamura, and Alexander G Hauptmann. 2015. Fast and accurate content-based semantic search in 100m internet videos. In ACM on Multimedia. 49-58.
[14] Justin Johnson, Andrej Karpathy, and Li Fei-Fei. 2016. DenseCap: Fully convolutional localization networks for dense captioning. In CVPR. 4565-4574.
[15] Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR. 3128-3137.
[16] Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
[17] Ryan Kiros, Ruslan Salakhutdinov, and Rich Zemel. 2014. Multimodal neural language models. In ICML. 595-603.
[18] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, and Tamara L Berg. 2013. BabyTalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 12 (2013), 2891-2903.
[19] Christoph H Lampert, Hannes Nickisch, and Stefan Harmeling. 2014. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 3 (2014), 453-465.
[20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV. 740-755.
[22] Junhua Mao, Xu Wei, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan L Yuille. 2015. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV. 2533-2541.
[23] Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). In ICLR.
[24] George A Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J Miller. 1990. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography 3, 4 (1990), 235-244.
[25] Margaret Mitchell, Xufeng Han, Jesse Dodge, Alyssa Mensch, Amit Goyal, Alex Berg, Kota Yamaguchi, Tamara Berg, Karl Stratos, and Hal Daumé III. 2012. Midge: Generating Image Descriptions From Computer Vision Detections. In EACL. 747-756.
[26] Vicente Ordonez, Girish Kulkarni, and Tamara L Berg. 2011. Im2Text: Describing images using 1 million captioned photographs. In NIPS. 1143-1151.
[27] Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In ICLR.
[28] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS. 91-99.
[29] Marcus Rohrbach, Michael Stark, and Bernt Schiele. 2011. Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In CVPR. 1641-1648.
[31] Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR.
[32] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI.
[33] Hamed R Tavakoliy, Rakshith Shetty, Ali Borji, and Jorma Laaksonen. 2017. Paying Attention to Descriptions Generated by Image Captioning Models. In ICCV. 2506-2515.
[34] Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2017. Captioning Images with Diverse Objects. In CVPR.
[35] Subhashini Venugopalan, Marcus Rohrbach, Jeffrey Donahue, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2015. Sequence to sequence - video to text. In ICCV. 4534-4542.
[37] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In CVPR. 3156-3164.
[38] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2017. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 4 (April 2017), 652-663.
[39] Yongqin Xian, Christoph H Lampert, Bernt Schiele, and Zeynep Akata. 2018. Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018), 1-1. https://doi.org/10.1109/TPAMI.2018.2857768
[40] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML. 2048-2057.
[41] Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei. 2017. Incorporating copying mechanism in image captioning for learning novel objects. In CVPR. 5263-5271.
[42] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image captioning with semantic attention. In CVPR. 4651-4659.
[43] Linchao Zhu, Zhongwen Xu, Yi Yang, and Alexander G Hauptmann. 2017. Uncovering the Temporal Context for Video Question Answering. International Journal of Computer Vision 124, 3 (2017), 409-421. https://doi.org/10.1007/s11263-017-1033-7