-
2
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In IEEE conference on computer vision and pattern recognition (CVPR), pages 2625–2634.
-
(2015)
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 2625-2634
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
3
-
-
84906929591
-
Image description using visual dependency representations
-
Desmond Elliott and Frank Keller. 2013. Image description using visual dependency representations. In EMNLP, volume 13, pages 1292–1302.
-
(2013)
EMNLP
, vol.13
, pp. 1292-1302
-
-
Elliott, D.1
Keller, F.2
-
5
-
-
84959250180
-
From captions to visual concepts and back
-
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C Platt, et al. 2015. From captions to visual concepts and back. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1473–1482.
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 1473-1482
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
-
6
-
-
78149311145
-
Every picture tells a story: Generating sentences from images
-
Springer
-
Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In European conference on computer vision, pages 15–29. Springer.
-
(2010)
European Conference on Computer Vision
, pp. 15-29
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
9
-
-
84986305787
-
Natural language object retrieval
-
Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell. 2016. Natural language object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4555–4564.
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 4555-4564
-
-
Hu, R.1
Xu, H.2
Rohrbach, M.3
Feng, J.4
Saenko, K.5
Darrell, T.6
-
10
-
-
85012016847
-
Summarizing source code using a neural attention model
-
Berlin, Germany. Association for Computational Linguistics
-
Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2073–2083, Berlin, Germany. Association for Computational Linguistics.
-
(2016)
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
, pp. 2073-2083
-
-
Iyer, S.1
Konstas, I.2
Cheung, A.3
Zettlemoyer, L.4
-
12
-
-
85073158456
-
-
Andrej Karpathy et al. 2016. Neuraltalk2. https://github.com/karpathy/neuraltalk2/.
-
(2016)
Neuraltalk2
-
-
Karpathy, A.1
-
14
-
-
85011596790
-
Visual genome: Connecting language and vision using crowdsourced dense image annotations
-
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. 2017. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1):32–73.
-
(2017)
International Journal of Computer Vision
, vol.123
, Issue.1
, pp. 32-73
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
-
17
-
-
84906493406
-
Microsoft coco: Common objects in context
-
Springer
-
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV, pages 740–755. Springer.
-
(2014)
ECCV
, pp. 740-755
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
18
-
-
85039172448
-
Effective approaches to attention-based neural machine translation
-
abs/1508.04025
-
Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. CoRR, abs/1508.04025.
-
(2015)
CoRR
-
-
Luong, M.-T.1
Pham, H.2
Manning, C.D.3
-
19
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-rnn)
-
Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep captioning with multimodal recurrent neural networks (m-rnn). ICLR.
-
(2015)
ICLR
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.6
-
20
-
-
84906925144
-
Nonparametric method for data-driven image captioning
-
Rebecca Mason and Eugene Charniak. 2014. Nonparametric method for data-driven image captioning. In ACL (2), pages 592–598.
-
(2014)
ACL
, Issue.2
, pp. 592-598
-
-
Mason, R.1
Charniak, E.2
-
21
-
-
84986247766
-
Large scale retrieval and generation of image descriptions
-
Vicente Ordonez, Xufeng Han, Polina Kuznetsova, Girish Kulkarni, Margaret Mitchell, Kota Yamaguchi, Karl Stratos, Amit Goyal, Jesse Dodge, Alyssa Mensch, Hal Daumé III, Alexander C. Berg, Yejin Choi, and Tamara L. Berg. 2015. Large scale retrieval and generation of image descriptions. International Journal of Computer Vision, pages 1–14.
-
(2015)
International Journal of Computer Vision
, pp. 1-14
-
-
Ordonez, V.1
Han, X.2
Kuznetsova, P.3
Kulkarni, G.4
Mitchell, M.5
Yamaguchi, K.6
Stratos, K.7
Goyal, A.8
Dodge, J.9
Mensch, A.10
Daumé III, H.11
Berg, A.C.12
Choi, Y.13
Berg, T.L.14
-
24
-
-
84973856017
-
Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
-
Bryan A Plummer, Liwei Wang, Chris M Cervantes, Juan C Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2015. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE international conference on computer vision, pages 2641–2649.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 2641-2649
-
-
Plummer, B.A.1
Wang, L.2
Cervantes, C.M.3
Caicedo, J.C.4
Hockenmaier, J.5
Lazebnik, S.6
-
25
-
-
84959897086
-
Combining geometric, textual and visual features for predicting prepositions in image descriptions
-
Association for Computational Linguistics
-
Arnau Ramisa, JK Wang, Ying Lu, Emmanuel Dellandrea, Francesc Moreno-Noguer, and Robert Gaizauskas. 2015. Combining geometric, textual and visual features for predicting prepositions in image descriptions. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 214–220. Association for Computational Linguistics.
-
(2015)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
, pp. 214-220
-
-
Ramisa, A.1
Wang, J.K.2
Lu, Y.3
Dellandrea, E.4
Moreno-Noguer, F.5
Gaizauskas, R.6
-
27
-
-
84990068682
-
Grounding of textual phrases in images by reconstruction
-
Springer
-
Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele. 2016. Grounding of textual phrases in images by reconstruction. In European Conference on Computer Vision, pages 817–834. Springer.
-
(2016)
European Conference on Computer Vision
, pp. 817-834
-
-
Rohrbach, A.1
Rohrbach, M.2
Hu, R.3
Darrell, T.4
Schiele, B.5
-
28
-
-
84947041871
-
Imagenet large scale visual recognition challenge
-
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252.
-
(2015)
International Journal of Computer Vision
, vol.115
, Issue.3
, pp. 211-252
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
-
29
-
-
85072830763
-
Word ordering without syntax
-
Austin, Texas
-
Allen Schmaltz, Alexander M. Rush, and Stuart Shieber. 2016. Word ordering without syntax. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2319–2324, Austin, Texas.
-
(2016)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
, pp. 2319-2324
-
-
Schmaltz, A.1
Rush, A.M.2
Shieber, S.3
-
33
-
-
84973926486
-
Learning common sense through visual abstraction
-
Ramakrishna Vedantam, Xiao Lin, Tanmay Batra, C Lawrence Zitnick, and Devi Parikh. 2015b. Learning common sense through visual abstraction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2542–2550.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision (ICCV)
, pp. 2542-2550
-
-
Vedantam, R.1
Lin, X.2
Batra, T.3
Zitnick, C.L.4
Parikh, D.5
-
35
-
-
84959897734
-
Semantically conditioned lstm-based natural language generation for spoken dialogue systems
-
Lisbon, Portugal
-
Tsung-Hsien Wen, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned lstm-based natural language generation for spoken dialogue systems. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1711–1721, Lisbon, Portugal.
-
(2015)
Conference on Empirical Methods in Natural Language Processing (EMNLP)
, pp. 1711-1721
-
-
Wen, T.-H.1
Gasic, M.2
Mrkšić, N.3
Su, P.-H.4
Vandyke, D.5
Young, S.6
-
36
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057.
-
(2015)
International Conference on Machine Learning
, pp. 2048-2057
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.5
Salakhudinov, R.6
Zemel, R.7
Bengio, Y.8
-
39
-
-
84994129838
-
Stating the obvious: Extracting visual common sense knowledge
-
San Diego, California. Association for Computational Linguistics
-
Mark Yatskar, Vicente Ordonez, and Ali Farhadi. 2016. Stating the obvious: Extracting visual common sense knowledge. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 193–198, San Diego, California. Association for Computational Linguistics.
-
(2016)
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
, pp. 193-198
-
-
Yatskar, M.1
Ordonez, V.2
Farhadi, A.3
-
40
-
-
84986317307
-
Image captioning with semantic attention
-
Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image captioning with semantic attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4651–4659.
-
(2016)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 4651-4659
-
-
You, Q.1
Jin, H.2
Wang, Z.3
Fang, C.4
Luo, J.5