-
3
-
-
85006390452
-
-
Bolei Zhou, Aditya Khosla, Agata Lapedriza, Antonio Torralba, and Aude Oliva. Places: An image database for deep scene understanding. arXiv preprint arXiv: 1610.02055, 2016.
-
(2016)
Places: An image database for deep scene understanding
-
-
Zhou, B.1
Khosla, A.2
Lapedriza, A.3
Torralba, A.4
Oliva, A.5
-
4
-
-
84911443783
-
Panda: Pose aligned networks for deep attribute modeling
-
Ning Zhang, Manohar Paluri, Marc'Aurelio Ranzato, Trevor Darrell, and Lubomir Bourdev. Panda: Pose aligned networks for deep attribute modeling. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1637-1644, 2014.
-
(2014)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 1637-1644
-
-
Zhang, N.1
Paluri, M.2
Ranzato, M.3
Darrell, T.4
Bourdev, L.5
-
5
-
-
84978730111
-
-
Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, Michael Bernstein, and Li Fei-Fei. Visual genome: Connecting language and vision using crowdsourced dense image annotations. 2016.
-
(2016)
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
-
-
Krishna, R.1
Zhu, Y.2
Groth, O.3
Johnson, J.4
Hata, K.5
Kravitz, J.6
Chen, S.7
Kalantidis, Y.8
Li, L.-J.9
Shamma, D.A.10
Bernstein, M.11
Fei-Fei, L.12
-
8
-
-
57149125139
-
Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers
-
Springer
-
Abhinav Gupta and Larry S Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In European conference on computer vision, pages 16-29. Springer, 2008.
-
(2008)
European conference on computer vision
, pp. 16-29
-
-
Gupta, A.1
Davis, L.S.2
-
9
-
-
84959233256
-
Image retrieval using scene graphs
-
IEEE
-
Justin Johnson, Ranjay Krishna, Michael Stark, Li-Jia Li, David A Shamma, Michael S Bernstein, and Li Fei-Fei. Image retrieval using scene graphs. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3668-3678. IEEE, 2015.
-
(2015)
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 3668-3678
-
-
Johnson, J.1
Krishna, R.2
Stark, M.3
Li, L.-J.4
Shamma, D.A.5
Bernstein, M.S.6
Fei-Fei, L.7
-
10
-
-
51949110976
-
Object categorization using co-occurrence, location and appearance
-
IEEE
-
Carolina Galleguillos, Andrew Rabinovich, and Serge Belongie. Object categorization using co-occurrence, location and appearance. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8. IEEE, 2008.
-
(2008)
Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on
, pp. 1-8
-
-
Galleguillos, C.1
Rabinovich, A.2
Belongie, S.3
-
11
-
-
84887394346
-
Understanding indoor scenes using 3d geometric phrases
-
Wongun Choi, Yu-Wei Chao, Caroline Pantofaru, and Silvio Savarese. Understanding indoor scenes using 3d geometric phrases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 33-40, 2013.
-
(2013)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 33-40
-
-
Choi, W.1
Chao, Y.-W.2
Pantofaru, C.3
Savarese, S.4
-
12
-
-
80052901011
-
Baby talk: Understanding and generating image descriptions
-
Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, and Tamara L Berg. Baby talk: Understanding and generating image descriptions. In Proceedings of the 24th CVPR. Citeseer, 2011.
-
(2011)
Proceedings of the 24th CVPR. Citeseer
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
15
-
-
84898785648
-
Grounding action descriptions in videos
-
Michaela Regneri, Marcus Rohrbach, Dominikus Wetzel, Stefan Thater, Bernt Schiele, and Manfred Pinkal. Grounding action descriptions in videos. Transactions of the Association for Computational Linguistics, 1: 25-36, 2013.
-
(2013)
Transactions of the Association for Computational Linguistics
, vol.1
, pp. 25-36
-
-
Regneri, M.1
Rohrbach, M.2
Wetzel, D.3
Thater, S.4
Schiele, B.5
Pinkal, M.6
-
16
-
-
84959932469
-
Integrating language and vision to generate natural language descriptions of videos in the wild
-
Jesse Thomason, Subhashini Venugopalan, Sergio Guadarrama, Kate Saenko, and Raymond J Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, volume 2, page 9, 2014.
-
(2014)
COLING
, vol.2
, pp. 9
-
-
Thomason, J.1
Venugopalan, S.2
Guadarrama, S.3
Saenko, K.4
Mooney, R.J.5
-
17
-
-
84959233994
-
Learning semantic relationships for better action retrieval in images
-
IEEE
-
Vignesh Ramanathan, Congcong Li, Jia Deng, Wei Han, Zhen Li, Kunlong Gu, Yang Song, Samy Bengio, Chuck Rosenberg, and Li Fei-Fei. Learning semantic relationships for better action retrieval in images. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1100-1109. IEEE, 2015.
-
(2015)
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
, pp. 1100-1109
-
-
Ramanathan, V.1
Li, C.2
Deng, J.3
Han, W.4
Li, Z.5
Gu, K.6
Song, Y.7
Bengio, S.8
Rosenberg, C.9
Fei-Fei, L.10
-
18
-
-
84898775239
-
Translating video content to natural language descriptions
-
Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, and Bernt Schiele. Translating video content to natural language descriptions. In Proceedings of the IEEE International Conference on Computer Vision, pages 433-440, 2013.
-
(2013)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 433-440
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
19
-
-
84898773262
-
Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
-
Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, and Kate Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In Proceedings of the IEEE International Conference on Computer Vision, pages 2712-2719, 2013.
-
(2013)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 2712-2719
-
-
Guadarrama, S.1
Krishnamoorthy, N.2
Malkarnenkar, G.3
Venugopalan, S.4
Mooney, R.5
Darrell, T.6
Saenko, K.7
-
21
-
-
85044252753
-
-
Mohamed Elhoseiny, Scott Cohen, Walter Chang, Brian Price, and Ahmed Elgammal. Sherlock: Scalable fact learning in images. arXiv preprint arXiv: 1511.04891, 2015.
-
(2015)
Sherlock: Scalable Fact Learning in Images
-
-
Elhoseiny, M.1
Cohen, S.2
Chang, W.3
Price, B.4
Elgammal, A.5
-
22
-
-
78149311145
-
Every picture tells a story: Generating sentences from images
-
Springer
-
Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. Every picture tells a story: Generating sentences from images. In European Conference on Computer Vision, pages 15-29. Springer, 2010.
-
(2010)
European Conference on Computer Vision
, pp. 15-29
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
23
-
-
84959226544
-
Recognize complex events from static images by fusing deep channels
-
Yuanjun Xiong, Kai Zhu, Dahua Lin, and Xiaoou Tang. Recognize complex events from static images by fusing deep channels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1600-1609, 2015.
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 1600-1609
-
-
Xiong, Y.1
Zhu, K.2
Lin, D.3
Tang, X.4
-
25
-
-
33745938597
-
Discovering objects and their location in images
-
IEEE
-
Josef Sivic, Bryan C Russell, Alexei A Efros, Andrew Zisserman, and William T Freeman. Discovering objects and their location in images. In Tenth IEEE International Conference on Computer Vision (ICCV'05), volume 1, pages 370-377. IEEE, 2005.
-
(2005)
Tenth IEEE International Conference on Computer Vision (ICCV'05)
, vol.1
, pp. 370-377
-
-
Sivic, J.1
Russell, B.C.2
Efros, A.A.3
Zisserman, A.4
Freeman, W.T.5
-
27
-
-
77956006912
-
Exploiting hierarchical context on a large database of object categories
-
IEEE
-
Myung Jin Choi, Joseph J Lim, Antonio Torralba, and Alan S Willsky. Exploiting hierarchical context on a large database of object categories. In Computer vision and pattern recognition (CVPR), 2010 IEEE conference on, pages 129-136. IEEE, 2010.
-
(2010)
Computer vision and pattern recognition (CVPR), 2010 IEEE conference on
, pp. 129-136
-
-
Choi, M.J.1
Lim, J.J.2
Torralba, A.3
Willsky, A.S.4
-
28
-
-
78149343534
-
Graph cut based inference with cooccurrence statistics
-
Springer
-
Lubor Ladicky, Chris Russell, Pushmeet Kohli, and Philip HS Torr. Graph cut based inference with cooccurrence statistics. In European Conference on Computer Vision, pages 239-253. Springer, 2010.
-
(2010)
European Conference on Computer Vision
, pp. 239-253
-
-
Ladicky, L.1
Russell, C.2
Kohli, P.3
Torr, P.H.S.4
-
29
-
-
80052905403
-
Learning to share visual appearance for multiclass object detection
-
IEEE
-
Ruslan Salakhutdinov, Antonio Torralba, and Josh Tenenbaum. Learning to share visual appearance for multiclass object detection. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on, pages 1481-1488. IEEE, 2011.
-
(2011)
Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on
, pp. 1481-1488
-
-
Salakhutdinov, R.1
Torralba, A.2
Tenenbaum, J.3
-
30
-
-
50649096757
-
Objects in context
-
IEEE
-
Andrew Rabinovich, Andrea Vedaldi, Carolina Galleguillos, Eric Wiewiora, and Serge Belongie. Objects in context. In 2007 IEEE 11th International Conference on Computer Vision, pages 1-8. IEEE, 2007.
-
(2007)
2007 IEEE 11th International Conference on Computer Vision
, pp. 1-8
-
-
Rabinovich, A.1
Vedaldi, A.2
Galleguillos, C.3
Wiewiora, E.4
Belongie, S.5
-
32
-
-
33845596932
-
Using multiple segmentations to discover objects and their extent in image collections
-
IEEE
-
Bryan C Russell, William T Freeman, Alexei A Efros, Josef Sivic, and Andrew Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06), volume 2, pages 1605-1614. IEEE, 2006.
-
(2006)
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06)
, vol.2
, pp. 1605-1614
-
-
Russell, B.C.1
Freeman, W.T.2
Efros, A.A.3
Sivic, J.4
Zisserman, A.5
-
34
-
-
84894905366
-
A multi-view embedding space for modeling internet images, tags, and their semantics
-
Yunchao Gong, Qifa Ke, Michael Isard, and Svetlana Lazebnik. A multi-view embedding space for modeling internet images, tags, and their semantics. International journal of computer vision, 106(2): 210-233, 2014.
-
(2014)
International journal of computer vision
, vol.106
, Issue.2
, pp. 210-233
-
-
Gong, Y.1
Ke, Q.2
Isard, M.3
Lazebnik, S.4
-
36
-
-
51349086291
-
Putting objects in perspective
-
Derek Hoiem, Alexei A Efros, and Martial Hebert. Putting objects in perspective. International Journal of Computer Vision, 80(1): 3-15, 2008.
-
(2008)
International Journal of Computer Vision
, vol.80
, Issue.1
, pp. 3-15
-
-
Hoiem, D.1
Efros, A.A.2
Hebert, M.3
-
37
-
-
84990046210
-
Semantic parsing for text to 3d scene generation
-
Angel X Chang, Manolis Savva, and Christopher D Manning. Semantic parsing for text to 3d scene generation. ACL 2014, page 17, 2014.
-
(2014)
ACL 2014
, pp. 17
-
-
Chang, A.X.1
Savva, M.2
Manning, C.D.3
-
38
-
-
84866687133
-
Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation
-
IEEE
-
Jian Yao, Sanja Fidler, and Raquel Urtasun. Describing the scene as a whole: Joint object detection, scene classification and semantic segmentation. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 702-709. IEEE, 2012.
-
(2012)
Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on
, pp. 702-709
-
-
Yao, J.1
Fidler, S.2
Urtasun, R.3
-
40
-
-
52449123642
-
Multi-class segmentation with relative location prior
-
Stephen Gould, Jim Rodgers, David Cohen, Gal Elidan, and Daphne Koller. Multi-class segmentation with relative location prior. International Journal of Computer Vision, 80(3): 300-316, 2008.
-
(2008)
International Journal of Computer Vision
, vol.80
, Issue.3
, pp. 300-316
-
-
Gould, S.1
Rodgers, J.2
Cohen, D.3
Elidan, G.4
Koller, D.5
-
41
-
-
84866726859
-
Understanding and predicting importance in images
-
IEEE
-
Alexander C Berg, Tamara L Berg, Hal Daume, Jesse Dodge, Amit Goyal, Xufeng Han, Alyssa Mensch, Margaret Mitchell, Aneesh Sood, Karl Stratos, et al. Understanding and predicting importance in images. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 3562-3569. IEEE, 2012.
-
(2012)
Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on
, pp. 3562-3569
-
-
Berg, A.C.1
Berg, T.L.2
Daume, H.3
Dodge, J.4
Goyal, A.5
Han, X.6
Mensch, A.7
Mitchell, M.8
Sood, A.9
Stratos, K.10
-
42
-
-
84973856017
-
Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
-
Bryan A Plummer, Liwei Wang, Chris M Cervantes, Juan C Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision, pages 2641-2649, 2015.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 2641-2649
-
-
Plummer, B.A.1
Wang, L.2
Cervantes, C.M.3
Caicedo, J.C.4
Hockenmaier, J.5
Lazebnik, S.6
-
44
-
-
84986327251
-
-
Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele. Grounding of textual phrases in images by reconstruction. arXiv preprint arXiv: 1511.03745, 2015.
-
(2015)
Grounding of Textual Phrases in Images By Reconstruction
-
-
Rohrbach, A.1
Rohrbach, M.2
Hu, R.3
Darrell, T.4
Schiele, B.5
-
45
-
-
84887345951
-
A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
-
Pradipto Das, Chenliang Xu, Richard F Doell, and Jason J Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2634-2641, 2013.
-
(2013)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 2634-2641
-
-
Das, P.1
Xu, C.2
Doell, R.F.3
Corso, J.J.4
-
47
-
-
84973926486
-
Learning common sense through visual abstraction
-
Ramakrishna Vedantam, Xiao Lin, Tanmay Batra, C Lawrence Zitnick, and Devi Parikh. Learning common sense through visual abstraction. In Proceedings of the IEEE International Conference on Computer Vision, pages 2542-2550, 2015.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 2542-2550
-
-
Vedantam, R.1
Lin, X.2
Batra, T.3
Zitnick, C.L.4
Parikh, D.5
-
48
-
-
84959250180
-
From captions to visual concepts and back
-
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C Platt, et al. From captions to visual concepts and back. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1473-1482, 2015.
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 1473-1482
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
-
49
-
-
84933585162
-
Very deep convolutional networks for large-scale image recognition
-
abs/1409.1556
-
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, abs/1409.1556, 2014.
-
(2014)
CoRR
-
-
Simonyan, K.1
Zisserman, A.2
-
51
-
-
84973861983
-
Conditional random fields as recurrent neural networks
-
Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, and Philip HS Torr. Conditional random fields as recurrent neural networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 1529-1537, 2015.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 1529-1537
-
-
Zheng, S.1
Jayasumana, S.2
Romera-Paredes, B.3
Vineet, V.4
Su, Z.5
Du, D.6
Huang, C.7
Torr, P.H.S.8
-
53
-
-
85162351107
-
Efficient inference in fully connected crfs with Gaussian edge potentials
-
Vladlen Koltun. Efficient inference in fully connected crfs with Gaussian edge potentials. Adv. Neural Inf. Process. Syst, 2011.
-
(2011)
Adv. Neural Inf. Process. Syst
-
-
Koltun, V.1
-
58
-
-
84990066623
-
Deep markov random field for image modeling
-
Springer
-
Zhirong Wu, Dahua Lin, and Xiaoou Tang. Deep markov random field for image modeling. In European Conference on Computer Vision, pages 295-312. Springer, 2016.
-
(2016)
European Conference on Computer Vision
, pp. 295-312
-
-
Wu, Z.1
Lin, D.2
Tang, X.3
-
59
-
-
84913555165
-
-
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv: 1408.5093, 2014.
-
(2014)
Caffe: Convolutional Architecture for Fast Feature Embedding
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
60
-
-
77955422240
-
Object detection with discriminatively trained part based models
-
P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part based models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(9): 1627-1645, 2010.
-
(2010)
IEEE Transactions on Pattern Analysis and Machine Intelligence
, vol.32
, Issue.9
, pp. 1627-1645
-
-
Felzenszwalb, P.F.1
Girshick, R.B.2
McAllester, D.3
Ramanan, D.4
-
62
-
-
84990053150
-
-
Somak Aditya, Yezhou Yang, Chitta Baral, Cornelia Fermuller, and Yiannis Aloimonos. From images to sentences through scene description graphs using commonsense reasoning and knowledge. arXiv preprint arXiv: 1511.03292, 2015.
-
(2015)
From Images to Sentences Through Scene Description Graphs Using Commonsense Reasoning and Knowledge
-
-
Aditya, S.1
Yang, Y.2
Baral, C.3
Fermuller, C.4
Aloimonos, Y.5
-
63
-
-
85044286206
-
-
Qi Wu, Damien Teney, Peng Wang, Chunhua Shen, Anthony Dick, and Anton van den Hengel. Visual question answering: A survey of methods and datasets. arXiv preprint arXiv: 1607.05910, 2016.
-
(2016)
Visual Question Answering: A Survey of Methods and Datasets
-
-
Wu, Q.1
Teney, D.2
Wang, P.3
Shen, C.4
Dick, A.5
Van Den Hengel, A.6