-
1
-
-
84911448580
-
2D human pose estimation: New benchmark and state of the art analysis
-
Columbus, OH, US
-
Mykhaylo Andriluka, Leonid Pishchulin, Peter Gehler, and Bernt Schiele. 2014. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis. In CVPR 14, pages 3686-3693, Columbus, OH, US.
-
(2014)
CVPR
, vol.14
, pp. 3686-3693
-
-
Andriluka, M.1
Pishchulin, L.2
Gehler, P.3
Schiele, B.4
-
2
-
-
0029681342
-
Spatial context in recognition
-
Moshe Bar and Shimon Ullman. 1996. Spatial Context in Recognition. Perception, 25(3):343-52.
-
(1996)
Perception
, vol.25
, Issue.3
, pp. 343-352
-
-
Bar, M.1
Ullman, S.2
-
3
-
-
0020120019
-
Scene perception: Detecting and judging objects undergoing relational violations
-
Irving Biederman, Robert J Mezzanotte, and Jan C Rabinowitz. 1982. Scene perception: Detecting and judging objects undergoing relational violations. Cognitive Psychology, 14(2):143-177.
-
(1982)
Cognitive Psychology
, vol.14
, Issue.2
, pp. 143-177
-
-
Biederman, I.1
Mezzanotte, R.J.2
Rabinowitz, J.C.3
-
4
-
-
84859020282
-
Better hypothesis testing for statistical machine translation: Controlling for optimizer instability
-
Portland, OR, U.S.A
-
JH Clark, Chris Dyer, Alon Lavie, and NA Smith. 2011. Better hypothesis testing for statistical machine translation: Controlling for optimizer instability. In ACL-HLT 11, pages 176-181, Portland, OR, U.S.A.
-
(2011)
ACL-HLT
, vol.11
, pp. 176-181
-
-
Clark, J.H.1
Dyer, C.2
Lavie, A.3
Smith, N.A.4
-
5
-
-
85120305515
-
Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems
-
Edinburgh, Scotland, U.K
-
Michael Denkowski and Alon Lavie. 2011. Meteor 1.3: Automatic metric for reliable optimization and evaluation of machine translation systems. In SMT at EMNLP 11, Edinburgh, Scotland, U.K.
-
(2011)
SMT at EMNLP
, vol.11
-
-
Denkowski, M.1
Lavie, A.2
-
6
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Boston, MA, U.S.A
-
Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term Recurrent Convolutional Networks for Visual Recognition and Description. In CVPR 15, Boston, MA, U.S.A.
-
(2015)
CVPR
, vol.15
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
7
-
-
84937943470
-
Depth map prediction from a single image using a multi-scale deep network
-
Lake Tahoe, CA, U.S.A, June
-
David Eigen, Christian Puhrsch, and Rob Fergus. 2014. Depth Map Prediction from a Single Image using a Multi-Scale Deep Network. In NIPS 27, Lake Tahoe, CA, U.S.A, June.
-
(2014)
NIPS
, vol.27
-
-
Eigen, D.1
Puhrsch, C.2
Fergus, R.3
-
8
-
-
84906929591
-
Image description using visual dependency representations
-
Seattle, WA, U.S.A
-
Desmond Elliott and Frank Keller. 2013. Image Description using Visual Dependency Representations. In EMNLP 13, pages 1292-1302, Seattle, WA, U.S.A.
-
(2013)
EMNLP
, vol.13
, pp. 1292-1302
-
-
Elliott, D.1
Keller, F.2
-
9
-
-
84906928552
-
Comparing automatic evaluation measures for image description
-
Baltimore, MD, U.S.A
-
Desmond Elliott and Frank Keller. 2014. Comparing Automatic Evaluation Measures for Image Description. In ACL 14, pages 452-457, Baltimore, MD, U.S.A.
-
(2014)
ACL
, vol.14
, pp. 452-457
-
-
Elliott, D.1
Keller, F.2
-
10
-
-
84943810574
-
Query-by-example image retrieval using visual dependency representations
-
Dublin, Ireland
-
Desmond Elliott, Victor Lavrenko, and Frank Keller. 2014. Query-by-Example Image Retrieval using Visual Dependency Representations. In COLING 14, pages 109-120, Dublin, Ireland.
-
(2014)
COLING
, vol.14
, pp. 109-120
-
-
Elliott, D.1
Lavrenko, V.2
Keller, F.3
-
11
-
-
77951298115
-
The PASCAL visual object classes challenge
-
Mark Everingham, Luc Van Gool, Christopher Williams, John Winn, and Andrew Zisserman. 2010. The PASCAL Visual Object Classes Challenge. IJCV, 88(2):303-338.
-
(2010)
IJCV
, vol.88
, Issue.2
, pp. 303-338
-
-
Everingham, M.1
Van Gool, L.2
Williams, C.3
Winn, J.4
Zisserman, A.5
-
12
-
-
84959250180
-
From captions to visual concepts and back
-
Boston, MA, U.S.A
-
Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, and Geoffrey Zweig. 2015. From Captions to Visual Concepts and Back. In CVPR 15, Boston, MA, U.S.A.
-
(2015)
CVPR
, vol.15
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Zitnick, C.L.11
Zweig, G.12
-
13
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
Heraklion, Crete, Greece
-
Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: generating sentences from images. In ECCV 10, pages 15-29, Heraklion, Crete, Greece.
-
(2010)
ECCV
, vol.10
, pp. 15-29
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
14
-
-
84859924694
-
Automatic image annotation using auxiliary text information
-
Columbus, Ohio
-
Yansong Feng and Mirella Lapata. 2008. Automatic Image Annotation Using Auxiliary Text Information. In ACL 08, pages 272-280, Columbus, Ohio.
-
(2008)
ACL
, vol.8
, pp. 272-280
-
-
Feng, Y.1
Lapata, M.2
-
15
-
-
84913561844
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
abs/1311.2
-
Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. 2014. Rich feature hierarchies for accurate object detection and semantic segmentation. CoRR, abs/1311.2.
-
(2014)
CoRR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
16
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
Micah Hodosh, Peter Young, and Julia Hockenmaier. 2013. Framing Image Description as a Ranking Task: Data, Models and Evaluation Metrics. JAIR, 47:853-899.
-
(2013)
JAIR
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
17
-
-
84913580146
-
Caffe: Convolutional architecture for fast feature embedding
-
Orlando, FL, U.S.A
-
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional Architecture for Fast Feature Embedding. In MM 14, pages 675-678, Orlando, FL, U.S.A.
-
(2014)
MM
, vol.14
, pp. 675-678
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.B.6
Guadarrama, S.7
Darrell, T.8
-
18
-
-
84952902559
-
Deep visual-semantic alignments for generating image descriptions
-
Boston, MA, U.S.A
-
Andrej Karpathy and Li Fei-Fei. 2015. Deep Visual-Semantic Alignments for Generating Image Descriptions. In CVPR 15, Boston, MA, U.S.A.
-
(2015)
CVPR
, vol.15
-
-
Karpathy, A.1
Fei-Fei, L.2
-
19
-
-
84959252592
-
Deep fragment embeddings for bidirectional image sentence mapping
-
Montreal, Quebec, Canada
-
Andrej Karpathy, Armand Joulin, and Li Fei-Fei. 2014. Deep Fragment Embeddings for Bidirectional Image Sentence Mapping. In NIPS 28, Montreal, Quebec, Canada.
-
(2014)
NIPS
, vol.28
-
-
Karpathy, A.1
Joulin, A.2
Fei-Fei, L.3
-
20
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
Colorado Springs, CO, U.S.A
-
Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C. Berg, and Tamara L. Berg. 2011. Baby talk: Understanding and generating simple image descriptions. In CVPR 11, pages 1601-1608, Colorado Springs, CO, U.S.A.
-
(2011)
CVPR
, vol.11
, pp. 1601-1608
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
21
-
-
84878189119
-
Collective generation of natural image descriptions
-
Jeju Island, South Korea
-
Polina Kuznetsova, Vicente Ordonez, Alexander C. Berg, Tamara L. Berg, and Yejin Choi. 2012. Collective Generation of Natural Image Descriptions. In ACL 12, pages 359-368, Jeju Island, South Korea.
-
(2012)
ACL
, vol.12
, pp. 359-368
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
22
-
-
85062874978
-
TUHOI: Trento universal human object interaction dataset
-
Dublin, Ireland
-
Dieu-Thu Le, Jasper Uijlings, and Raffaella Bernardi. 2014. TUHOI: Trento Universal Human Object Interaction Dataset. In WVL at COLING 14, pages 17-24, Dublin, Ireland.
-
(2014)
WVL at COLING
, vol.14
, pp. 17-24
-
-
Le, D.1
Uijlings, J.2
Bernardi, R.3
-
23
-
-
84970028761
-
Phrase-based image captioning
-
Lille, France, February
-
Remi Lebret, Pedro O. Pinheiro, and Ronan Collobert. 2015. Phrase-based Image Captioning. In ICML 15, Lille, France, February.
-
(2015)
ICML
, vol.15
-
-
Lebret, R.1
Pinheiro, P.O.2
Collobert, R.3
-
24
-
-
84862279067
-
Composing simple image descriptions using web-scale n-grams
-
Portland, OR, U.S.A
-
Siming Li, Girish Kulkarni, Tamara L. Berg, Alexander C. Berg, and Yejin Choi. 2011. Composing simple image descriptions using web-scale n-grams. In CoNLL 11, pages 220-228, Portland, OR, U.S.A.
-
(2011)
CoNLL
, vol.11
, pp. 220-228
-
-
Li, S.1
Kulkarni, G.2
Berg, T.L.3
Berg, A.C.4
Choi, Y.5
-
25
-
-
85149140250
-
Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics
-
Barcelona, Spain
-
Chin-Yew Lin and Franz Josef Och. 2004. Automatic evaluation of machine translation quality using longest common subsequence and skip-bigram statistics. In ACL 04, pages 605-612, Barcelona, Spain.
-
(2004)
ACL
, vol.4
, pp. 605-612
-
-
Lin, C.1
Och, F.J.2
-
26
-
-
85083953135
-
Network in network
-
volume abs/1312.4, Banff, Canada
-
Min Lin, Qiang Chen, and Shuicheng Yan. 2014a. Network In Network. In ICLR 14, volume abs/1312.4, Banff, Canada.
-
(2014)
ICLR
, vol.14
-
-
Lin, M.1
Chen, Q.2
Yan, S.3
-
27
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
Zurich, Switzerland
-
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, C. Lawrence Zitnick, and Piotr Dollár. 2014b. Microsoft COCO: Common Objects in Context. In ECCV 14, pages 740-755, Zurich, Switzerland.
-
(2014)
ECCV
, vol.14
, pp. 740-755
-
-
Lin, T.1
Maire, M.2
Belongie, S.3
Bourdev, L.4
Girshick, R.5
Hays, J.6
Perona, P.7
Ramanan, D.8
Zitnick, C.L.9
Dollár, P.10
-
28
-
-
0001964555
-
-
In Paul Bloom, Mary A. Peterson, Lynn Nadel, and Merrill F. Garrett, editors, Language and Space MIT Press
-
GD Logan and DD Sadler. 1996. A computational analysis of the apprehension of spatial relations. In Paul Bloom, Mary A. Peterson, Lynn Nadel, and Merrill F. Garrett, editors, Language and Space, pages 492-592. MIT Press.
-
(1996)
A Computational Analysis of the Apprehension of Spatial Relations
, pp. 492-592
-
-
Logan, G.D.1
Sadler, D.D.2
-
29
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-rnn)
-
volume abs/1412.6632, San Diego, CA, U.S.A
-
Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, and Alan L. Yuille. 2015. Deep captioning with multimodal recurrent neural networks (m-rnn). In ICLR 15, volume abs/1412.6632, San Diego, CA, U.S.A.
-
(2015)
ICLR
, vol.15
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
30
-
-
0242626599
-
A taxonomy of relationships between images and text
-
Emily E. Marsh and Marilyn Domas White. 2003. A taxonomy of relationships between images and text. Journal of Documentation, 59(6):647-672.
-
(2003)
Journal of Documentation
, vol.59
, Issue.6
, pp. 647-672
-
-
Marsh, E.E.1
White, M.D.2
-
32
-
-
85034832841
-
Midge: Generating image descriptions from computer vision detections
-
Avignon, France
-
Margaret Mitchell, Jesse Dodge, Amit Goyal, Kota Yamaguchi, Karl Stratos, Alyssa Mensch, Alex Berg, Tamara Berg, and Hal Daumé III. 2012. Midge: Generating Image Descriptions From Computer Vision Detections. In EACL 12, pages 747-756, Avignon, France.
-
(2012)
EACL
, vol.12
, pp. 747-756
-
-
Mitchell, M.1
Dodge, J.2
Goyal, A.3
Yamaguchi, K.4
Stratos, K.5
Mensch, A.6
Berg, A.7
Berg, T.8
Daumé, H.9
-
33
-
-
34447620889
-
MaltParser: A language-independent system for data-driven dependency parsing
-
Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(2):1.
-
(2007)
Natural Language Engineering
, vol.13
, Issue.2
, pp. 1
-
-
Nivre, J.1
Hall, J.2
Nilsson, J.3
Chanev, A.4
Eryigit, G.5
Kübler, S.6
Marinov, S.7
Marsi, E.8
-
34
-
-
84911449395
-
Learning and transferring mid-level image representations using convolutional neural networks
-
Columbus, OH, US
-
Maxime Oquab, Leon Bottou, Ivan Laptev, and Josef Sivic. 2014. Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks. In CVPR 14, pages 1717-1724, Columbus, OH, US.
-
(2014)
CVPR
, vol.14
, pp. 1717-1724
-
-
Oquab, M.1
Bottou, L.2
Laptev, I.3
Sivic, J.4
-
35
-
-
84960194772
-
Learning to interpret and describe abstract scenes
-
Denver, CO, U.S.A
-
Luis M. G. Ortiz, Clemens Wolff, and Mirella Lapata. 2015. Learning to Interpret and Describe Abstract Scenes. In NAACL 15, Denver, CO, U.S.A.
-
(2015)
NAACL
, vol.15
-
-
Ortiz, L.M.G.1
Wolff, C.2
Lapata, M.3
-
37
-
-
85133336275
-
BLEU: A method for automatic evaluation of machine translation
-
Philadelphia, PA, U.S.A
-
Kishore Papineni, Salim Roukos, Todd Ward, and WJ Zhu. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL 02, pages 311-318, Philadelphia, PA, U.S.A.
-
(2002)
ACL
, vol.2
, pp. 311-318
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.J.4
-
39
-
-
85090348677
-
Collecting image annotations using Amazon's Mechanical Turk
-
Los Angeles, CA, U.S.A
-
Cyrus Rashtchian, Peter Young, Micah Hodosh, and Julia Hockenmaier. 2010. Collecting image annotations using Amazon's Mechanical Turk. In AMT at NAACL 10, pages 139-147, Los Angeles, CA, U.S.A.
-
(2010)
AMT at NAACL
, vol.10
, pp. 139-147
-
-
Rashtchian, C.1
Young, P.2
Hodosh, M.3
Hockenmaier, J.4
-
40
-
-
84909978410
-
-
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C Berg, and Li Fei-Fei. 2014. ImageNet Large Scale Visual Recognition Challenge.
-
(2014)
ImageNet Large Scale Visual Recognition Challenge
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
41
-
-
80052889458
-
Recognition using visual phrases
-
Colorado Springs, CO, U.S.A
-
Mohammad A Sadeghi and Ali Farhadi. 2011. Recognition Using Visual Phrases. In CVPR 11, pages 1745-1752, Colorado Springs, CO, U.S.A.
-
(2011)
CVPR
, vol.11
, pp. 1745-1752
-
-
Sadeghi, M.A.1
Farhadi, A.2
-
42
-
-
84952235015
-
Analysing the subject of a picture: A theoretical approach
-
Sara Shatford. 1986. Analysing the Subject of a Picture: A Theoretical Approach. Cataloging & Classification Quarterly, 6(3):39-62.
-
(1986)
Cataloging & Classification Quarterly
, vol.6
, Issue.3
, pp. 39-62
-
-
Shatford, S.1
-
43
-
-
85083953063
-
Very deep convolutional networks for large-scale image recognition
-
volume abs/1409.1, San Diego, CA, U.S.A
-
Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional Networks for Large-Scale Image Recognition. In ICLR 15, volume abs/1409.1, San Diego, CA, U.S.A.
-
(2015)
ICLR
, vol.15
-
-
Simonyan, K.1
Zisserman, A.2
-
44
-
-
84906925854
-
Grounded compositional semantics for finding and describing images with sentences
-
Richard Socher, Andrej Karpathy, Q Le, C Manning, and A Ng. 2014. Grounded Compositional Semantics for Finding and Describing Images with Sentences. TACL, 2:207-218.
-
(2014)
TACL
, vol.2
, pp. 207-218
-
-
Socher, R.1
Karpathy, A.2
Le, Q.3
Manning, C.4
Ng, A.5
-
45
-
-
84983470508
-
Feature-rich part-of-speech tagging with a cyclic dependency network
-
Edmonton, Canada
-
Kristina Toutanova, Dan Klein, and Christopher D Manning. 2003. Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network. In HLT-NAACL 03, pages 173-180, Edmonton, Canada.
-
(2003)
HLT-NAACL
, vol.3
, pp. 173-180
-
-
Toutanova, K.1
Klein, D.2
Manning, C.D.3
-
46
-
-
84946747440
-
Show and tell: A neural image caption generator
-
Boston, MA, U.S.A
-
Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In CVPR 15, Boston, MA, U.S.A.
-
(2015)
CVPR
, vol.15
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
47
-
-
80053258778
-
Corpus-guided sentence generation of natural images
-
Edinburgh, Scotland, UK
-
Yezhou Yang, Ching Lik Teo, Hal Daumé, and Yiannis Aloimonos. 2011. Corpus-Guided Sentence Generation of Natural Images. In EMNLP 11, pages 444-454, Edinburgh, Scotland, UK.
-
(2011)
EMNLP
, vol.11
, pp. 444-454
-
-
Yang, Y.1
Teo, C.L.2
Daumé, H.3
Aloimonos, Y.4
-
48
-
-
85026937926
-
See no evil, say no evil: Description generation from densely labeled images
-
Dublin, Ireland
-
Mark Yatskar, Michel Galley, L Vanderwende, and L Zettlemoyer. 2014. See No Evil, Say No Evil: Description Generation from Densely Labeled Images. In *SEM, pages 110-120, Dublin, Ireland.
-
(2014)
*SEM
, pp. 110-120
-
-
Yatskar, M.1
Galley, M.2
Vanderwende, L.3
Zettlemoyer, L.4
-
49
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
Peter Young, Alice Lai, Micah Hodosh, and Julia Hockenmaier. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2:67-78.
-
(2014)
TACL
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4