[2] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015.
[3] H. Azizpour, A. S. Razavian, J. Sullivan, A. Maki, and S. Carlsson. From generic to specific deep representation for visual recognition. In CVPR Workshops, 2015.
[4] A. C. Berg, T. L. Berg, H. Daumé III, J. Dodge, A. Goyal, X. Han, A. Mensch, M. Mitchell, A. Sood, K. Stratos, and K. Yamaguchi. Understanding and predicting importance in images. In CVPR, 2012.
[5] R. Bernardi, R. Cakici, D. Elliott, A. Erdem, E. Erdem, N. Ikizler-Cinbis, F. Keller, A. Muscat, and B. Plank. Automatic description generation from images: A survey. J. Artif. Intell. Res., 55(1), 2016.
[6] K. Bock, D. Irwin, D. Davidson, and W. Levelt. Minding the clock. J. Mem. Lang., 48, 2003.
[7] A. Borji and J. Tanner. Reconciling saliency and object center-bias hypotheses in explaining free-viewing fixations. IEEE Trans. Neural Netw. Learn. Syst., 27(6), 2016.
[8] A. Borji, H. R. Tavakoli, D. N. Sihite, and L. Itti. Analysis of scores, datasets, and models in visual saliency prediction. In ICCV, 2013.
[9] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. CoRR, abs/1504.00325, 2015.
[10] M. Cimpoi, S. Maji, and A. Vedaldi. Deep filter banks for texture recognition and segmentation. In CVPR, 2015.
[11] A. D. F. Clarke, M. Elsner, and H. Rohde. Giving good directions: Order of mention reflects visual salience. Front. Psychol., 6, 2015.
[12] M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL, 2014.
[13] J. Devlin, H. Cheng, H. Fang, S. Gupta, L. Deng, X. He, G. Zweig, and M. Mitchell. Language models for image captioning: The quirks and what works. In ACL, 2015.
[14] J. Donahue, L. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[15] D. Elliott and A. P. de Vries. Describing images using inferred visual dependency representations. In ACL, 2015.
[16] D. Elliott and F. Keller. Image description using visual dependency representations. In EMNLP, 2013.
[17] M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes challenge: A retrospective. IJCV, 111(1), 2015.
[18] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015.
[19] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
[20] M. R. Greene. Statistics of high-level scene context. Front. Psychol., 4, 2013.
[21] Z. Griffin and K. Bock. What the eyes say about speaking. Psychol. Sci., 11(4), 2000.
[22] Z. M. Griffin and D. H. Spieler. Observing the what and when of language production for different age groups by monitoring speakers' eye movements. Brain and Language, 99(3):272-288, 2006. Language Comprehension across the Life Span.
[24] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
[25] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. J. Artif. Intell. Res., 47, 2013.
[27] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew. Extreme learning machine: Theory and applications. Neurocomput., 70, 2006.
[31] M. K. Tanenhaus, C. Chambers, and J. E. Hanna. Referential domains in spoken language comprehension: Using eye movements to bridge the product and action traditions. In The Interface of Language, Vision, and Action: Eye Movements and the Visual World. Psychology Press, 2004.
[32] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[33] A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.
[34] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, 2011.
[36] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[37] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In ICCV, 2015.
[38] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL, 2014.
[39] S. Mathe and C. Sminchisescu. Actions in the eye: Dynamic gaze datasets and learnt saliency models for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell., 2015.
[41] A. S. Meyer, A. M. Sleiderink, and W. J. Levelt. Viewing and naming objects: Eye movements during noun phrase production. Cognition, 66(2), 1998.
[42] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. ICLR, 2013.
[43] R. Mottaghi, X. Chen, X. Liu, N. G. Cho, S. W. Lee, S. Fidler, R. Urtasun, and A. Yuille. The role of context for object detection and semantic segmentation in the wild. In CVPR, 2014.
[44] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.
[45] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
[47] R. J. Peters, A. Iyer, L. Itti, and C. Koch. Components of bottom-up gaze allocation in natural images. Vision Research, 45, 2005.
[48] F. Pulvermüller, M. Härle, and F. Hummel. Walking or talking: Behavioral and neurophysiological correlates of action verb processing. Brain and Language, 78(2), 2001.
[52] R. Shetty and J. Laaksonen. Video captioning with recurrent networks based on frame- and video-level features and visual content classification. In CVPR Workshops, 2015.
[58] H. R. Tavakoli, F. Ahmad, A. Borji, and J. Laaksonen. Saliency revisited: Analysis of mouse movements versus fixations. In CVPR, 2017.
[59] H. R. Tavakoli, A. Borji, J. Laaksonen, and E. Rahtu. Exploiting inter-image similarity and ensemble of extreme learners for fixation prediction using deep features. Neurocomput., 244, 2017.
[60] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
[62] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhudinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In F. Bach and D. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, volume 37 of Proceedings of Machine Learning Research, pages 2048-2057, Lille, France, 2015. PMLR.
[63] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2, 2014.
[64] K. Yun, Y. Peng, D. Samaras, G. Zelinsky, and T. Berg. Exploring the role of gaze behavior and object detection in scene understanding. Front. Psychol., 4, 2013.
[65] K. Yun, Y. Peng, D. Samaras, G. J. Zelinsky, and T. L. Berg. Studying relationships between human gaze, description, and computer vision. In CVPR, 2013.
[66] Q. Zhao and C. Koch. Learning saliency-based visual attention: A review. Signal Processing, 93, 2013.
[68] B. M. 't Hart, H. C. E. F. Schmidt, C. Roth, and W. Einhäuser. Fixations on objects in natural scenes: Dissociating importance from salience. Front. Psychol., 4, 2013.