[1] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual question answering. arXiv preprint arXiv:1505.00468, 2015.
[2] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, et al. VizWiz: Nearly real-time answers to visual questions. In ACM Symposium on User Interface Software and Technology, pages 333-342, 2010.
[3] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille. Semantic image segmentation with deep convolutional nets and fully connected CRFs. In ICLR, 2015.
[4] X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. In CVPR, 2015.
[5] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[6] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[7] J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179-211, 1990.
[8] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
[9] D. Geman, S. Geman, N. Hallonquist, and L. Younes. Visual Turing test for computer vision systems. PNAS, 112(12):3618-3623, 2015.
[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[11] M. Grubinger, P. Clough, H. Müller, and T. Deselaers. The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, pages 13-23, 2006.
[13] N. Kalchbrenner and P. Blunsom. Recurrent continuous translation models. In EMNLP, pages 1700-1709, 2013.
[14] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015.
[15] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. TACL, 2015.
[17] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[18] A. Lavie and A. Agarwal. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgements. In Workshop on Statistical Machine Translation, pages 228-231. Association for Computational Linguistics, 2007.
[21] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312, 2014.
[22] M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS, pages 1682-1690, 2014.
[24] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[25] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. arXiv preprint arXiv:1504.06692, 2015.
[26] J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Explain images with multimodal recurrent neural networks. In NIPS Deep Learning Workshop, 2014.
[27] T. Mikolov, A. Joulin, S. Chopra, M. Mathieu, and M. Ranzato. Learning longer memory in recurrent neural networks. arXiv preprint arXiv:1412.7753, 2014.
[28] T. Mikolov, M. Karafiát, L. Burget, J. Cernockỳ, and S. Khudanpur. Recurrent neural network based language model. In INTERSPEECH, pages 1045-1048, 2010.
[29] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, pages 3111-3119, 2013.
[30] V. Nair and G. E. Hinton. Rectified linear units improve restricted Boltzmann machines. In ICML, pages 807-814, 2010.
[31] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002.
[33] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet Large Scale Visual Recognition Challenge, 2014.
[34] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[35] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104-3112, 2014.
[36] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. arXiv preprint arXiv:1409.4842, 2014.
[37] K. Tu, M. Meng, M. W. Lee, T. E. Choe, and S.-C. Zhu. Joint video and text parsing for understanding events and answering queries. IEEE MultiMedia, 21(2):42-70, 2014.
[38] A. M. Turing. Computing machinery and intelligence. Mind, pages 433-460, 1950.
[39] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, 2015.
[41] Z. Wu and M. Palmer. Verb semantics and lexical selection. In ACL, pages 133-138, 1994.
[42] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
[43] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In ACL, pages 479-488, 2014.
[44] J. Zhu, J. Mao, and A. L. Yuille. Learning from weakly supervised data by the expectation loss SVM (e-SVM) algorithm. In NIPS, pages 1125-1133, 2014.