-
5
-
-
84986262382
-
-
arXiv preprint arXiv:1511.05960
-
Chen, K.; Wang, J.; Chen, L.-C.; Gao, H.; Xu, W.; and Nevatia, R. 2015. ABC-CNN: An attention based convolutional neural network for visual question answering. arXiv preprint arXiv:1511.05960.
-
(2015)
ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering
-
-
Chen, K.1
Wang, J.2
Chen, L.-C.3
Gao, H.4
Xu, W.5
Nevatia, R.6
-
6
-
-
84919728106
-
-
arXiv preprint arXiv:1406.1078
-
Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; and Bengio, Y. 2014. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078.
-
(2014)
Learning Phrase Representations Using RNN Encoder-Decoder for Statistical Machine Translation
-
-
Cho, K.1
Van Merriënboer, B.2
Gulcehre, C.3
Bahdanau, D.4
Bougares, F.5
Schwenk, H.6
Bengio, Y.7
-
7
-
-
84990044140
-
-
arXiv preprint arXiv:1606.03556
-
Das, A.; Agrawal, H.; Zitnick, C. L.; Parikh, D.; and Batra, D. 2016. Human attention in visual question answering: Do humans and deep networks look at the same regions? arXiv preprint arXiv:1606.03556.
-
(2016)
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?
-
-
Das, A.1
Agrawal, H.2
Zitnick, C.L.3
Parikh, D.4
Batra, D.5
-
8
-
-
85198028989
-
ImageNet: A large-scale hierarchical image database
-
IEEE
-
Deng, J.; Dong, W.; Socher, R.; Li, L.-J.; Li, K.; and Fei-Fei, L. 2009. ImageNet: A large-scale hierarchical image database. In CVPR, 248-255. IEEE.
-
(2009)
CVPR
, pp. 248-255
-
-
Deng, J.1
Dong, W.2
Socher, R.3
Li, L.-J.4
Li, K.5
Fei-Fei, L.6
-
9
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Donahue, J.; Hendricks, L. A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; and Darrell, T. 2015. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2625-2634.
-
(2015)
CVPR
, pp. 2625-2634
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
10
-
-
84959250180
-
From captions to visual concepts and back
-
Fang, H.; Gupta, S.; Iandola, F.; Srivastava, R. K.; Deng, L.; Dollár, P.; Gao, J.; He, X.; Mitchell, M.; Platt, J. C.; et al. 2015. From captions to visual concepts and back. In CVPR, 1473-1482.
-
(2015)
CVPR
, pp. 1473-1482
-
-
Fang, H.1
Gupta, S.2
Iandola, F.3
Srivastava, R.K.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
-
13
-
-
84986302997
-
-
arXiv preprint arXiv:1511.04164
-
Hu, R.; Xu, H.; Rohrbach, M.; Feng, J.; Saenko, K.; and Darrell, T. 2015. Natural language object retrieval. arXiv preprint arXiv:1511.04164.
-
(2015)
Natural Language Object Retrieval
-
-
Hu, R.1
Xu, H.2
Rohrbach, M.3
Feng, J.4
Saenko, K.5
Darrell, T.6
-
14
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
Karpathy, A., and Fei-Fei, L. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR, 3128-3137.
-
(2015)
CVPR
, pp. 3128-3137
-
-
Karpathy, A.1
Fei-Fei, L.2
-
17
-
-
84937834115
-
-
arXiv preprint arXiv:1405.0312
-
Lin, T.-Y.; Maire, M.; Belongie, S.; Bourdev, L.; Girshick, R.; Hays, J.; Perona, P.; Ramanan, D.; Zitnick, C. L.; and Dollár, P. 2014. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312.
-
(2014)
Microsoft COCO: Common Objects in Context
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Bourdev, L.4
Girshick, R.5
Hays, J.6
Perona, P.7
Ramanan, D.8
Zitnick, C.L.9
Dollár, P.10
-
19
-
-
85117622017
-
The Stanford CoreNLP natural language processing toolkit
-
Manning, C. D.; Surdeanu, M.; Bauer, J.; Finkel, J. R.; Bethard, S.; and McClosky, D. 2014. The Stanford CoreNLP natural language processing toolkit. In ACL (System Demonstrations), 55-60.
-
(2014)
ACL (System Demonstrations)
, pp. 55-60
-
-
Manning, C.D.1
Surdeanu, M.2
Bauer, J.3
Finkel, J.R.4
Bethard, S.5
McClosky, D.6
-
20
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-RNN)
-
Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Huang, Z.; and Yuille, A. 2015. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR.
-
(2015)
ICLR
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.6
-
21
-
-
84986260074
-
Generation and comprehension of unambiguous object descriptions
-
Mao, J.; Huang, J.; Toshev, A.; Camburu, O.; Yuille, A.; and Murphy, K. 2016. Generation and comprehension of unambiguous object descriptions. In CVPR.
-
(2016)
CVPR
-
-
Mao, J.1
Huang, J.2
Toshev, A.3
Camburu, O.4
Yuille, A.5
Murphy, K.6
-
22
-
-
84898956512
-
Distributed representations of words and phrases and their compositionality
-
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G. S.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems, 3111-3119.
-
(2013)
Advances in Neural Information Processing Systems
, pp. 3111-3119
-
-
Mikolov, T.1
Sutskever, I.2
Chen, K.3
Corrado, G.S.4
Dean, J.5
-
24
-
-
85133336275
-
BLEU: A method for automatic evaluation of machine translation
-
Papineni, K.; Roukos, S.; Ward, T.; and Zhu, W.-J. 2002. BLEU: A method for automatic evaluation of machine translation. In ACL, 311-318.
-
(2002)
ACL
, pp. 311-318
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.-J.4
-
25
-
-
84973856017
-
Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
-
Plummer, B. A.; Wang, L.; Cervantes, C. M.; Caicedo, J. C.; Hockenmaier, J.; and Lazebnik, S. 2015. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2641-2649.
-
(2015)
ICCV
, pp. 2641-2649
-
-
Plummer, B.A.1
Wang, L.2
Cervantes, C.M.3
Caicedo, J.C.4
Hockenmaier, J.5
Lazebnik, S.6
-
28
-
-
84904163933
-
Dropout: A simple way to prevent neural networks from overfitting
-
Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1):1929-1958.
-
(2014)
The Journal of Machine Learning Research
, vol.15
, Issue.1
, pp. 1929-1958
-
-
Srivastava, N.1
Hinton, G.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.5
-
29
-
-
84946747440
-
Show and tell: A neural image caption generator
-
Vinyals, O.; Toshev, A.; Bengio, S.; and Erhan, D. 2015. Show and tell: A neural image caption generator. In CVPR, 3156-3164.
-
(2015)
CVPR
, pp. 3156-3164
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
31
-
-
84939821074
-
-
arXiv preprint arXiv:1502.03044
-
Xu, K.; Ba, J.; Kiros, R.; Courville, A.; Salakhutdinov, R.; Zemel, R.; and Bengio, Y. 2015. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044.
-
(2015)
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Courville, A.4
Salakhutdinov, R.5
Zemel, R.6
Bengio, Y.7
-
32
-
-
84995439884
-
-
arXiv preprint arXiv:1603.03925
-
You, Q.; Jin, H.; Wang, Z.; Fang, C.; and Luo, J. 2016. Image captioning with semantic attention. arXiv preprint arXiv:1603.03925.
-
(2016)
Image Captioning with Semantic Attention
-
-
You, Q.1
Jin, H.2
Wang, Z.3
Fang, C.4
Luo, J.5
-
33
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
Young, P.; Lai, A.; Hodosh, M.; and Hockenmaier, J. 2014. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the Association for Computational Linguistics 2:67-78.
-
(2014)
Transactions of the Association for Computational Linguistics
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4