-
1
-
-
85083953689
-
Neural machine translation by jointly learning to align and translate
-
1, 2, 3, 4
-
D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015. 1, 2, 3, 4
-
(2015)
ICLR
-
-
Bahdanau, D.1
Cho, K.2
Bengio, Y.3
-
3
-
-
84957029470
-
Mind's eye: A recurrent visual representation for image caption generation
-
6
-
X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015. 6
-
(2015)
CVPR
-
-
Chen, X.1
Zitnick, C.L.2
-
4
-
-
85097641926
-
On the properties of neural machine translation: Encoder-decoder approaches
-
4, 5
-
K. Cho, B. van Merrienboer, D. Bahdanau, and Y. Bengio. On the properties of neural machine translation: Encoder-decoder approaches. In Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8), 2014. 4, 5
-
(2014)
Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8)
-
-
Cho, K.1
Van Merrienboer, B.2
Bahdanau, D.3
Bengio, Y.4
-
5
-
-
84961291190
-
Learning phrase representations using RNN encoder-decoder for statistical machine translation
-
1, 2, 3
-
K. Cho, B. van Merrienboer, C. Gülcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014. 1, 2, 3
-
(2014)
EMNLP
-
-
Cho, K.1
Van Merrienboer, B.2
Gülcehre, C.3
Bahdanau, D.4
Bougares, F.5
Schwenk, H.6
Bengio, Y.7
-
7
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
1, 2, 3, 8
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015. 1, 2, 3, 8
-
(2015)
CVPR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
8
-
-
84959250180
-
From captions to visual concepts and back
-
6
-
H. Fang, S. Gupta, F. N. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, C. L. Zitnick, and G. Zweig. From captions to visual concepts and back. In CVPR, 2015. 6
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.N.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Zitnick, C.L.11
Zweig, G.12
-
9
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
1, 2
-
A. Farhadi, S. M. M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. A. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV (4), 2010. 1, 2
-
(2010)
ECCV
, Issue.4
-
-
Farhadi, A.1
Hejrati, S.M.M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.A.7
-
10
-
-
84894905366
-
A multi-view embedding space for modeling internet images, tags, and their semantics
-
4, 6
-
Y. Gong, Q. Ke, M. Isard, and S. Lazebnik. A multi-view embedding space for modeling internet images, tags, and their semantics. IJCV, 106 (2): 210-233, 2014. 4, 6
-
(2014)
IJCV
, vol.106
, Issue.2
, pp. 210-233
-
-
Gong, Y.1
Ke, Q.2
Isard, M.3
Lazebnik, S.4
-
13
-
-
84943739264
-
-
CoRR, abs/1503.04069 3
-
K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber. LSTM: A search space odyssey. CoRR, abs/1503.04069, 2015. 3
-
(2015)
LSTM: A Search Space Odyssey
-
-
Greff, K.1
Srivastava, R.K.2
Koutník, J.3
Steunebrink, B.R.4
Schmidhuber, J.5
-
14
-
-
0031573117
-
Long short-term memory
-
2, 3
-
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9 (8): 1735-1780, 1997. 2, 3
-
(1997)
Neural Comput.
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
15
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
6
-
M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47: 853-899, 2013. 6
-
(2013)
JAIR
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
16
-
-
0000107975
-
Relations between two sets of variates
-
4
-
H. Hotelling. Relations between two sets of variates. Biometrika, pages 321-377, 1936. 4
-
(1936)
Biometrika
, pp. 321-377
-
-
Hotelling, H.1
-
17
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
1, 2, 3, 6, 8
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015. 1, 2, 3, 6, 8
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
18
-
-
84937843643
-
Deep fragment embeddings for bidirectional image sentence mapping
-
6
-
A. Karpathy, A. Joulin, and F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014. 6
-
(2014)
NIPS
-
-
Karpathy, A.1
Joulin, A.2
Li, F.3
-
19
-
-
84919921461
-
Multimodal neural language models
-
1, 2, 8
-
R. Kiros, R. Salakhutdinov, and R. S. Zemel. Multimodal neural language models. In ICML, 2014. 1, 2, 8
-
(2014)
ICML
-
-
Kiros, R.1
Salakhutdinov, R.2
Zemel, R.S.3
-
20
-
-
84887601544
-
Babytalk: Understanding and generating simple image descriptions
-
1, 2
-
G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Babytalk: Understanding and generating simple image descriptions. TPAMI, 35 (12): 2891-2903, 2013. 1, 2
-
(2013)
TPAMI
, vol.35
, Issue.12
, pp. 2891-2903
-
-
Kulkarni, G.1
Premraj, V.2
Ordonez, V.3
Dhar, S.4
Li, S.5
Choi, Y.6
Berg, A.C.7
Berg, T.L.8
-
21
-
-
84878189119
-
Collective generation of natural image descriptions
-
1, 2
-
P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, 2012. 1, 2
-
(2012)
ACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
22
-
-
84907331257
-
Generalizing image captions for image-text parallel corpus
-
1, 2
-
P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In ACL, 2013. 1, 2
-
(2013)
ACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
23
-
-
84934873221
-
Treetalk: Composition and compression of trees for image descriptions
-
1, 2
-
P. Kuznetsova, V. Ordonez, T. Berg, and Y. Choi. Treetalk: Composition and compression of trees for image descriptions. TACL, 2: 351-362, 2014. 1, 2
-
(2014)
TACL
, vol.2
, pp. 351-362
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, T.3
Choi, Y.4
-
24
-
-
52149112996
-
Meteor: An automatic metric for MT evaluation with high levels of correlation with human judgments
-
6
-
A. Lavie and A. Agarwal. Meteor: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Second Workshop on Statistical Machine Translation, 2007. 6
-
(2007)
Second Workshop on Statistical Machine Translation
-
-
Lavie, A.1
Agarwal, A.2
-
25
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
6
-
T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014. 6
-
(2014)
ECCV
-
-
Lin, T.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
26
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-RNN)
-
1, 2, 3, 8
-
J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015. 1, 2, 3, 8
-
(2015)
ICLR
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
27
-
-
84906925144
-
Nonparametric method for data-driven image captioning
-
1, 2
-
R. Mason and E. Charniak. Nonparametric method for data-driven image captioning. In ACL, 2014. 1, 2
-
(2014)
ACL
-
-
Mason, R.1
Charniak, E.2
-
28
-
-
85034832841
-
Midge: Generating image descriptions from computer vision detections
-
1, 2
-
M. Mitchell, J. Dodge, A. Goyal, K. Yamaguchi, K. Stratos, X. Han, A. Mensch, A. C. Berg, T. L. Berg, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012. 1, 2
-
(2012)
EACL
-
-
Mitchell, M.1
Dodge, J.2
Goyal, A.3
Yamaguchi, K.4
Stratos, K.5
Han, X.6
Mensch, A.7
Berg, A.C.8
Berg, T.L.9
-
29
-
-
85133336275
-
Bleu: A method for automatic evaluation of machine translation
-
6
-
K. Papineni, S. Roukos, T. Ward, and W. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002. 6
-
(2002)
ACL
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.4
-
30
-
-
85083953063
-
Very deep convolutional networks for large-scale image recognition
-
6, 8
-
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015. 6, 8
-
(2015)
ICLR
-
-
Simonyan, K.1
Zisserman, A.2
-
31
-
-
80053459857
-
Generating text with recurrent neural networks
-
2
-
I. Sutskever, J. Martens, and G. Hinton. Generating text with recurrent neural networks. In ICML, 2011. 2
-
(2011)
ICML
-
-
Sutskever, I.1
Martens, J.2
Hinton, G.3
-
32
-
-
84928547704
-
Sequence to sequence learning with neural networks
-
1, 2, 3, 5
-
I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014. 1, 2, 3, 5
-
(2014)
NIPS
-
-
Sutskever, I.1
Vinyals, O.2
Le, Q.V.3
-
33
-
-
84973926705
-
Going deeper with convolutions
-
8
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In CVPR, 2014. 8
-
(2014)
CVPR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
36
-
-
84956980995
-
Cider: Consensus-based image description evaluation
-
6
-
R. Vedantam, C. L. Zitnick, and D. Parikh. Cider: Consensus-based image description evaluation. In CVPR, 2015. 6
-
(2015)
CVPR
-
-
Vedantam, R.1
Zitnick, C.L.2
Parikh, D.3
-
37
-
-
84946747440
-
Show and tell: A neural image caption generator
-
1, 2, 3, 4, 6, 8
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015. 1, 2, 3, 4, 6, 8
-
(2015)
CVPR
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
38
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
1, 2, 3, 6, 8
-
K. Xu, J. Ba, R. Kiros, K. Cho, A. C. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015. 1, 2, 3, 6, 8
-
(2015)
ICML
-
-
Xu, K.1
Ba, J.2
Kiros, R.3
Cho, K.4
Courville, A.C.5
Salakhutdinov, R.6
Zemel, R.S.7
Bengio, Y.8
-
39
-
-
80053258778
-
Corpus-guided sentence generation of natural images
-
1, 2
-
Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, 2011. 1, 2
-
(2011)
EMNLP
-
-
Yang, Y.1
Teo, C.L.2
Aloimonos, Y.3
-
40
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
6
-
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2: 67-78, 2014. 6
-
(2014)
TACL
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4