-
1
-
-
84947041871
-
ImageNet large scale visual recognition challenge
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, "ImageNet large scale visual recognition challenge," Int. J. Comput. Vis. (IJCV), vol. 115, no. 3, pp. 211-252, 2015.
-
(2015)
Int. J. Comput. Vis. (IJCV)
, vol.115
, Issue.3
, pp. 211-252
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
2
-
-
78149311145
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, "Every picture tells a story: Generating sentences from images," in Proc. 11th Eur. Conf. Comput. Vis.: Part IV, 2010, pp. 15-29.
-
(2010)
Proc. 11th Eur. Conf. Comput. Vis.: Part IV
, pp. 15-29
-
-
Farhadi, A.1
-
3
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
G. Kulkarni, "Baby talk: Understanding and generating simple image descriptions," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2011, pp. 1601-1608.
-
(2011)
Proc. IEEE Conf. Comput. Vis. Pattern Recog.
, pp. 1601-1608
-
-
Kulkarni, G.1
-
4
-
-
84961291190
-
Learning phrase representations using RNN encoder-decoder for statistical machine translation
-
K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio, "Learning phrase representations using RNN encoder-decoder for statistical machine translation," in Proc. Empirical Methods Natural Lang. Process., 2014.
-
(2014)
Proc. Empirical Methods Natural Lang. Process.
-
-
Cho, K.1
Van Merrienboer, B.2
Gulcehre, C.3
Bougares, F.4
Schwenk, H.5
Bengio, Y.6
-
7
-
-
84906347546
-
-
arXiv:1312.6229
-
P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun, "Overfeat: Integrated recognition, localization and detection using convolutional networks," arXiv:1312.6229, 2013.
-
(2013)
Overfeat: Integrated Recognition, Localization and Detection Using Convolutional Networks
-
-
Sermanet, P.1
Eigen, D.2
Zhang, X.3
Mathieu, M.4
Fergus, R.5
LeCun, Y.6
-
8
-
-
0030397830
-
Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences
-
R. Gerber and H.-H. Nagel, "Knowledge representation for the generation of quantified natural language descriptions of vehicle traffic in image sequences," in Proc. Int. Conf. Image Process., 1996, pp. 805-808.
-
(1996)
Proc. Int. Conf. Image Process.
, pp. 805-808
-
-
Gerber, R.1
Nagel, H.-H.2
-
9
-
-
77954862144
-
I2t: Image parsing to text description
-
Aug.
-
B. Z. Yao, X. Yang, L. Lin, M. W. Lee, and S.-C. Zhu, "I2t: Image parsing to text description," Proc. IEEE, vol. 98, no. 8, pp. 1485-1508, Aug. 2010.
-
(2010)
Proc. IEEE
, vol.98
, Issue.8
, pp. 1485-1508
-
-
Yao, B.Z.1
Yang, X.2
Lin, L.3
Lee, M.W.4
Zhu, S.-C.5
-
10
-
-
84862279067
-
Composing simple image descriptions using web-scale n-grams
-
S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi, "Composing simple image descriptions using web-scale n-grams," in Proc. Conf. Comput. Natural Lang. Learn., 2011, pp. 220-228.
-
(2011)
Proc. Conf. Comput. Natural Lang. Learn.
, pp. 220-228
-
-
Li, S.1
Kulkarni, G.2
Berg, T.L.3
Berg, A.C.4
Choi, Y.5
-
13
-
-
84878189119
-
Collective generation of natural image descriptions
-
P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi, "Collective generation of natural image descriptions," in Proc. 50th Annu. Meet. Assoc. Comput. Linguistics: Long Papers-Vol. 1, 2012, pp. 359-368.
-
(2012)
Proc. 50th Annu. Meet. Assoc. Comput. Linguistics: Long Papers-Vol. 1
, pp. 359-368
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, A.C.3
Berg, T.L.4
Choi, Y.5
-
14
-
-
84934873221
-
Treetalk: Composition and compression of trees for image descriptions
-
P. Kuznetsova, V. Ordonez, T. Berg, and Y. Choi, "Treetalk: Composition and compression of trees for image descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, no. 10, 2014.
-
(2014)
Trans. Assoc. Comput. Linguistics
, vol.2
, Issue.10
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, T.3
Choi, Y.4
-
16
-
-
84883394520
-
Framing image description as a ranking task: Data, models and evaluation metrics
-
M. Hodosh, P. Young, and J. Hockenmaier, "Framing image description as a ranking task: Data, models and evaluation metrics," J. Artif. Intell. Res., vol. 47, pp. 853-899, 2013.
-
(2013)
J. Artif. Intell. Res.
, vol.47
, pp. 853-899
-
-
Hodosh, M.1
Young, P.2
Hockenmaier, J.3
-
17
-
-
84906484732
-
Improving image-sentence embeddings using large weakly annotated photo collections
-
Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik, "Improving image-sentence embeddings using large weakly annotated photo collections," in Proc. Eur. Conf. Comput. Vis., 2014, pp. 529-545.
-
(2014)
Proc. Eur. Conf. Comput. Vis.
, pp. 529-545
-
-
Gong, Y.1
Wang, L.2
Hodosh, M.3
Hockenmaier, J.4
Lazebnik, S.5
-
19
-
-
84965102873
-
-
arXiv:1505.04467
-
J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick, "Exploring nearest neighbor approaches for image captioning," arXiv:1505.04467, 2015.
-
(2015)
Exploring Nearest Neighbor Approaches for Image Captioning
-
-
Devlin, J.1
Gupta, S.2
Girshick, R.3
Mitchell, M.4
Zitnick, C.L.5
-
21
-
-
84906925854
-
Grounded compositional semantics for finding and describing images with sentences
-
R. Socher, A. Karpathy, Q. V. Le, C. Manning, and A. Y. Ng, "Grounded compositional semantics for finding and describing images with sentences," in Proc. Assoc. Comput. Linguistics, 2014.
-
(2014)
Proc. Assoc. Comput. Linguistics
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.4
Ng, A.Y.5
-
24
-
-
84969584486
-
Batch normalization: Accelerating deep network training by reducing internal covariate shift
-
S. Ioffe and C. Szegedy, "Batch normalization: Accelerating deep network training by reducing internal covariate shift," in Proc. Int. Conf. Mach. Learn., 2015.
-
(2015)
Proc. Int. Conf. Mach. Learn.
-
-
Ioffe, S.1
Szegedy, C.2
-
25
-
-
0031573117
-
Long short-term memory
-
S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, no. 8, pp. 1735-1780, 1997.
-
(1997)
Neural Comput.
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
27
-
-
84951072975
-
-
arXiv:1410.1090
-
J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille, "Explain images with multimodal recurrent neural networks," arXiv:1410.1090, 2014.
-
(2014)
Explain Images with Multimodal Recurrent Neural Networks
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.5
-
28
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-RNN)
-
J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille, "Deep captioning with multimodal recurrent neural networks (m-RNN)," in Int. Conf. Learn. Representations, 2015.
-
(2015)
Int. Conf. Learn. Representations
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.5
-
30
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, "Long-term recurrent convolutional networks for visual recognition and description," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015.
-
(2015)
Proc. IEEE Conf. Comput. Vis. Pattern Recog.
-
-
Donahue, J.1
-
31
-
-
84970002232
-
Show, attend and tell: Neural image caption generation with visual attention
-
K. Xu, "Show, attend and tell: Neural image caption generation with visual attention," in Proc. Int. Conf. Mach. Learn., 2015.
-
(2015)
Proc. Int. Conf. Mach. Learn.
-
-
Xu, K.1
-
33
-
-
85015796277
-
Mind's eye: A recurrent visual representation for image caption generation
-
X. Chen and C. L. Zitnick, "Mind's eye: A recurrent visual representation for image caption generation," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 2422-2431.
-
(2015)
Proc. IEEE Conf. Comput. Vis. Pattern Recog.
, pp. 2422-2431
-
-
Chen, X.1
Zitnick, C.L.2
-
34
-
-
84944096380
-
Language models for image captioning: The quirks and what works
-
J. Devlin, "Language models for image captioning: The quirks and what works," in Proc. Assoc. Comput. Linguistics, 2015.
-
(2015)
Proc. Assoc. Comput. Linguistics
-
-
Devlin, J.1
-
35
-
-
84919881041
-
Decaf: A deep convolutional activation feature for generic visual recognition
-
J. Donahue, et al., "Decaf: A deep convolutional activation feature for generic visual recognition," in Proc. Int. Conf. Mach. Learn., 2014.
-
(2014)
Proc. Int. Conf. Mach. Learn.
-
-
Donahue, J.1
-
36
-
-
85083951332
-
Efficient estimation of word representations in vector space
-
T. Mikolov, K. Chen, G. Corrado, and J. Dean, "Efficient estimation of word representations in vector space," in Int. Conf. Learn. Representations, 2013.
-
(2013)
Int. Conf. Learn. Representations
-
-
Mikolov, T.1
Chen, K.2
Corrado, G.3
Dean, J.4
-
38
-
-
85133336275
-
BLEU: A method for automatic evaluation of machine translation
-
K. Papineni, S. Roukos, T. Ward, and W. J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, 2002, pp. 311-318.
-
(2002)
Proc. 40th Annu. Meeting Assoc. Comput. Linguistics
, pp. 311-318
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.J.4
-
42
-
-
85090348677
-
Collecting image annotations using Amazon's Mechanical Turk
-
C. Rashtchian, P. Young, M. Hodosh, and J. Hockenmaier, "Collecting image annotations using Amazon's Mechanical Turk," in Proc. NAACL HLT Workshop Creating Speech Lang. Data Amazon's Mech. Turk, 2010, pp. 139-147.
-
(2010)
Proc. NAACL HLT Workshop Creating Speech Lang. Data Amazon's Mech. Turk
, pp. 139-147
-
-
Rashtchian, C.1
Young, P.2
Hodosh, M.3
Hockenmaier, J.4
-
43
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier, "From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions," Trans. Assoc. Comput. Linguistics, vol. 2, pp. 67-78, 2014.
-
(2014)
Trans. Assoc. Comput. Linguistics
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
46
-
-
84946747440
-
Show and tell: A neural image caption generator
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, "Show and tell: A neural image caption generator," in Proc. IEEE Conf. Comput. Vis. Pattern Recog., 2015, pp. 3156-3164.
-
(2015)
Proc. IEEE Conf. Comput. Vis. Pattern Recog.
, pp. 3156-3164
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
47
-
-
84965179228
-
Scheduled sampling for sequence prediction with recurrent neural networks
-
S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer, "Scheduled sampling for sequence prediction with recurrent neural networks," Adv. Neural Inf. Process. Syst., pp. 1171-1179, 2015.
-
(2015)
Adv. Neural Inf. Process. Syst.
, pp. 1171-1179
-
-
Bengio, S.1
Vinyals, O.2
Jaitly, N.3
Shazeer, N.4
-
49
-
-
0030211964
-
Bagging predictors
-
L. Breiman, "Bagging predictors," Mach. Learn., vol. 24, no. 2, pp. 123-140, 1996.
-
(1996)
Mach. Learn.
, vol.24
, Issue.2
, pp. 123-140
-
-
Breiman, L.1