1. Sadeghi, M.A., Farhadi, A.: Recognition using visual phrases. In: CVPR, pp. 1745–1752 (2011)
2. Gupta, A., Verma, Y., Jawahar, C.: Choosing linguistics over vision to describe images. In: AAAI, pp. 606–612 (2012)
3. Bernardi, R., Cakici, R., Elliott, D., Erdem, A., Erdem, E., Ikizler-Cinbis, N., Keller, F., Muscat, A., Plank, B.: Automatic description generation from images: a survey of models, datasets, and evaluation measures. J. Artif. Intell. Res. 55, 409–442 (2016)
4. Rasiwasia, N., Costa Pereira, J., Coviello, E., Doyle, G., Lanckriet, G.R., Levy, R., Vasconcelos, N.: A new approach to cross-modal multimedia retrieval. In: ACMMM, pp. 251–260 (2010)
5. Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A.: Deep captioning with multimodal recurrent neural networks (M-RNN). arXiv preprint arXiv:1412.6632 (2014)
6. Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: a neural image caption generator. In: CVPR, pp. 3156–3164 (2015)
7. Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: CVPR, pp. 3128–3137 (2015)
8. Kiros, R., Salakhutdinov, R., Zemel, R.: Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539 (2014)
9. Donahue, J., Anne Hendricks, L., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: CVPR, pp. 2625–2634 (2015)
10. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Zemel, R., Bengio, Y.: Show, attend and tell: neural image caption generation with visual attention. In: Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), pp. 2048–2057 (2015)
11. Krizhevsky, A., Sutskever, I., Hinton, G.: Imagenet classification with deep convolutional neural networks. In: NIPS, pp. 1097–1105 (2012)
13. Yngve, V.: A model and an hypothesis for language structure. Proc. Am. Philos. Soc. 104, 444–466 (1960)
14. Tai, K.S., Socher, R., Manning, C.: Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075 (2015)
15. Rashtchian, C., Young, P., Hodosh, M., Hockenmaier, J.: Collecting image annotations using Amazon’s mechanical turk. In: NAACL Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 139–147 (2010)
16. Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: new similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. 2, 67–78 (2014)
18. Hodosh, M., Young, P., Hockenmaier, J.: Framing image description as a ranking task: data, models and evaluation metrics. J. Artif. Intell. Res. 47, 853–899 (2013)
19. Frome, A., Corrado, G.S., Shlens, J., Bengio, S., Dean, J., Mikolov, T., et al.: Devise: a deep visual-semantic embedding model. In: NIPS, pp. 2121–2129 (2013)
20. Socher, R., Karpathy, A., Le, Q.V., Manning, C.D., Ng, A.: Grounded compositional semantics for finding and describing images with sentences. Trans. Assoc. Comput. Linguist. 2, 207–218 (2014)
21. Karpathy, A., Joulin, A., Fei-Fei, L.: Deep fragment embeddings for bidirectional image sentence mapping. In: NIPS, pp. 1889–1897 (2014)
22. Srivastava, N., Salakhutdinov, R.: Multimodal learning with deep Boltzmann machines. In: NIPS, pp. 2222–2230 (2012)
23. Jia, Y., Salzmann, M., Darrell, T.: Learning cross-modality similarity for multinomial data. In: ICCV, pp. 2407–2414 (2011)
24. Kiros, R., Salakhutdinov, R., Zemel, R.: Multimodal neural language models. In: ICML, pp. 595–603 (2014)
25. Kuznetsova, P., Ordonez, V., Berg, T., Choi, Y.: Treetalk: composition and compression of trees for image descriptions. Trans. Assoc. Comput. Linguist. 2, 351–362 (2014)
26. Farhadi, A., Hejrati, M., Sadeghi, M.A., Young, P., Rashtchian, C., Hockenmaier, J., Forsyth, D.: Every picture tells a story: generating sentences from images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 15–29. Springer, Heidelberg (2010). doi:10.1007/978-3-642-15561-1_2
27. Kulkarni, G., Premraj, V., Ordonez, V., Dhar, S., Li, S., Choi, Y., Berg, A., Berg, T.: Babytalk: understanding and generating simple image descriptions. IEEE Trans. Pattern Anal. Mach. Intell. 35, 2891–2903 (2013)
28. Yang, Y., Teo, C.L., Daumé III, H., Aloimonos, Y.: Corpus-guided sentence generation of natural images. In: EMNLP, pp. 444–454 (2011)
29. Mitchell, M., Han, X., Dodge, J., Mensch, A., Goyal, A., Berg, A., Yamaguchi, K., Berg, T., Stratos, K., Daumé III, H.: Midge: generating image descriptions from computer vision detections. In: EACL, pp. 747–756 (2012)
30. Gupta, A., Mannem, P.: From image annotation to image description. In: Huang, T., Zeng, Z., Li, C., Leung, C.S. (eds.) ICONIP 2012. LNCS, vol. 7667, pp. 196–204. Springer, Heidelberg (2012). doi:10.1007/978-3-642-34500-5_24
31. Li, S., Kulkarni, G., Berg, T., Berg, A., Choi, Y.: Composing simple image descriptions using web-scale n-grams. In: CoNLL, pp. 220–228 (2011)
32. Kuznetsova, P., Ordonez, V., Berg, A., Berg, T., Choi, Y.: Collective generation of natural image descriptions. In: ACL, pp. 359–368 (2012)
34. Manning, C., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: ACL, pp. 55–60 (2014)
36. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: CVPR, pp. 248–255 (2009)
37. Hinton, G., Srivastava, N., Swersky, K.: Lecture 6a overview of mini-batch gradient descent (2012). Coursera lecture slides. https://class.coursera.org/neuralnets-2012-001/lecture
38. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)
39. Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: ACL, pp. 311–318 (2002)