Barnard, Kobus, Duygulu, Pinar, Forsyth, David, De Freitas, Nando, Blei, David M, and Jordan, Michael I. Matching words and pictures. JMLR, 3:1107–1135, 2003.
Chen, X., Fang, H., Lin, T.-Y., Vedantam, R., Gupta, S., Dollár, P., and Zitnick, C. L. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015.
Cho, Kyunghyun, van Merrienboer, Bart, Gulcehre, Caglar, Bougares, Fethi, Schwenk, Holger, and Bengio, Yoshua. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
Devlin, Jacob, Cheng, Hao, Fang, Hao, Gupta, Saurabh, Deng, Li, He, Xiaodong, Zweig, Geoffrey, and Mitchell, Margaret. Language models for image captioning: The quirks and what works. arXiv preprint arXiv:1505.01809, 2015a.
Devlin, Jacob, Gupta, Saurabh, Girshick, Ross, Mitchell, Margaret, and Zitnick, C Lawrence. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015b.
Donahue, Jeff, Hendricks, Lisa Anne, Guadarrama, Sergio, Rohrbach, Marcus, Venugopalan, Subhashini, Saenko, Kate, and Darrell, Trevor. Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389, 2014.
Elman, Jeffrey L. Finding structure in time. Cognitive Science, 14(2):179–211, 1990.
Fang, Hao, Gupta, Saurabh, Iandola, Forrest, Srivastava, Rupesh, Deng, Li, Dollár, Piotr, Gao, Jianfeng, He, Xiaodong, Mitchell, Margaret, Platt, John, et al. From captions to visual concepts and back. arXiv preprint arXiv:1411.4952, 2014.
Farhadi, Ali, Hejrati, Mohsen, Sadeghi, Mohammad Amin, Young, Peter, Rashtchian, Cyrus, Hockenmaier, Julia, and Forsyth, David. Every picture tells a story: Generating sentences from images. In ECCV, pp. 15–29, 2010.
Frome, Andrea, Corrado, Greg S, Shlens, Jon, Bengio, Samy, Dean, Jeff, Mikolov, Tomas, et al. DeViSE: A deep visual-semantic embedding model. In NIPS, pp. 2121–2129, 2013.
Girshick, R., Donahue, J., Darrell, T., and Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
Grubinger, Michael, Clough, Paul, Müller, Henning, and Deselaers, Thomas. The IAPR TC-12 benchmark: A new evaluation resource for visual information systems. In International Workshop OntoImage, pp. 13–23, 2006.
Guillaumin, Matthieu, Verbeek, Jakob, and Schmid, Cordelia. Multiple instance metric learning from automatically labeled bags of faces. In ECCV, pp. 634–647, 2010.
Gupta, Ankush and Mannem, Prashanth. From image annotation to image description. In ICONIP, 2012.
Gupta, Ankush, Verma, Yashaswi, and Jawahar, C. V. Choosing linguistics over vision to describe images. In AAAI, 2012.
Hodosh, Micah, Young, Peter, and Hockenmaier, Julia. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47:853–899, 2013.
Jia, Yangqing, Salzmann, Mathieu, and Darrell, Trevor. Learning cross-modality similarity for multinomial data. In ICCV, pp. 2407–2414, 2011.
Kalchbrenner, Nal and Blunsom, Phil. Recurrent continuous translation models. In EMNLP, pp. 1700–1709, 2013.
Krizhevsky, Alex, Sutskever, Ilya, and Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks. In NIPS, pp. 1097–1105, 2012.
Kulkarni, Girish, Premraj, Visruth, Dhar, Sagnik, Li, Siming, Choi, Yejin, Berg, Alexander C, and Berg, Tamara L. Baby talk: Understanding and generating image descriptions. In CVPR, 2011.
Kuznetsova, Polina, Ordonez, Vicente, Berg, Tamara L, and Choi, Yejin. TreeTalk: Composition and compression of trees for image descriptions. Transactions of the Association for Computational Linguistics, 2(10):351–362, 2014.
LeCun, Yann A, Bottou, Léon, Orr, Genevieve B, and Müller, Klaus-Robert. Efficient backprop. In Neural networks: Tricks of the trade, pp. 9–48. Springer, 2012.
Lin, Tsung-Yi, Maire, Michael, Belongie, Serge, Hays, James, Perona, Pietro, Ramanan, Deva, Dollár, Piotr, and Zitnick, C Lawrence. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312, 2014.
Mao, Junhua, Xu, Wei, Yang, Yi, Wang, Jiang, and Yuille, Alan L. Explain images with multimodal recurrent neural networks. In NIPS Deep Learning Workshop, 2014.
Mao, Junhua, Xu, Wei, Yang, Yi, Wang, Jiang, Huang, Zhiheng, and Yuille, Alan. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. arXiv preprint arXiv:1504.06692, 2015.
Mikolov, Tomas, Karafiát, Martin, Burget, Lukas, Cernocky, Jan, and Khudanpur, Sanjeev. Recurrent neural network based language model. In INTERSPEECH, pp. 1045–1048, 2010.
Mikolov, Tomas, Kombrink, Stefan, Burget, Lukas, Cernocky, JH, and Khudanpur, Sanjeev. Extensions of recurrent neural network language model. In ICASSP, pp. 5528–5531, 2011.
Mikolov, Tomas, Sutskever, Ilya, Chen, Kai, Corrado, Greg S, and Dean, Jeff. Distributed representations of words and phrases and their compositionality. In NIPS, pp. 3111–3119, 2013.
Mitchell, Margaret, Han, Xufeng, Dodge, Jesse, Mensch, Alyssa, Goyal, Amit, Berg, Alex, Yamaguchi, Kota, Berg, Tamara, Stratos, Karl, and Daumé III, Hal. Midge: Generating image descriptions from computer vision detections. In EACL, 2012.
Mnih, Andriy and Hinton, Geoffrey. Three new graphical models for statistical language modelling. In ICML, pp. 641–648. ACM, 2007.
Nair, Vinod and Hinton, Geoffrey E. Rectified linear units improve restricted Boltzmann machines. In ICML, pp. 807–814, 2010.
Papineni, Kishore, Roukos, Salim, Ward, Todd, and Zhu, Wei-Jing. BLEU: A method for automatic evaluation of machine translation. In ACL, pp. 311–318, 2002.
Rashtchian, Cyrus, Young, Peter, Hodosh, Micah, and Hockenmaier, Julia. Collecting image annotations using Amazon’s Mechanical Turk. In NAACL-HLT Workshop 2010, pp. 139–147, 2010.
Russakovsky, Olga, Deng, Jia, Su, Hao, Krause, Jonathan, Satheesh, Sanjeev, Ma, Sean, Huang, Zhiheng, Karpathy, Andrej, Khosla, Aditya, Bernstein, Michael, Berg, Alexander C., and Fei-Fei, Li. ImageNet Large Scale Visual Recognition Challenge, 2014.
Socher, Richard, Le, Q, Manning, C, and Ng, A. Grounded compositional semantics for finding and describing images with sentences. In TACL, 2014.
Srivastava, Nitish and Salakhutdinov, Ruslan. Multimodal learning with deep Boltzmann machines. In NIPS, pp. 2222–2230, 2012.
Sutskever, Ilya, Vinyals, Oriol, and Le, Quoc V. Sequence to sequence learning with neural networks. In NIPS, pp. 3104–3112, 2014.
Vinyals, Oriol, Toshev, Alexander, Bengio, Samy, and Erhan, Dumitru. Show and tell: A neural image caption generator. arXiv preprint arXiv:1411.4555, 2014.
Young, Peter, Lai, Alice, Hodosh, Micah, and Hockenmaier, Julia. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In ACL, pp. 479–488, 2014.