-
1
-
-
84885996388
-
Video in sentences out
-
Barbu, A., Bridge, A., Burchill, Z., Coroian, D., Dickinson, S., Fidler, S., Michaux, A., Mussman, S., Narayanaswamy, S., Salvi, D., Schmidt, L., Shangguan, J., Siskind, J.M., Waggoner, J., Wang, S., Wei, J., Yin, Y., Zhang, Z.: Video in sentences out. In: UAI (2012)
-
(2012)
UAI
-
-
Barbu, A.1
Bridge, A.2
Burchill, Z.3
Coroian, D.4
Dickinson, S.5
Fidler, S.6
Michaux, A.7
Mussman, S.8
Narayanaswamy, S.9
Salvi, D.10
Schmidt, L.11
Shangguan, J.12
Siskind, J.M.13
Waggoner, J.14
Wang, S.15
Wei, J.16
Yin, Y.17
Zhang, Z.18
-
2
-
-
84859089502
-
Collecting highly parallel data for paraphrase evaluation
-
Chen, D., Dolan, W.: Collecting highly parallel data for paraphrase evaluation. In: ACL (2011)
-
(2011)
ACL
-
-
Chen, D.1
Dolan, W.2
-
3
-
-
84952349295
-
-
arXiv:1504.00325
-
Chen, X., Fang, H., Lin, T., Vedantam, R., Gupta, S., Dollár, P., Zitnick, C.L.: Microsoft coco captions: data collection and evaluation server (2015). arXiv:1504.00325
-
(2015)
Microsoft Coco Captions: Data Collection and Evaluation Server
-
-
Chen, X.1
Fang, H.2
Lin, T.3
Vedantam, R.4
Gupta, S.5
Dollár, P.6
Zitnick, C.L.7
-
4
-
-
84887345951
-
Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
-
Das, P., Xu, C., Doell, R., Corso, J.: Thousand frames in just a few words: lingual description of videos through latent topics and sparse object stitching. In: CVPR (2013)
-
(2013)
CVPR
-
-
Das, P.1
Xu, C.2
Doell, R.3
Corso, J.4
-
5
-
-
84952349296
-
-
arXiv:1505.01809
-
Devlin, J., Cheng, H., Fang, H., Gupta, S., Deng, L., He, X., Zweig, G., Mitchell, M.: Language models for image captioning: the quirks and what works (2015). arXiv:1505.01809
-
(2015)
Language Models for Image Captioning: The Quirks and What Works
-
-
Devlin, J.1
Cheng, H.2
Fang, H.3
Gupta, S.4
Deng, L.5
He, X.6
Zweig, G.7
Mitchell, M.8
-
6
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: CVPR (2015)
-
(2015)
CVPR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
7
-
-
84906929591
-
Image description using visual dependency representations
-
Elliott, D., Keller, F.: Image description using visual dependency representations. In: EMNLP, pp. 1292-1302 (2013)
-
(2013)
EMNLP
, pp. 1292-1302
-
-
Elliott, D.1
Keller, F.2
-
8
-
-
84959250180
-
From captions to visual concepts and back
-
Fang, H., Gupta, S., Iandola, F.N., Srivastava, R., Deng, L., Dollar, P., Gao, J., He, X., Mitchell, M., Platt, J.C., Zitnick, C.L., Zweig, G.: From captions to visual concepts and back. In: CVPR (2015)
-
(2015)
CVPR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.N.3
Srivastava, R.4
Deng, L.5
Dollar, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Zitnick, C.L.11
Zweig, G.12
-
9
-
-
78149311145
-
Every picture tells a story: Generating sentences from images
-
In: Daniilidis, K., Maragos, P., Paragios, N. (eds.), Springer, Heidelberg
-
Farhadi, A., Hejrati, M., Sadeghi, M.A., Young, P., Rashtchian, C., Hockenmaier, J., Forsyth, D.: Every picture tells a story: generating sentences from images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 15-29. Springer, Heidelberg (2010)
-
(2010)
ECCV 2010, Part IV. LNCS
, vol.6314
, pp. 15-29
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
11
-
-
84898773262
-
Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
-
Guadarrama, S., Krishnamoorthy, N., Malkarnenkar, G., Venugopalan, S., Mooney, R., Darrell, T., Saenko, K.: Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In: ICCV (2013)
-
(2013)
ICCV
-
-
Guadarrama, S.1
Krishnamoorthy, N.2
Malkarnenkar, G.3
Venugopalan, S.4
Mooney, R.5
Darrell, T.6
Saenko, K.7
-
12
-
-
84867720412
-
-
arXiv:1207.0580
-
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors (2012). arXiv:1207.0580
-
(2012)
Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors
-
-
Hinton, G.E.1
Srivastava, N.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.R.5
-
14
-
-
84924803045
-
LSDA: Large scale detection through adaptation
-
Hoffman, J., Guadarrama, S., Tzeng, E., Donahue, J., Girshick, R., Darrell, T., Saenko, K.: LSDA: large scale detection through adaptation. In: NIPS (2014)
-
(2014)
NIPS
-
-
Hoffman, J.1
Guadarrama, S.2
Tzeng, E.3
Donahue, J.4
Girshick, R.5
Darrell, T.6
Saenko, K.7
-
15
-
-
84913555165
-
-
arXiv:1408.5093
-
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding (2014). arXiv:1408.5093
-
(2014)
Caffe: Convolutional Architecture for Fast Feature Embedding
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
16
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: CVPR (2015)
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
17
-
-
84952349298
-
Unifying visual-semantic embeddings with multimodal neural language models
-
Kiros, R., Salakhutdinov, R., Zemel, R.S.: Unifying visual-semantic embeddings with multimodal neural language models. TACL (2015)
-
(2015)
TACL
-
-
Kiros, R.1
Salakhutdinov, R.2
Zemel, R.S.3
-
18
-
-
0036843382
-
Natural language description of human activities from video images based on concept hierarchy of actions
-
Kojima, A., Tamura, T., Fukunaga, K.: Natural language description of human activities from video images based on concept hierarchy of actions. IJCV 50(2), 171-184 (2002)
-
(2002)
IJCV
, vol.50
, Issue.2
, pp. 171-184
-
-
Kojima, A.1
Tamura, T.2
Fukunaga, K.3
-
19
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A.C., Berg, T.L.: Baby talk: understanding and generating simple image descriptions. In: CVPR (2011)
-
(2011)
CVPR
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
20
-
-
84934873221
-
Treetalk: Composition and compression of trees for image descriptions
-
Kuznetsova, P., Ordonez, V., Berg, T.L., Choi, Y.: Treetalk: composition and compression of trees for image descriptions. In: TACL (2014)
-
(2014)
TACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, T.L.3
Choi, Y.4
-
21
-
-
85107661995
-
Meteor universal: Language specific translation evaluation for any target language
-
Denkowski, M., Lavie, A.: Meteor universal: language specific translation evaluation for any target language. In: ACL 2014, p. 376 (2014)
-
(2014)
ACL 2014
, pp. 376
-
-
Denkowski, M.1
Lavie, A.2
-
22
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (M-RNN)
-
Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A.: Deep captioning with multimodal recurrent neural networks (m-RNN). In: ICLR (2015)
-
(2015)
ICLR
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.6
-
23
-
-
85034832841
-
Midge: Generating image descriptions from computer vision detections
-
Mitchell, M., Dodge, J., Goyal, A., Yamaguchi, K., Stratos, K., Han, X., Mensch, A., Berg, A.C., Berg, T.L., Daume, H.: Midge: generating image descriptions from computer vision detections. In: EACL (2012)
-
(2012)
EACL
-
-
Mitchell, M.1
Dodge, J.2
Goyal, A.3
Yamaguchi, K.4
Stratos, K.5
Han, X.6
Mensch, A.7
Berg, A.C.8
Berg, T.L.9
Daume, H.10
-
24
-
-
84952349300
-
-
arXiv:1505.01861
-
Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y.: Jointly modeling embedding and translation to bridge video and language (2015). arXiv:1505.01861
-
(2015)
Jointly Modeling Embedding and Translation to Bridge Video and Language
-
-
Pan, Y.1
Mei, T.2
Yao, T.3
Li, H.4
Rui, Y.5
-
25
-
-
85133336275
-
BLEU: A method for automatic evaluation of machine translation
-
Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
-
(2002)
ACL
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.J.4
-
26
-
-
84908670256
-
Coherent multi-sentence video description with variable level of detail
-
In: Jiang, X., Hornegger, J., Koch, R. (eds.), Springer, Heidelberg
-
Rohrbach, A., Rohrbach, M., Qiu, W., Friedrich, A., Pinkal, M., Schiele, B.: Coherent multi-sentence video description with variable level of detail. In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 184-195. Springer, Heidelberg (2014)
-
(2014)
GCPR 2014. LNCS
, vol.8753
, pp. 184-195
-
-
Rohrbach, A.1
Rohrbach, M.2
Qiu, W.3
Friedrich, A.4
Pinkal, M.5
Schiele, B.6
-
28
-
-
84959211977
-
A dataset for movie description
-
Rohrbach, A., Rohrbach, M., Tandon, N., Schiele, B.: A dataset for movie description. In: CVPR (2015)
-
(2015)
CVPR
-
-
Rohrbach, A.1
Rohrbach, M.2
Tandon, N.3
Schiele, B.4
-
29
-
-
84898775239
-
Translating video content to natural language descriptions
-
Rohrbach, M., Qiu, W., Titov, I., Thater, S., Pinkal, M., Schiele, B.: Translating video content to natural language descriptions. In: ICCV (2013)
-
(2013)
ICCV
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
30
-
-
84959932469
-
Integrating language and vision to generate natural language descriptions of videos in the wild
-
Thomason, J., Venugopalan, S., Guadarrama, S., Saenko, K., Mooney, R.J.: Integrating language and vision to generate natural language descriptions of videos in the wild. In: COLING (2014)
-
(2014)
COLING
-
-
Thomason, J.1
Venugopalan, S.2
Guadarrama, S.3
Saenko, K.4
Mooney, R.J.5
-
31
-
-
84952349304
-
-
arXiv:1503.01070v1
-
Torabi, A., Pal, C., Larochelle, H., Courville, A.: Using descriptive video services to create a large data source for video annotation research (2015). arXiv:1503.01070v1
-
-
-
Torabi, A.1
Pal, C.2
Larochelle, H.3
Courville, A.4
-
32
-
-
84956980995
-
Cider: Consensus-based image description evaluation
-
Vedantam, R., Zitnick, C.L., Parikh, D.: Cider: Consensus-based image description evaluation. In: CVPR (2015)
-
(2015)
CVPR
-
-
Vedantam, R.1
Zitnick, C.L.2
Parikh, D.3
-
33
-
-
84952349305
-
-
arXiv:1505.00487
-
Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence - video to text (2015). arXiv:1505.00487
-
(2015)
Sequence to Sequence - Video to Text
-
-
Venugopalan, S.1
Rohrbach, M.2
Donahue, J.3
Mooney, R.4
Darrell, T.5
Saenko, K.6
-
34
-
-
84959876769
-
Translating videos to natural language using deep recurrent neural networks
-
Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko, K.: Translating videos to natural language using deep recurrent neural networks. In: NAACL (2015)
-
(2015)
NAACL
-
-
Venugopalan, S.1
Xu, H.2
Donahue, J.3
Rohrbach, M.4
Mooney, R.5
Saenko, K.6
-
35
-
-
84946747440
-
Show and tell: A neural image caption generator
-
Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: A neural image caption generator. In: CVPR (2015)
-
(2015)
CVPR
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
36
-
-
84898805910
-
Action recognition with improved trajectories
-
Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV (2013)
-
(2013)
ICCV
-
-
Wang, H.1
Schmid, C.2
-
37
-
-
84952349307
-
Jointly modeling deep video and compositional text to bridge vision and language in a unified framework
-
Xu, R., Xiong, C., Chen, W., Corso, J.J.: Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In: AAAI (2015)
-
(2015)
AAAI
-
-
Xu, R.1
Xiong, C.2
Chen, W.3
Corso, J.J.4
-
38
-
-
84952349308
-
-
arXiv:1502.08029v4
-
Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure (2015). arXiv:1502.08029v4
-
-
-
Yao, L.1
Torabi, A.2
Cho, K.3
Ballas, N.4
Pal, C.5
Larochelle, H.6
Courville, A.7
-
39
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL 2, 67-78 (2014)
-
(2014)
TACL
, vol.2
, pp. 67-78
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
40
-
-
84937964578
-
Learning Deep Features for Scene Recognition using Places Database
-
Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning Deep Features for Scene Recognition using Places Database. In: NIPS (2014)
-
(2014)
NIPS
-
-
Zhou, B.1
Lapedriza, A.2
Xiao, J.3
Torralba, A.4
Oliva, A.5