-
1
-
-
84885996388
-
Video in sentences out
-
Barbu, A.; Bridge, A.; Burchill, Z.; Coroian, D.; Dickinson, S.; Fidler, S.; Michaux, A.; Mussman, S.; Narayanaswamy, S.; Salvi, D.; Schmidt, L.; Shangguan, J.; Siskind, J. M.; Waggoner, J.; Wang, S.; Wei, J.; Yin, Y.; and Zhang, Z. 2012. Video in sentences out. In UAI.
-
(2012)
UAI
-
-
Barbu, A.1
Bridge, A.2
Burchill, Z.3
Coroian, D.4
Dickinson, S.5
Fidler, S.6
Michaux, A.7
Mussman, S.8
Narayanaswamy, S.9
Salvi, D.10
Schmidt, L.11
Shangguan, J.12
Siskind, J.M.13
Waggoner, J.14
Wang, S.15
Wei, J.16
Yin, Y.17
Zhang, Z.18
-
2
-
-
84859089502
-
Collecting highly parallel data for paraphrase evaluation
-
Chen, D. L., and Dolan, W. B. 2011. Collecting highly parallel data for paraphrase evaluation. In ACL.
-
(2011)
ACL
-
-
Chen, D.L.1
Dolan, W.B.2
-
3
-
-
84887345951
-
A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
-
Das, P.; Xu, C.; Doell, R. F.; and Corso, J. J. 2013. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR.
-
(2013)
CVPR
-
-
Das, P.1
Xu, C.2
Doell, R.F.3
Corso, J.J.4
-
4
-
-
84874280480
-
Translating related words to videos and back through latent topics
-
Das, P.; Srihari, R. K.; and Corso, J. J. 2013. Translating related words to videos and back through latent topics. In WSDM.
-
(2013)
WSDM
-
-
Das, P.1
Srihari, R.K.2
Corso, J.J.3
-
5
-
-
84946590544
-
Construction and analysis of a large scale image ontology
-
Deng, J.; Li, K.; Do, M.; Su, H.; and Fei-Fei, L. 2009. Construction and analysis of a large scale image ontology. In Vision Science Society.
-
(2009)
Vision Science Society
-
-
Deng, J.1
Li, K.2
Do, M.3
Su, H.4
Fei-Fei, L.5
-
6
-
-
84904482223
-
-
arXiv:1310.1531
-
Donahue, J.; Jia, Y.; Vinyals, O.; Hoffman, J.; Zhang, N.; Tzeng, E.; and Darrell, T. 2013. Decaf: A deep convolutional activation feature for generic visual recognition. In arXiv:1310.1531.
-
(2013)
Decaf: A Deep Convolutional Activation Feature for Generic Visual Recognition
-
-
Donahue, J.1
Jia, Y.2
Vinyals, O.3
Hoffman, J.4
Zhang, N.5
Tzeng, E.6
Darrell, T.7
-
8
-
-
84898958665
-
Devise: A deep visual-semantic embedding model
-
Frome, A.; Corrado, G. S.; Shlens, J.; Bengio, S.; Dean, J.; Mikolov, T.; et al. 2013. Devise: A deep visual-semantic embedding model. In NIPS.
-
(2013)
NIPS
-
-
Frome, A.1
Corrado, G.S.2
Shlens, J.3
Bengio, S.4
Dean, J.5
Mikolov, T.6
-
9
-
-
0029727454
-
Learning task-dependent distributed representations by backpropagation through structure
-
Goller, C., and Kuchler, A. 1996. Learning task-dependent distributed representations by backpropagation through structure. In International Conference on Neural Networks.
-
(1996)
International Conference on Neural Networks
-
-
Goller, C.1
Kuchler, A.2
-
10
-
-
84898773262
-
Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
-
Guadarrama, S.; Krishnamoorthy, N.; Malkarnenkar, G.; Venugopalan, S.; Mooney, R.; Darrell, T.; and Saenko, K. 2013. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV.
-
(2013)
ICCV
-
-
Guadarrama, S.1
Krishnamoorthy, N.2
Malkarnenkar, G.3
Venugopalan, S.4
Mooney, R.5
Darrell, T.6
Saenko, K.7
-
11
-
-
70450202741
-
Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
-
Gupta, A.; Srinivasan, P.; Shi, J.; and Davis, L. S. 2009. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In CVPR.
-
(2009)
CVPR
-
-
Gupta, A.1
Srinivasan, P.2
Shi, J.3
Davis, L.S.4
-
12
-
-
84911364368
-
Large-scale video classification with convolutional neural networks
-
Karpathy, A.; Toderici, G.; Shetty, S.; Leung, T.; Sukthankar, R.; and Fei-Fei, L. 2014. Large-scale video classification with convolutional neural networks. In CVPR.
-
(2014)
CVPR
-
-
Karpathy, A.1
Toderici, G.2
Shetty, S.3
Leung, T.4
Sukthankar, R.5
Fei-Fei, L.6
-
13
-
-
84977906791
-
Accurate unlexicalized parsing
-
Klein, D., and Manning, C. D. 2003. Accurate unlexicalized parsing. In ACL.
-
(2003)
ACL
-
-
Klein, D.1
Manning, C.D.2
-
14
-
-
84893398951
-
Generating natural-language video descriptions using text-mined knowledge
-
Krishnamoorthy, N.; Malkarnenkar, G.; Mooney, R. J.; Saenko, K.; and Guadarrama, S. 2013. Generating natural-language video descriptions using text-mined knowledge. In AAAI.
-
(2013)
AAAI
-
-
Krishnamoorthy, N.1
Malkarnenkar, G.2
Mooney, R.J.3
Saenko, K.4
Guadarrama, S.5
-
15
-
-
84876231242
-
Imagenet classification with deep convolutional neural networks
-
Krizhevsky, A.; Sutskever, I.; and Hinton, G. E. 2012. Imagenet classification with deep convolutional neural networks. In NIPS.
-
(2012)
NIPS
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
16
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
Kulkarni, G.; Premraj, V.; Dhar, S.; Li, S.; Choi, Y.; Berg, A. C.; and Berg, T. L. 2011. Baby talk: Understanding and generating simple image descriptions. In CVPR.
-
(2011)
CVPR
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
17
-
-
84856653481
-
Object bank: A high-level image representation for scene classification and semantic feature sparsification
-
Li, L.-J.; Su, H.; Xing, E. P.; and Fei-Fei, L. 2011. Object bank: A high-level image representation for scene classification and semantic feature sparsification. In NIPS.
-
(2011)
NIPS
-
-
Li, L.-J.1
Su, H.2
Xing, E.P.3
Fei-Fei, L.4
-
18
-
-
84898956512
-
Distributed representations of words and phrases and their compositionality
-
Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; and Dean, J. 2013. Distributed representations of words and phrases and their compositionality. In NIPS.
-
(2013)
NIPS
-
-
Mikolov, T.1
Sutskever, I.2
Chen, K.3
Corrado, G.4
Dean, J.5
-
19
-
-
84976702763
-
Wordnet: A lexical database for english
-
Miller, G. A. 1995. Wordnet: A lexical database for english. In Communications of the ACM, 39-41.
-
(1995)
Communications of the ACM
, pp. 39-41
-
-
Miller, G.A.1
-
20
-
-
84959182849
-
Improving video activity recognition using object recognition and text mining
-
Motwani, T., and Mooney, R. 2012. Improving video activity recognition using object recognition and text mining. In ECAI.
-
(2012)
ECAI
-
-
Motwani, T.1
Mooney, R.2
-
22
-
-
84898465467
-
Evaluation of dimensionality reduction methods for image auto-annotation
-
Nakayama, H.; Harada, T.; and Kuniyoshi, Y. 2010. Evaluation of dimensionality reduction methods for image auto-annotation. In BMVC.
-
(2010)
BMVC
-
-
Nakayama, H.1
Harada, T.2
Kuniyoshi, Y.3
-
23
-
-
33645236134
-
-
WordNet::Similarity - Measuring the relatedness of concepts
-
Pedersen, T.; Patwardhan, S.; and Michelizzi, J. 2004. WordNet::Similarity - measuring the relatedness of concepts. In HLT-NAACL.
-
(2004)
HLT-NAACL
-
-
Pedersen, T.1
Patwardhan, S.2
Michelizzi, J.3
-
24
-
-
85123966307
-
Distributional clustering of english words
-
Pereira, F.; Tishby, N.; and Lee, L. 1993. Distributional clustering of english words. In ACL.
-
(1993)
ACL
-
-
Pereira, F.1
Tishby, N.2
Lee, L.3
-
25
-
-
84898775557
-
Video event understanding using natural language description
-
Ramanathan, V.; Liang, P.; and Fei-Fei, L. 2013. Video event understanding using natural language description. In ICCV.
-
(2013)
ICCV
-
-
Ramanathan, V.1
Liang, P.2
Fei-Fei, L.3
-
26
-
-
84898775239
-
Translating video content to natural language descriptions
-
Rohrbach, M.; Qiu, W.; Titov, I.; Thater, S.; Pinkal, M.; and Schiele, B. 2013. Translating video content to natural language descriptions. In ICCV.
-
(2013)
ICCV
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
27
-
-
84866718894
-
Action bank: A high-level representation of activity in video
-
Sadanand, S., and Corso, J. J. 2012. Action bank: A high-level representation of activity in video. In CVPR.
-
(2012)
CVPR
-
-
Sadanand, S.1
Corso, J.J.2
-
28
-
-
77955998009
-
Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
-
Socher, R., and Fei-Fei, L. 2010. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR.
-
(2010)
CVPR
-
-
Socher, R.1
Fei-Fei, L.2
-
29
-
-
84906925854
-
Grounded compositional semantics for finding and describing images with sentences
-
Socher, R.; Karpathy, A.; Le, Q. V.; Manning, C. D.; and Ng, A. Y. 2013. Grounded compositional semantics for finding and describing images with sentences. In Transactions of the ACL.
-
(2013)
Transactions of the ACL
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.D.4
Ng, A.Y.5
-
31
-
-
84959932469
-
Integrating language and vision to generate natural language descriptions of videos in the wild
-
Thomason, J.; Venugopalan, S.; Guadarrama, S.; Saenko, K.; and Mooney, R. 2014. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING.
-
(2014)
COLING
-
-
Thomason, J.1
Venugopalan, S.2
Guadarrama, S.3
Saenko, K.4
Mooney, R.5
-
32
-
-
84887356306
-
Spatiotemporal deformable part models for action detection
-
Tian, Y.; Sukthankar, R.; and Shah, M. 2013. Spatiotemporal deformable part models for action detection. In CVPR.
-
(2013)
CVPR
-
-
Tian, Y.1
Sukthankar, R.2
Shah, M.3
-
35
-
-
84897743886
-
Grounded language learning from video described with sentences
-
Yu, H., and Siskind, J. M. 2013. Grounded language learning from video described with sentences. In ACL.
-
(2013)
ACL
-
-
Yu, H.1
Siskind, J.M.2
-
36
-
-
84898795297
-
From actemes to action: A strongly-supervised representation for detailed action understanding
-
Zhang, W.; Zhu, M.; and Derpanis, K. G. 2013. From actemes to action: A strongly-supervised representation for detailed action understanding. In ICCV.
-
(2013)
ICCV
-
-
Zhang, W.1
Zhu, M.2
Derpanis, K.G.3