-
1
-
-
77951155435
-
Video2text: Learning to annotate video content
-
H. Aradhye, G. Toderici, and J. Yagnik. Video2text: Learning to annotate video content. In ICDMW, 2009.
-
(2009)
ICDMW
-
-
Aradhye, H.1
Toderici, G.2
Yagnik, J.3
-
2
-
-
35048833329
-
High accuracy optical flow estimation based on a theory for warping
-
T. Brox, A. Bruhn, N. Papenberg, and J. Weickert. High accuracy optical flow estimation based on a theory for warping. In ECCV, pages 25-36, 2004.
-
(2004)
ECCV
, pp. 25-36
-
-
Brox, T.1
Bruhn, A.2
Papenberg, N.3
Weickert, J.4
-
3
-
-
84859089502
-
Collecting highly parallel data for paraphrase evaluation
-
D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
-
(2011)
ACL
-
-
Chen, D.L.1
Dolan, W.B.2
-
4
-
-
84952349295
-
-
arXiv:1504.00325
-
X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, 2015.
-
(2015)
Microsoft COCO Captions: Data Collection and Evaluation Server
-
-
Chen, X.1
Fang, H.2
Lin, T.-Y.3
Vedantam, R.4
Gupta, S.5
Dollár, P.6
Zitnick, C.L.7
-
5
-
-
84957029470
-
Learning a recurrent visual representation for image caption generation
-
X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. CVPR, 2015.
-
(2015)
CVPR
-
-
Chen, X.1
Zitnick, C.L.2
-
7
-
-
85107661995
-
Meteor universal: Language specific translation evaluation for any target language
-
M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL, 2014.
-
(2014)
EACL
-
-
Denkowski, M.1
Lavie, A.2
-
8
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
-
(2015)
CVPR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
10
-
-
84919832465
-
Towards end-to-end speech recognition with recurrent neural networks
-
A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML, 2014.
-
(2014)
ICML
-
-
Graves, A.1
Jaitly, N.2
-
11
-
-
84898773262
-
Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
-
S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
-
(2013)
ICCV
-
-
Guadarrama, S.1
Krishnamoorthy, N.2
Malkarnenkar, G.3
Venugopalan, S.4
Mooney, R.5
Darrell, T.6
Saenko, K.7
-
13
-
-
84906494296
-
From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
-
P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In TACL, 2014.
-
(2014)
TACL
-
-
Young, P.1
Lai, A.2
Hodosh, M.3
Hockenmaier, J.4
-
14
-
-
85072312550
-
A multi-modal clustering method for web videos
-
H. Huang, Y. Lu, F. Zhang, and S. Sun. A multi-modal clustering method for web videos. In ISCTCS, 2013.
-
(2013)
ISCTCS
-
-
Huang, H.1
Lu, Y.2
Zhang, F.3
Sun, S.4
-
15
-
-
84913580146
-
Caffe: Convolutional architecture for fast feature embedding
-
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. ACMMM, 2014.
-
(2014)
ACMMM
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
16
-
-
84946734827
-
Deep visual-semantic alignments for generating image descriptions
-
A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. CVPR, 2015.
-
(2015)
CVPR
-
-
Karpathy, A.1
Fei-Fei, L.2
-
19
-
-
84893398951
-
Generating natural-language video descriptions using text-mined knowledge
-
July
-
N. Krishnamoorthy, G. Malkarnenkar, R. J. Mooney, K. Saenko, and S. Guadarrama. Generating natural-language video descriptions using text-mined knowledge. In AAAI, July 2013.
-
(2013)
AAAI
-
-
Krishnamoorthy, N.1
Malkarnenkar, G.2
Mooney, R.J.3
Saenko, K.4
Guadarrama, S.5
-
20
-
-
84934873221
-
Treetalk: Composition and compression of trees for image descriptions
-
P. Kuznetsova, V. Ordonez, T. L. Berg, and Y. Choi. Treetalk: Composition and compression of trees for image descriptions. In TACL, 2014.
-
(2014)
TACL
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, T.L.3
Choi, Y.4
-
22
-
-
84937834115
-
Microsoft COCO: Common objects in context
-
T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
-
(2014)
ECCV
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
23
-
-
84939821073
-
-
arXiv:1412.6632
-
J. Mao, W. Xu, Y. Yang, J. Wang, and A. L. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). arXiv:1412.6632, 2014.
-
(2014)
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Yuille, A.L.5
-
24
-
-
84959228762
-
Beyond short snippets: Deep networks for video classification
-
J. Y. Ng, M. J. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici. Beyond short snippets: Deep networks for video classification. CVPR, 2015.
-
(2015)
CVPR
-
-
Ng, J.Y.1
Hausknecht, M.J.2
Vijayanarasimhan, S.3
Vinyals, O.4
Monga, R.5
Toderici, G.6
-
25
-
-
84905274625
-
TRECVID 2012-an overview of the goals, tasks, data, evaluation mechanisms and metrics
-
P. Over, G. Awad, M. Michel, J. Fiscus, G. Sanders, B. Shaw, A. F. Smeaton, and G. Quénot. TRECVID 2012-an overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of TRECVID 2012, 2012.
-
(2012)
Proceedings of TRECVID 2012
-
-
Over, P.1
Awad, G.2
Michel, M.3
Fiscus, J.4
Sanders, G.5
Shaw, B.6
Smeaton, A.F.7
Quénot, G.8
-
26
-
-
85133336275
-
Bleu: A method for automatic evaluation of machine translation
-
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. Bleu: A method for automatic evaluation of machine translation. In ACL, 2002.
-
(2002)
ACL
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.-J.4
-
29
-
-
84898775239
-
Translating video content to natural language descriptions
-
M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
-
(2013)
ICCV
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
30
-
-
84909978410
-
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ILSVRC, 2014.
-
(2014)
ILSVRC
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
31
-
-
84937862424
-
Two-stream convolutional networks for action recognition in videos
-
K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, 2014.
-
(2014)
NIPS
-
-
Simonyan, K.1
Zisserman, A.2
-
33
-
-
84969544782
-
Unsupervised learning of video representations using LSTMs
-
N. Srivastava, E. Mansimov, and R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. ICML, 2015.
-
(2015)
ICML
-
-
Srivastava, N.1
Mansimov, E.2
Salakhutdinov, R.3
-
34
-
-
84928547704
-
Sequence to sequence learning with neural networks
-
I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
-
(2014)
NIPS
-
-
Sutskever, I.1
Vinyals, O.2
Le, Q.V.3
-
35
-
-
84937522268
-
Going deeper with convolutions
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR, 2015.
-
(2015)
CVPR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
36
-
-
84959932469
-
Integrating language and vision to generate natural language descriptions of videos in the wild
-
J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. J. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
-
(2014)
COLING
-
-
Thomason, J.1
Venugopalan, S.2
Guadarrama, S.3
Saenko, K.4
Mooney, R.J.5
-
38
-
-
84956980995
-
CIDEr: Consensus-based image description evaluation
-
R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. CVPR, 2015.
-
(2015)
CVPR
-
-
Vedantam, R.1
Zitnick, C.L.2
Parikh, D.3
-
39
-
-
84959876769
-
Translating videos to natural language using deep recurrent neural networks
-
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL, 2015.
-
(2015)
NAACL
-
-
Venugopalan, S.1
Xu, H.2
Donahue, J.3
Rohrbach, M.4
Mooney, R.5
Saenko, K.6
-
41
-
-
84898805910
-
Action recognition with improved trajectories
-
IEEE
-
H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, pages 3551-3558. IEEE, 2013.
-
(2013)
ICCV
, pp. 3551-3558
-
-
Wang, H.1
Schmid, C.2
-
42
-
-
77954177620
-
Multimodal fusion for video search reranking
-
S. Wei, Y. Zhao, Z. Zhu, and N. Liu. Multimodal fusion for video search reranking. TKDE, 2010.
-
(2010)
TKDE
-
-
Wei, S.1
Zhao, Y.2
Zhu, Z.3
Liu, N.4
-
43
-
-
84965160010
-
-
arXiv:1502.08029v4
-
L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. arXiv:1502.08029v4, 2015.
-
(2015)
Describing Videos by Exploiting Temporal Structure
-
-
Yao, L.1
Torabi, A.2
Cho, K.3
Ballas, N.4
Pal, C.5
Larochelle, H.6
Courville, A.7