-
4
-
-
84885996388
-
Video in sentences out
-
Andrei Barbu, Alexander Bridge, Zachary Burchill, Dan Coroian, Sven Dickinson, Sanja Fidler, Aaron Michaux, Sam Mussman, Siddharth Narayanaswamy, Dhaval Salvi, Lara Schmidt, Jiangnan Shangguan, Jeffrey Mark Siskind, Jarrell Waggoner, Song Wang, Jinlian Wei, Yifan Yin, and Zhiqi Zhang. 2012. Video in sentences out. In Association for Uncertainty in Artificial Intelligence (UAI).
-
(2012)
Association for Uncertainty in Artificial Intelligence (UAI)
-
-
Barbu, A.1
Bridge, A.2
Burchill, Z.3
Coroian, D.4
Dickinson, S.5
Fidler, S.6
Michaux, A.7
Mussman, S.8
Narayanaswamy, S.9
Salvi, D.10
Schmidt, L.11
Shangguan, J.12
Mark Siskind, J.13
Waggoner, J.14
Wang, S.15
Wei, J.16
Yin, Y.17
Zhang, Z.18
-
9
-
-
84864139941
-
Beyond audio and video retrieval: Towards multimedia summarization
-
ACM
-
D. Ding, F. Metze, S. Rawat, P.F. Schulam, S. Burger, E. Younessian, L. Bao, M.G. Christel, and A. Hauptmann. 2012. Beyond audio and video retrieval: towards multimedia summarization. In Proceedings of the 2nd ACM International Conference on Multimedia Retrieval (ICMR). ACM.
-
(2012)
Proceedings of the 2nd ACM International Conference on Multimedia Retrieval (ICMR)
-
-
Ding, D.1
Metze, F.2
Rawat, S.3
Schulam, P.F.4
Burger, S.5
Younessian, E.6
Bao, L.7
Christel, M.G.8
Hauptmann, A.9
-
10
-
-
84904482223
-
-
arXiv preprint arXiv:1310.1531
-
Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2013. Decaf: A deep convolutional activation feature for generic visual recognition. arXiv preprint arXiv:1310.1531.
-
(2013)
Decaf: A Deep Convolutional Activation Feature for Generic Visual Recognition
-
-
Donahue, J.1
Jia, Y.2
Vinyals, O.3
Hoffman, J.4
Zhang, N.5
Tzeng, E.6
Darrell, T.7
-
11
-
-
84946802546
-
Long-term recurrent convolutional networks for visual recognition and description
-
abs/1411.4389
-
Jeff Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2014. Long-term recurrent convolutional networks for visual recognition and description. CoRR, abs/1411.4389.
-
(2014)
CoRR
-
-
Donahue, J.1
Anne Hendricks, L.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
13
-
-
84946802531
-
From captions to visual concepts and back
-
abs/1411.4952
-
Hao Fang, Saurabh Gupta, Forrest N. Iandola, Rupesh Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C. Platt, C. Lawrence Zitnick, and Geoffrey Zweig. 2014. From captions to visual concepts and back. CoRR, abs/1411.4952.
-
(2014)
CoRR
-
-
Fang, H.1
Gupta, S.2
Iandola, F.N.3
Srivastava, R.4
Deng, L.5
Dollár, P.6
Gao, J.7
He, X.8
Mitchell, M.9
Platt, J.C.10
Zitnick, C.L.11
Zweig, G.12
-
14
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. 2010. Every picture tells a story: Generating sentences from images. European Conference on Computer Vision (ECCV).
-
(2010)
European Conference on Computer Vision (ECCV)
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
16
-
-
84898773262
-
Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
-
December
-
Sergio Guadarrama, Niveda Krishnamoorthy, Girish Malkarnenkar, Subhashini Venugopalan, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2013. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In IEEE International Conference on Computer Vision (ICCV), December.
-
(2013)
IEEE International Conference on Computer Vision (ICCV)
-
-
Guadarrama, S.1
Krishnamoorthy, N.2
Malkarnenkar, G.3
Venugopalan, S.4
Mooney, R.5
Darrell, T.6
Saenko, K.7
-
21
-
-
84913555165
-
-
arXiv preprint arXiv:1408.5093
-
Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093.
-
(2014)
Caffe: Convolutional Architecture for Fast Feature Embedding
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
26
-
-
0036843382
-
Natural language description of human activities from video images based on concept hierarchy of actions
-
A. Kojima, T. Tamura, and K. Fukunaga. 2002. Natural language description of human activities from video images based on concept hierarchy of actions. International Journal of Computer Vision (IJCV), 50(2).
-
(2002)
International Journal of Computer Vision (IJCV)
, vol.50
, Issue.2
-
-
Kojima, A.1
Tamura, T.2
Fukunaga, K.3
-
29
-
-
80052901011
-
Baby talk: Understanding and generating simple image descriptions
-
Girish Kulkarni, Visruth Premraj, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, and Tamara L Berg. 2011. Baby talk: Understanding and generating simple image descriptions. In Conference on Computer Vision and Pattern Recognition (CVPR).
-
(2011)
Conference on Computer Vision and Pattern Recognition (CVPR)
-
-
Kulkarni, G.1
Premraj, V.2
Dhar, S.3
Li, S.4
Choi, Y.5
Berg, A.C.6
Berg, T.L.7
-
30
-
-
84934873221
-
Treetalk: Composition and compression of trees for image descriptions
-
Polina Kuznetsova, Vicente Ordonez, Tamara L Berg, UNC Chapel Hill, and Yejin Choi. 2014. Treetalk: Composition and compression of trees for image descriptions. Transactions of the Association for Computational Linguistics, 2(10).
-
(2014)
Transactions of the Association for Computational Linguistics
, vol.2
, Issue.10
-
-
Kuznetsova, P.1
Ordonez, V.2
Berg, T.L.3
Choi, Y.4
-
32
-
-
84906505935
-
-
arXiv preprint arXiv:1405.0312
-
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. arXiv preprint arXiv:1405.0312.
-
(2014)
Microsoft COCO: Common Objects in Context
-
-
Lin, T.-Y.1
Maire, M.2
Belongie, S.3
Hays, J.4
Perona, P.5
Ramanan, D.6
Dollár, P.7
Zitnick, C.L.8
-
35
-
-
84905274625
-
TRECVID 2012 - An overview of the goals, tasks, data, evaluation mechanisms and metrics
-
Paul Over, George Awad, Martial Michel, Jonathan Fiscus, Greg Sanders, B Shaw, Alan F. Smeaton, and Georges Quéenot. 2012. TRECVID 2012 - an overview of the goals, tasks, data, evaluation mechanisms and metrics. In Proceedings of TRECVID 2012.
-
(2012)
Proceedings of TRECVID 2012
-
-
Over, P.1
Awad, G.2
Michel, M.3
Fiscus, J.4
Sanders, G.5
Shaw, B.6
Smeaton, A.F.7
Quéenot, G.8
-
37
-
-
84898775239
-
Translating video content to natural language descriptions
-
Marcus Rohrbach, Wei Qiu, Ivan Titov, Stefan Thater, Manfred Pinkal, and Bernt Schiele. 2013. Translating video content to natural language descriptions. In IEEE International Conference on Computer Vision (ICCV).
-
(2013)
IEEE International Conference on Computer Vision (ICCV)
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
38
-
-
84960170289
-
Coherent multi-sentence video description with variable level of detail
-
September
-
Anna Rohrbach, Marcus Rohrbach, Wei Qiu, Annemarie Friedrich, Manfred Pinkal, and Bernt Schiele. 2014. Coherent multi-sentence video description with variable level of detail. In German Conference on Pattern Recognition (GCPR), September.
-
(2014)
German Conference on Pattern Recognition (GCPR)
-
-
Rohrbach, A.1
Rohrbach, M.2
Qiu, W.3
Friedrich, A.4
Pinkal, M.5
Schiele, B.6
-
39
-
-
84909978410
-
-
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C. Berg, and Li Fei-Fei. 2014. ImageNet Large Scale Visual Recognition Challenge.
-
(2014)
ImageNet Large Scale Visual Recognition Challenge
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
41
-
-
84959932469
-
Integrating language and vision to generate natural language descriptions of videos in the wild
-
August
-
J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R.J. Mooney. 2014. Integrating language and vision to generate natural language descriptions of videos in the wild. In Proceedings of the 25th International Conference on Computational Linguistics (COLING), August.
-
(2014)
Proceedings of the 25th International Conference on Computational Linguistics (COLING)
-
-
Thomason, J.1
Venugopalan, S.2
Guadarrama, S.3
Saenko, K.4
Mooney, R.J.5
-
42
-
-
84951910303
-
Show and tell: A neural image caption generator
-
abs/1411.4555
-
Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2014. Show and tell: A neural image caption generator. CoRR, abs/1411.4555.
-
(2014)
CoRR
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
43
-
-
77954177620
-
Multimodal fusion for video search reranking
-
Shikui Wei, Yao Zhao, Zhenfeng Zhu, and Nan Liu. 2010. Multimodal fusion for video search reranking. IEEE Transactions on Knowledge and Data Engineering, 22(8).
-
(2010)
IEEE Transactions on Knowledge and Data Engineering
, vol.22
, Issue.8
-
-
Wei, S.1
Zhao, Y.2
Zhu, Z.3
Liu, N.4
-
45
-
-
77954862144
-
I2t: Image parsing to text description
-
B.Z. Yao, X. Yang, L. Lin, M.W. Lee, and S.C. Zhu. 2010. I2t: Image parsing to text description. Proceedings of the IEEE, 98(8).
-
(2010)
Proceedings of the IEEE
, vol.98
, Issue.8
-
-
Yao, B.Z.1
Yang, X.2
Lin, L.3
Lee, M.W.4
Zhu, S.C.5
|