-
3
-
-
84885996388
-
Video in sentences out
-
A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, N. Siddharth, D. Salvi, L. Schmidt, J. Shangguan, J. M. Siskind, J. Waggoner, S. Wang, J. Wei, Y. Yin, and Z. Zhang. Video in sentences out. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, pages 102-112, 2012.
-
(2012)
Proceedings of the Conference on Uncertainty in Artificial Intelligence
, pp. 102-112
-
-
Barbu, A.1
Bridge, A.2
Burchill, Z.3
Coroian, D.4
Dickinson, S.5
Fidler, S.6
Michaux, A.7
Mussman, S.8
Siddharth, N.9
Salvi, D.10
Schmidt, L.11
Shangguan, J.12
Siskind, J.M.13
Waggoner, J.14
Wang, S.15
Wei, J.16
Yin, Y.17
Zhang, Z.18
-
4
-
-
84965179228
-
Scheduled sampling for sequence prediction with recurrent neural networks
-
S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1171-1179, 2015.
-
(2015)
Advances in Neural Information Processing Systems
, pp. 1171-1179
-
-
Bengio, S.1
Vinyals, O.2
Jaitly, N.3
Shazeer, N.4
-
7
-
-
84986240725
-
Microsoft COCO captions: Data collection and evaluation server
-
abs/1504.00325
-
X. Chen, H. Fang, T. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. CoRR, abs/1504.00325, 2015.
-
(2015)
CoRR
-
-
Chen, X.1
Fang, H.2
Lin, T.3
Vedantam, R.4
Gupta, S.5
Dollár, P.6
Zitnick, C.L.7
-
9
-
-
84961291190
-
Learning phrase representations using RNN encoder-decoder for statistical machine translation
-
K. Cho, B. van Merrienboer, Ç. Gülçehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Conference on Empirical Methods in Natural Language Processing, 2014.
-
(2014)
Conference on Empirical Methods in Natural Language Processing
-
-
Cho, K.1
Van Merrienboer, B.2
Gülçehre, Ç.3
Bougares, F.4
Schwenk, H.5
Bengio, Y.6
-
10
-
-
84887345951
-
A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
-
P. Das, C. Xu, R. F. Doell, and J. J. Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2634-2641, 2013.
-
(2013)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 2634-2641
-
-
Das, P.1
Xu, C.2
Doell, R.F.3
Corso, J.J.4
-
11
-
-
84959236502
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
12
-
-
26444565569
-
Finding structure in time
-
J. L. Elman. Finding structure in time. Cognitive Science, 14(2):179-211, 1990.
-
(1990)
Cognitive Science
, vol.14
, Issue.2
, pp. 179-211
-
-
Elman, J.L.1
-
14
-
-
84898773262
-
YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
-
S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, S. Venugopalan, R. Mooney, T. Darrell, and K. Saenko. YouTube2Text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In Proceedings of the IEEE International Conference on Computer Vision, December 2013.
-
(2013)
Proceedings of the IEEE International Conference on Computer Vision, December
-
-
Guadarrama, S.1
Krishnamoorthy, N.2
Malkarnenkar, G.3
Venugopalan, S.4
Mooney, R.5
Darrell, T.6
Saenko, K.7
-
16
-
-
0031573117
-
Long short-term memory
-
Nov.
-
S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Comput., 9(8):1735-1780, Nov. 1997.
-
(1997)
Neural Comput.
, vol.9
, Issue.8
, pp. 1735-1780
-
-
Hochreiter, S.1
Schmidhuber, J.2
-
17
-
-
84865584175
-
Aggregating local image descriptors into compact codes
-
Sept.
-
H. Jegou, F. Perronnin, M. Douze, J. Sánchez, P. Pérez, and C. Schmid. Aggregating local image descriptors into compact codes. IEEE Trans. Pattern Anal. Mach. Intell., 34(9):1704-1716, Sept. 2012.
-
(2012)
IEEE Trans. Pattern Anal. Mach. Intell.
, vol.34
, Issue.9
, pp. 1704-1716
-
-
Jegou, H.1
Perronnin, F.2
Douze, M.3
Sánchez, J.4
Pérez, P.5
Schmid, C.6
-
19
-
-
84911364368
-
Large-scale video classification with convolutional neural networks
-
A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
-
(2014)
CVPR
-
-
Karpathy, A.1
Toderici, G.2
Shetty, S.3
Leung, T.4
Sukthankar, R.5
Fei-Fei, L.6
-
23
-
-
0036843382
-
Natural language description of human activities from video images based on concept hierarchy of actions
-
A. Kojima, T. Tamura, and K. Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. International Journal of Computer Vision, 50(2):171-184, 2002.
-
(2002)
International Journal of Computer Vision
, vol.50
, Issue.2
, pp. 171-184
-
-
Kojima, A.1
Tamura, T.2
Fukunaga, K.3
-
24
-
-
84893398951
-
Generating natural-language video descriptions using text-mined knowledge
-
N. Krishnamoorthy, G. Malkarnenkar, R. J. Mooney, K. Saenko, and S. Guadarrama. Generating natural-language video descriptions using text-mined knowledge. In AAAI Conference on Artificial Intelligence, pages 541-547, 2013.
-
(2013)
AAAI Conference on Artificial Intelligence
, pp. 541-547
-
-
Krishnamoorthy, N.1
Malkarnenkar, G.2
Mooney, R.J.3
Saenko, K.4
Guadarrama, S.5
-
26
-
-
51849094354
-
SAVE: A framework for semantic annotation of visual events
-
M. W. Lee, A. Hakeem, N. Haering, and S.-C. Zhu. SAVE: A framework for semantic annotation of visual events. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1-8, 2008.
-
(2008)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops
, pp. 1-8
-
-
Lee, M.W.1
Hakeem, A.2
Haering, N.3
Zhu, S.-C.4
-
28
-
-
84959935599
-
Hierarchical recurrent neural network for document modeling
-
Sept.
-
R. Lin, S. Liu, M. Yang, M. Li, M. Zhou, and S. Li. Hierarchical recurrent neural network for document modeling. In Conference on Empirical Methods in Natural Language Processing, pages 899-907, Sept. 2015.
-
(2015)
Conference on Empirical Methods in Natural Language Processing
, pp. 899-907
-
-
Lin, R.1
Liu, S.2
Yang, M.3
Li, M.4
Zhou, M.5
Li, S.6
-
29
-
-
85083950512
-
Deep captioning with multimodal recurrent neural networks (m-RNN)
-
J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
-
(2015)
ICLR
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.6
-
30
-
-
84965160495
-
Learning like a child: Fast novel visual concept learning from sentence descriptions of images
-
J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. L. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. 2015.
-
(2015)
-
-
Mao, J.1
Xu, W.2
Yang, Y.3
Wang, J.4
Huang, Z.5
Yuille, A.L.6
-
31
-
-
77956509090
-
Rectified linear units improve restricted boltzmann machines
-
V. Nair and G. E. Hinton. Rectified linear units improve restricted boltzmann machines. In ICML, pages 807-814, 2010.
-
(2010)
ICML
, pp. 807-814
-
-
Nair, V.1
Hinton, G.E.2
-
32
-
-
85060437486
-
Jointly modeling embedding and translation to bridge video and language
-
abs/1505.01861
-
Y. Pan, T. Mei, T. Yao, H. Li, and Y. Rui. Jointly modeling embedding and translation to bridge video and language. CoRR, abs/1505.01861, 2015.
-
(2015)
CoRR
-
-
Pan, Y.1
Mei, T.2
Yao, T.3
Li, H.4
Rui, Y.5
-
33
-
-
85133336275
-
BLEU: A method for automatic evaluation of machine translation
-
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002.
-
(2002)
ACL
, pp. 311-318
-
-
Papineni, K.1
Roukos, S.2
Ward, T.3
Zhu, W.-J.4
-
35
-
-
84994149921
-
Sequence level training with recurrent neural networks
-
abs/1511.06732
-
M. Ranzato, S. Chopra, M. Auli, and W. Zaremba. Sequence level training with recurrent neural networks. CoRR, abs/1511.06732, 2015.
-
(2015)
CoRR
-
-
Ranzato, M.1
Chopra, S.2
Auli, M.3
Zaremba, W.4
-
36
-
-
84960170289
-
Coherent multi-sentence video description with variable level of detail
-
A. Rohrbach, M. Rohrbach, W. Qiu, A. Friedrich, M. Pinkal, and B. Schiele. Coherent multi-sentence video description with variable level of detail. In German Conference on Pattern Recognition (GCPR), September 2014.
-
(2014)
German Conference on Pattern Recognition (GCPR), September
-
-
Rohrbach, A.1
Rohrbach, M.2
Qiu, W.3
Friedrich, A.4
Pinkal, M.5
Schiele, B.6
-
37
-
-
84898775239
-
Translating video content to natural language descriptions
-
M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In Proceedings of the IEEE International Conference on Computer Vision, pages 433-440, 2013.
-
(2013)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 433-440
-
-
Rohrbach, M.1
Qiu, W.2
Titov, I.3
Thater, S.4
Pinkal, M.5
Schiele, B.6
-
38
-
-
84947041871
-
ImageNet large scale visual recognition challenge
-
Apr.
-
O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei. ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), pages 1-42, Apr. 2015.
-
(2015)
International Journal of Computer Vision (IJCV)
, pp. 1-42
-
-
Russakovsky, O.1
Deng, J.2
Su, H.3
Krause, J.4
Satheesh, S.5
Ma, S.6
Huang, Z.7
Karpathy, A.8
Khosla, A.9
Bernstein, M.10
Berg, A.C.11
Fei-Fei, L.12
-
41
-
-
84904163933
-
Dropout: A simple way to prevent neural networks from overfitting
-
N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929-1958, 2014.
-
(2014)
Journal of Machine Learning Research
, vol.15
, pp. 1929-1958
-
-
Srivastava, N.1
Hinton, G.2
Krizhevsky, A.3
Sutskever, I.4
Salakhutdinov, R.5
-
45
-
-
84969504307
-
C3D: Generic features for video analysis
-
D. Tran, L. D. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: generic features for video analysis. In Proceedings of the IEEE International Conference on Computer Vision, 2015.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
-
-
Tran, D.1
Bourdev, L.D.2
Fergus, R.3
Torresani, L.4
Paluri, M.5
-
47
-
-
84973882730
-
Sequence to sequence - video to text
-
S. Venugopalan, M. Rohrbach, J. Donahue, R. J. Mooney, T. Darrell, and K. Saenko. Sequence to sequence - video to text. In Proceedings of the IEEE International Conference on Computer Vision, pages 4534-4542, 2015.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 4534-4542
-
-
Venugopalan, S.1
Rohrbach, M.2
Donahue, J.3
Mooney, R.J.4
Darrell, T.5
Saenko, K.6
-
48
-
-
84959876769
-
Translating videos to natural language using deep recurrent neural networks
-
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In Proceedings of the North American Chapter of the Association for Computational Linguistics, pages 1494-1504, 2015.
-
(2015)
Proceedings of the North American Chapter of the Association for Computational Linguistics
, pp. 1494-1504
-
-
Venugopalan, S.1
Xu, H.2
Donahue, J.3
Rohrbach, M.4
Mooney, R.J.5
Saenko, K.6
-
50
-
-
84946747440
-
Show and tell: A neural image caption generator
-
O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3156-3164, 2015.
-
(2015)
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
, pp. 3156-3164
-
-
Vinyals, O.1
Toshev, A.2
Bengio, S.3
Erhan, D.4
-
51
-
-
80052877143
-
Action recognition by dense trajectories
-
June
-
H. Wang, A. Kläser, C. Schmid, and C.-L. Liu. Action recognition by dense trajectories. In IEEE Conference on Computer Vision & Pattern Recognition, pages 3169-3176, June 2011.
-
(2011)
IEEE Conference on Computer Vision & Pattern Recognition
, pp. 3169-3176
-
-
Wang, H.1
Kläser, A.2
Schmid, C.3
Liu, C.-L.4
-
52
-
-
84959897734
-
Semantically conditioned LSTM-based natural language generation for spoken dialogue systems
-
T. Wen, M. Gasic, N. Mrksic, P. Su, D. Vandyke, and S. J. Young. Semantically conditioned LSTM-based natural language generation for spoken dialogue systems. In Conference on Empirical Methods in Natural Language Processing, 2015.
-
(2015)
Conference on Empirical Methods in Natural Language Processing
-
-
Wen, T.1
Gasic, M.2
Mrksic, N.3
Su, P.4
Vandyke, D.5
Young, S.J.6
-
53
-
-
0025503558
-
Backpropagation through time: What does it do and how to do it
-
P. Werbos. Backpropagation through time: what does it do and how to do it. Proceedings of the IEEE, 78:1550-1560, 1990.
-
(1990)
Proceedings of the IEEE
, vol.78
, pp. 1550-1560
-
-
Werbos, P.1
-
54
-
-
84980404991
-
A multi-scale multiple instance video description network
-
abs/1505.05914
-
H. Xu, S. Venugopalan, V. Ramanishka, M. Rohrbach, and K. Saenko. A multi-scale multiple instance video description network. CoRR, abs/1505.05914, 2015.
-
(2015)
CoRR
-
-
Xu, H.1
Venugopalan, S.2
Ramanishka, V.3
Rohrbach, M.4
Saenko, K.5
-
56
-
-
84973884896
-
Describing videos by exploiting temporal structure
-
L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In Proceedings of the IEEE International Conference on Computer Vision, pages 4507-4515, 2015.
-
(2015)
Proceedings of the IEEE International Conference on Computer Vision
, pp. 4507-4515
-
-
Yao, L.1
Torabi, A.2
Cho, K.3
Ballas, N.4
Pal, C.5
Larochelle, H.6
Courville, A.7
-
57
-
-
84961226145
-
Learning to describe video with weak supervision by exploiting negative sentential information
-
Jan.
-
H. Yu and J. M. Siskind. Learning to describe video with weak supervision by exploiting negative sentential information. In AAAI Conference on Artificial Intelligence, pages 3855-3863, Jan. 2015.
-
(2015)
AAAI Conference on Artificial Intelligence
, pp. 3855-3863
-
-
Yu, H.1
Siskind, J.M.2