[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. ICLR, 2015.
[2] A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, et al. Video in sentences out. UAI, 2012.
[3] F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, and Y. Bengio. Theano: New features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop, 2012.
[4] J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. Theano: A CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.
[5] P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Weakly supervised action labeling in videos under ordering constraints. In ECCV, 2014.
[6] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, 2011.
[7] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, 2015.
[8] K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, Oct. 2014.
[9] N. Dalal, B. Triggs, and C. Schmid. Human detection using oriented histograms of flow and appearance. In ECCV, 2006.
[10] M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In EACL Workshop, 2014.
[11] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. CVPR, 2015.
[12] A. Gaidon, Z. Harchaoui, and C. Schmid. Temporal localization of actions with actoms. PAMI, 2013.
[15] S. Ji, W. Xu, M. Yang, and K. Yu. 3D convolutional neural networks for human action recognition. PAMI, 2013.
[16] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. arXiv:1408.5093, 2014.
[17] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2014.
[18] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, IEEE, 2014.
[19] R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. ACL, 2014.
[20] A. Kojima, T. Tamura, and K. Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. IJCV, 2002.
[21] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
[23] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, 2002.
[24] M. Ranzato, A. Szlam, J. Bruna, M. Mathieu, R. Collobert, and S. Chopra. Video (language) modeling: A baseline for generative models of natural videos. arXiv:1412.6604, 2014.
[26] M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
[27] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. ICLR, 2014.
[28] K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. NIPS, 2014.
[30] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, 2014.
[31] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. CVPR, 2015.
[32] K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. In CVPR, IEEE, 2012.
[33] G. W. Taylor, R. Fergus, Y. LeCun, and C. Bregler. Convolutional learning of spatio-temporal features. In Computer Vision – ECCV 2010, pages 140–153. Springer, 2010.
[34] J. Thomason, S. Venugopalan, S. Guadarrama, K. Saenko, and R. Mooney. Integrating language and vision to generate natural language descriptions of videos in the wild. In COLING, 2014.
[36] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: Generic features for video analysis. arXiv:1412.0767, 2014.
[37] R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. CVPR, 2015.
[38] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. NAACL, 2015.
[40] H. Wang, M. M. Ullah, A. Kläser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009.
[41] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. ICML, 2015.