-
1
-
-
84866678025
-
Three things everyone should know to improve object retrieval
-
R. Arandjelovic and A. Zisserman. Three things everyone should know to improve object retrieval. In CVPR, 2012.
-
(2012)
CVPR
-
-
Arandjelovic, R.1
Zisserman, A.2
-
2
-
-
14344252374
-
Multiple kernel learning, conic duality, and the SMO algorithm
-
F. R. Bach, G. R. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In ICML, 2004.
-
(2004)
ICML
-
-
Bach, F.R.1
Lanckriet, G.R.2
Jordan, M.I.3
-
3
-
-
84872560515
-
Practical recommendations for gradient-based training of deep architectures
-
Springer
-
Y. Bengio. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade. Springer, 2012.
-
(2012)
Neural Networks: Tricks of the Trade
-
-
Bengio, Y.1
-
5
-
-
84055222005
-
Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
-
G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE TASLP, 2012.
-
(2012)
IEEE TASLP
-
-
Dahl, G.E.1
Yu, D.2
Deng, L.3
Acero, A.4
-
6
-
-
84946802546
-
Long-term recurrent convolutional networks for visual recognition and description
-
J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. CoRR, 2014.
-
(2014)
CoRR
-
-
Donahue, J.1
Hendricks, L.A.2
Guadarrama, S.3
Rohrbach, M.4
Venugopalan, S.5
Saenko, K.6
Darrell, T.7
-
7
-
-
84911400494
-
Rich feature hierarchies for accurate object detection and semantic segmentation
-
R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
-
(2014)
CVPR
-
-
Girshick, R.1
Donahue, J.2
Darrell, T.3
Malik, J.4
-
8
-
-
84890543083
-
Speech recognition with deep recurrent neural networks
-
A. Graves, A. Mohamed, and G. E. Hinton. Speech recognition with deep recurrent neural networks. In ICASSP, 2013.
-
(2013)
ICASSP
-
-
Graves, A.1
Mohamed, A.2
Hinton, G.E.3
-
9
-
-
27744588611
-
Framewise phoneme classification with bidirectional LSTM and other neural network architectures
-
A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 2005.
-
(2005)
Neural Networks
-
-
Graves, A.1
Schmidhuber, J.2
-
12
-
-
84894901576
-
Discovering joint audio-visual codewords for video event detection
-
I.-H. Jhuo, G. Ye, S. Gao, D. Liu, Y.-G. Jiang, D. T. Lee, and S.-F. Chang. Discovering joint audio-visual codewords for video event detection. Machine Vision and Applications, 2014.
-
(2014)
Machine Vision and Applications
-
-
Jhuo, I.-H.1
Ye, G.2
Gao, S.3
Liu, D.4
Jiang, Y.-G.5
Lee, D.T.6
Chang, S.-F.7
-
13
-
-
77956507967
-
3D convolutional neural networks for human action recognition
-
S. Ji, W. Xu, M. Yang, and K. Yu. 3D convolutional neural networks for human action recognition. In ICML, 2010.
-
(2010)
ICML
-
-
Ji, S.1
Xu, W.2
Yang, M.3
Yu, K.4
-
14
-
-
84913580146
-
Caffe: Convolutional architecture for fast feature embedding
-
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional architecture for fast feature embedding. In ACM Multimedia, 2014.
-
(2014)
ACM Multimedia
-
-
Jia, Y.1
Shelhamer, E.2
Donahue, J.3
Karayev, S.4
Long, J.5
Girshick, R.6
Guadarrama, S.7
Darrell, T.8
-
15
-
-
72549099611
-
Short-term audio-visual atoms for generic video concept classification
-
W. Jiang, C. Cotton, S.-F. Chang, D. Ellis, and A. Loui. Short-term audio-visual atoms for generic video concept classification. In ACM Multimedia, 2009.
-
(2009)
ACM Multimedia
-
-
Jiang, W.1
Cotton, C.2
Chang, S.-F.3
Ellis, D.4
Loui, A.5
-
17
-
-
84905052261
-
-
Y.-G. Jiang, J. Liu, A. Roshan Zamir, G. Toderici, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes. http://crcv.ucf.edu/THUMOS14/, 2014.
-
(2014)
THUMOS Challenge: Action Recognition with A Large Number of Classes
-
-
Jiang, Y.-G.1
Liu, J.2
Roshan Zamir, A.3
Toderici, G.4
Laptev, I.5
Shah, M.6
Sukthankar, R.7
-
18
-
-
79959766559
-
Consumer video understanding: A benchmark database and an evaluation of human and machine performance
-
Y.-G. Jiang, G. Ye, S.-F. Chang, D. Ellis, and A. C. Loui. Consumer video understanding: A benchmark database and an evaluation of human and machine performance. In ICMR, 2011.
-
(2011)
ICMR
-
-
Jiang, Y.-G.1
Ye, G.2
Chang, S.-F.3
Ellis, D.4
Loui, A.C.5
-
19
-
-
84911364368
-
Large-scale video classification with convolutional neural networks
-
A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, 2014.
-
(2014)
CVPR
-
-
Karpathy, A.1
Toderici, G.2
Shetty, S.3
Leung, T.4
Sukthankar, R.5
Fei-Fei, L.6
-
20
-
-
84898426452
-
A spatio-temporal descriptor based on 3D-gradients
-
A. Klaser, M. Marszałek, and C. Schmid. A spatio-temporal descriptor based on 3D-gradients. In BMVC, 2008.
-
(2008)
BMVC
-
-
Klaser, A.1
Marszalek, M.2
Schmid, C.3
-
22
-
-
84876231242
-
ImageNet classification with deep convolutional neural networks
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012.
-
(2012)
NIPS
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
23
-
-
84962801975
-
Beyond Gaussian pyramid: Multi-skip feature stacking for action recognition
-
Z. Lan, M. Lin, X. Li, A. G. Hauptmann, and B. Raj. Beyond Gaussian pyramid: Multi-skip feature stacking for action recognition. CoRR, 2014.
-
(2014)
CoRR
-
-
Lan, Z.1
Lin, M.2
Li, X.3
Hauptmann, A.G.4
Raj, B.5
-
24
-
-
84962874838
-
On space-time interest points
-
I. Laptev. On space-time interest points. IJCV, 2007.
-
(2007)
IJCV
-
-
Laptev, I.1
-
25
-
-
55149112799
-
Expandable data-driven graphical modeling of human actions based on salient postures
-
W. Li, Z. Zhang, and Z. Liu. Expandable data-driven graphical modeling of human actions based on salient postures. IEEE TCSVT, 2008.
-
(2008)
IEEE TCSVT
-
-
Li, W.1
Zhang, Z.2
Liu, Z.3
-
26
-
-
84887331855
-
Sample-specific late fusion for visual category recognition
-
D. Liu, K.-T. Lai, G. Ye, M.-S. Chen, and S.-F. Chang. Sample-specific late fusion for visual category recognition. In CVPR, 2013.
-
(2013)
CVPR
-
-
Liu, D.1
Lai, K.-T.2
Ye, G.3
Chen, M.-S.4
Chang, S.-F.5
-
27
-
-
3042535216
-
Distinctive image features from scale-invariant keypoints
-
D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
-
(2004)
IJCV
-
-
Lowe, D.G.1
-
28
-
-
84905653531
-
Reduced analytic dependency modeling: Robust fusion for visual recognition
-
A. J. Ma and P. C. Yuen. Reduced analytic dependency modeling: Robust fusion for visual recognition. IJCV, 2014.
-
(2014)
IJCV
-
-
Ma, A.J.1
Yuen, P.C.2
-
29
-
-
80053437179
-
Multimodal deep learning
-
J. Ngiam, A. Khosla, M. Kim, J. Nam, H. Lee, and A. Ng. Multimodal deep learning. In ICML, 2011.
-
(2011)
ICML
-
-
Ngiam, J.1
Khosla, A.2
Kim, M.3
Nam, J.4
Lee, H.5
Ng, A.6
-
30
-
-
84898791167
-
Action and event recognition with Fisher vectors on a compact feature set
-
D. Oneata, J. Verbeek, C. Schmid, et al. Action and event recognition with Fisher vectors on a compact feature set. In ICCV, 2013.
-
(2013)
ICCV
-
-
Oneata, D.1
Verbeek, J.2
Schmid, C.3
-
31
-
-
84962897886
-
Video (language) modeling: A baseline for generative models of natural videos
-
M. Ranzato, A. Szlam, J. Bruna, M. Mathieu, R. Collobert, and S. Chopra. Video (language) modeling: A baseline for generative models of natural videos. CoRR, 2014.
-
(2014)
CoRR
-
-
Ranzato, M.1
Szlam, A.2
Bruna, J.3
Mathieu, M.4
Collobert, R.5
Chopra, S.6
-
33
-
-
84937862424
-
Two-stream convolutional networks for action recognition in videos
-
K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, 2014.
-
(2014)
NIPS
-
-
Simonyan, K.1
Zisserman, A.2
-
34
-
-
84933585162
-
Very deep convolutional networks for large-scale image recognition
-
K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, 2014.
-
(2014)
CoRR
-
-
Simonyan, K.1
Zisserman, A.2
-
35
-
-
84893702065
-
UCF101: A dataset of 101 human actions classes from videos in the wild
-
K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR, 2012.
-
(2012)
CoRR
-
-
Soomro, K.1
Zamir, A.R.2
Shah, M.3
-
36
-
-
84962900096
-
Unsupervised learning of video representations using LSTMs
-
N. Srivastava, E. Mansimov, and R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. CoRR, 2015.
-
(2015)
CoRR
-
-
Srivastava, N.1
Mansimov, E.2
Salakhutdinov, R.3
-
37
-
-
84877724347
-
Multimodal learning with deep Boltzmann machines
-
N. Srivastava and R. Salakhutdinov. Multimodal learning with deep Boltzmann machines. In NIPS, 2012.
-
(2012)
NIPS
-
-
Srivastava, N.1
Salakhutdinov, R.2
-
38
-
-
84941122549
-
Going deeper with convolutions
-
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going Deeper with Convolutions. CoRR, 2014.
-
(2014)
CoRR
-
-
Szegedy, C.1
Liu, W.2
Jia, Y.3
Sermanet, P.4
Reed, S.5
Anguelov, D.6
Erhan, D.7
Vanhoucke, V.8
Rabinovich, A.9
-
39
-
-
84866658784
-
Learning latent temporal structure for complex event detection
-
K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. In CVPR, 2012.
-
(2012)
CVPR
-
-
Tang, K.1
Fei-Fei, L.2
Koller, D.3
-
40
-
-
84962823150
-
C3D: Generic features for video analysis
-
D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. C3D: Generic features for video analysis. CoRR, 2014.
-
(2014)
CoRR
-
-
Tran, D.1
Bourdev, L.2
Fergus, R.3
Torresani, L.4
Paluri, M.5
-
41
-
-
84856105124
-
Conditional random fields for activity recognition
-
D. L. Vail, M. M. Veloso, and J. D. Lafferty. Conditional random fields for activity recognition. In AAMAS, 2007.
-
(2007)
AAMAS
-
-
Vail, D.L.1
Veloso, M.M.2
Lafferty, J.D.3
-
43
-
-
84944069490
-
Translating videos to natural language using deep recurrent neural networks
-
S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. J. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. CoRR, 2014.
-
(2014)
CoRR
-
-
Venugopalan, S.1
Xu, H.2
Donahue, J.3
Rohrbach, M.4
Mooney, R.J.5
Saenko, K.6
-
44
-
-
84898805910
-
Action recognition with improved trajectories
-
H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
-
(2013)
ICCV
-
-
Wang, H.1
Schmid, C.2
-
45
-
-
84898890371
-
Evaluation of local spatio-temporal features for action recognition
-
H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009.
-
(2009)
BMVC
-
-
Wang, H.1
Ullah, M.M.2
Klaser, A.3
Laptev, I.4
Schmid, C.5
-
46
-
-
70450216856
-
Max-margin hidden conditional random fields for human action recognition
-
Y. Wang and G. Mori. Max-margin hidden conditional random fields for human action recognition. In CVPR, 2009.
-
(2009)
CVPR
-
-
Wang, Y.1
Mori, G.2
-
47
-
-
84913586072
-
Exploring inter-feature and inter-class relationships with deep neural networks for video classification
-
Z. Wu, Y.-G. Jiang, J. Wang, J. Pu, and X. Xue. Exploring inter-feature and inter-class relationships with deep neural networks for video classification. In ACM Multimedia, 2014.
-
(2014)
ACM Multimedia
-
-
Wu, Z.1
Jiang, Y.-G.2
Wang, J.3
Pu, J.4
Xue, X.5
-
48
-
-
84898834622
-
Feature weighting via optimal thresholding for video analysis
-
Z. Xu, Y. Yang, I. Tsang, N. Sebe, and A. Hauptmann. Feature weighting via optimal thresholding for video analysis. In ICCV, 2013.
-
(2013)
ICCV
-
-
Xu, Z.1
Yang, Y.2
Tsang, I.3
Sebe, N.4
Hauptmann, A.5
-
49
-
-
84962874851
-
Efficient online learning for multitask feature selection
-
H. Yang, M. R. Lyu, and I. King. Efficient online learning for multitask feature selection. ACM SIGKDD, 2013.
-
(2013)
ACM SIGKDD
-
-
Yang, H.1
Lyu, M.R.2
King, I.3
-
51
-
-
84962376858
-
Evaluating two-stream CNN for video classification
-
H. Ye, Z. Wu, R.-W. Zhao, X. Wang, Y.-G. Jiang, and X. Xue. Evaluating two-stream CNN for video classification. In ICMR, 2015.
-
(2015)
ICMR
-
-
Ye, H.1
Wu, Z.2
Zhao, R.-W.3
Wang, X.4
Jiang, Y.-G.5
Xue, X.6
-
52
-
-
80054879214
-
Knowledge based activity recognition with dynamic Bayesian network
-
Z. Zeng and Q. Ji. Knowledge based activity recognition with dynamic Bayesian network. In ECCV, 2010.
-
(2010)
ECCV
-
-
Zeng, Z.1
Ji, Q.2
-
53
-
-
84962833704
-
Exploiting image-trained CNN architectures for unconstrained video classification
-
S. Zha, F. Luisier, W. Andrews, N. Srivastava, and R. Salakhutdinov. Exploiting image-trained CNN architectures for unconstrained video classification. CoRR, 2015.
-
(2015)
CoRR
-
-
Zha, S.1
Luisier, F.2
Andrews, W.3
Srivastava, N.4
Salakhutdinov, R.5
-
54
-
-
33846580425
-
Local features and kernels for classification of texture and object categories: A comprehensive study
-
J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. IJCV, 2007.
-
(2007)
IJCV
-
-
Zhang, J.1
Marszalek, M.2
Lazebnik, S.3
Schmid, C.4