2015, Pages 461-470

Modeling spatial-temporal clues in a hybrid deep learning framework for video classification

Author keywords

CNN; Deep Learning; Fusion; LSTM; Video Classification

Indexed keywords

BENCHMARKING; FUSION REACTIONS; IMAGE CLASSIFICATION; MOTION ESTIMATION; NEURAL NETWORKS; SEMANTICS; VIDEO STREAMING

EID: 84962921420     PISSN: None     EISSN: None     Source Type: Conference Proceeding    
DOI: 10.1145/2733373.2806222     Document Type: Conference Paper
Times cited: 414

References (54)
  • 1
    • 84866678025
    • Three things everyone should know to improve object retrieval
    • R. Arandjelovic and A. Zisserman. Three things everyone should know to improve object retrieval. In CVPR, 2012.
    • (2012) CVPR
    • Arandjelovic, R.1    Zisserman, A.2
  • 2
    • 14344252374
    • Multiple kernel learning, conic duality, and the smo algorithm
    • F. R. Bach, G. R. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the smo algorithm. In ICML, 2004.
    • (2004) ICML
    • Bach, F.R.1    Lanckriet, G.R.2    Jordan, M.I.3
  • 3
    • 84872560515
    • Practical recommendations for gradient-based training of deep architectures
    • Springer
    • Y. Bengio. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade. Springer, 2012.
    • (2012) Neural Networks: Tricks of the Trade
    • Bengio, Y.1
  • 5
    • 84055222005
    • Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition
    • G. E. Dahl, D. Yu, L. Deng, and A. Acero. Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition. IEEE TASLP, 2012.
    • (2012) IEEE TASLP
    • Dahl, G.E.1    Yu, D.2    Deng, L.3    Acero, A.4
  • 7
    • 84911400494
    • Rich feature hierarchies for accurate object detection and semantic segmentation
    • R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
    • (2014) CVPR
    • Girshick, R.1    Donahue, J.2    Darrell, T.3    Malik, J.4
  • 8
    • 84890543083
    • Speech recognition with deep recurrent neural networks
    • A. Graves, A. Mohamed, and G. E. Hinton. Speech recognition with deep recurrent neural networks. In ICASSP, 2013.
    • (2013) ICASSP
    • Graves, A.1    Mohamed, A.2    Hinton, G.E.3
  • 9
    • 27744588611
    • Framewise phoneme classification with bidirectional lstm and other neural network architectures
    • A. Graves and J. Schmidhuber. Framewise phoneme classification with bidirectional lstm and other neural network architectures. Neural Networks, 2005.
    • (2005) Neural Networks
    • Graves, A.1    Schmidhuber, J.2
  • 13
    • 77956507967
    • 3d convolutional neural networks for human action recognition
    • S. Ji, W. Xu, M. Yang, and K. Yu. 3d convolutional neural networks for human action recognition. In ICML, 2010.
    • (2010) ICML
    • Ji, S.1    Xu, W.2    Yang, M.3    Yu, K.4
  • 15
  • 18
    • 79959766559
    • Consumer video understanding: A benchmark database and an evaluation of human and machine performance
    • Y.-G. Jiang, G. Ye, S.-F. Chang, D. Ellis, and A. C. Loui. Consumer video understanding: A benchmark database and an evaluation of human and machine performance. In ICMR, 2011.
    • (2011) ICMR
    • Jiang, Y.-G.1    Ye, G.2    Chang, S.-F.3    Ellis, D.4    Loui, A.C.5
  • 20
    • 84898426452
    • A spatio-temporal descriptor based on 3d-gradients
    • A. Klaser, M. Marszałek, and C. Schmid. A spatio-temporal descriptor based on 3d-gradients. In BMVC, 2008.
    • (2008) BMVC
    • Klaser, A.1    Marszalek, M.2    Schmid, C.3
  • 22
    • 84876231242
    • Imagenet classification with deep convolutional neural networks
    • A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
    • (2012) NIPS
    • Krizhevsky, A.1    Sutskever, I.2    Hinton, G.E.3
  • 23
    • 84962801975
    • Beyond Gaussian pyramid: Multi-skip feature stacking for action recognition
    • Z. Lan, M. Lin, X. Li, A. G. Hauptmann, and B. Raj. Beyond Gaussian pyramid: Multi-skip feature stacking for action recognition. CoRR, 2014.
    • (2014) CoRR
    • Lan, Z.1    Lin, M.2    Li, X.3    Hauptmann, A.G.4    Raj, B.5
  • 24
    • 84962874838
    • On space-time interest points
    • I. Laptev. On space-time interest points. IJCV, 2007.
    • (2007) IJCV
    • Laptev, I.1
  • 25
    • 55149112799
    • Expandable data-driven graphical modeling of human actions based on salient postures
    • W. Li, Z. Zhang, and Z. Liu. Expandable data-driven graphical modeling of human actions based on salient postures. IEEE TCSVT, 2008.
    • (2008) IEEE TCSVT
    • Li, W.1    Zhang, Z.2    Liu, Z.3
  • 26
    • 84887331855
    • Sample-specific late fusion for visual category recognition
    • D. Liu, K.-T. Lai, G. Ye, M.-S. Chen, and S.-F. Chang. Sample-specific late fusion for visual category recognition. In CVPR, 2013.
    • (2013) CVPR
    • Liu, D.1    Lai, K.-T.2    Ye, G.3    Chen, M.-S.4    Chang, S.-F.5
  • 27
    • 3042535216
    • Distinctive image features from scale-invariant keypoints
    • D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
    • (2004) IJCV
    • Lowe, D.G.1
  • 28
    • 84905653531
    • Reduced analytic dependency modeling: Robust fusion for visual recognition
    • A. J. Ma and P. C. Yuen. Reduced analytic dependency modeling: Robust fusion for visual recognition. IJCV, 2014.
    • (2014) IJCV
    • Ma, A.J.1    Yuen, P.C.2
  • 30
    • 84898791167
    • Action and event recognition with fisher vectors on a compact feature set
    • D. Oneata, J. Verbeek, C. Schmid, et al. Action and event recognition with fisher vectors on a compact feature set. In ICCV, 2013.
    • (2013) ICCV
    • Oneata, D.1    Verbeek, J.2    Schmid, C.3
  • 31
  • 32
    • 84906341074
    • CNN features off-the-shelf: An astounding baseline for recognition
    • A. S. Razavian, H. Azizpour, J. Sullivan, and S. Carlsson. CNN features off-the-shelf: An astounding baseline for recognition. CoRR, 2014.
    • (2014) CoRR
    • Razavian, A.S.1    Azizpour, H.2    Sullivan, J.3    Carlsson, S.4
  • 33
    • 84937862424
    • Two-stream convolutional networks for action recognition in videos
    • K. Simonyan and A. Zisserman. Two-stream convolutional networks for action recognition in videos. In NIPS, 2014.
    • (2014) NIPS
    • Simonyan, K.1    Zisserman, A.2
  • 34
    • 84933585162
    • Very deep convolutional networks for large-scale image recognition
    • K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. CoRR, 2014.
    • (2014) CoRR
    • Simonyan, K.1    Zisserman, A.2
  • 35
    • 84893702065
    • UCF101: A dataset of 101 human actions classes from videos in the wild
    • K. Soomro, A. R. Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. CoRR, 2012.
    • (2012) CoRR
    • Soomro, K.1    Zamir, A.R.2    Shah, M.3
  • 36
    • 84962900096
    • Unsupervised learning of video representations using LSTMs
    • N. Srivastava, E. Mansimov, and R. Salakhutdinov. Unsupervised learning of video representations using LSTMs. CoRR, 2015.
    • (2015) CoRR
    • Srivastava, N.1    Mansimov, E.2    Salakhutdinov, R.3
  • 37
    • 84877724347
    • Multimodal learning with deep boltzmann machines
    • N. Srivastava and R. Salakhutdinov. Multimodal learning with deep boltzmann machines. In NIPS, 2012.
    • (2012) NIPS
    • Srivastava, N.1    Salakhutdinov, R.2
  • 39
    • 84866658784
    • Learning latent temporal structure for complex event detection
    • K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. In CVPR, 2012.
    • (2012) CVPR
    • Tang, K.1    Fei-Fei, L.2    Koller, D.3
  • 41
    • 84856105124
    • Conditional random fields for activity recognition
    • D. L. Vail, M. M. Veloso, and J. D. Lafferty. Conditional random fields for activity recognition. In AAMAS, 2007.
    • (2007) AAMAS
    • Vail, D.L.1    Veloso, M.M.2    Lafferty, J.D.3
  • 44
    • 84898805910
    • Action recognition with improved trajectories
    • H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
    • (2013) ICCV
    • Wang, H.1    Schmid, C.2
  • 45
    • 84898890371
    • Evaluation of local spatio-temporal features for action recognition
    • H. Wang, M. M. Ullah, A. Klaser, I. Laptev, and C. Schmid. Evaluation of local spatio-temporal features for action recognition. In BMVC, 2009.
    • (2009) BMVC
    • Wang, H.1    Ullah, M.M.2    Klaser, A.3    Laptev, I.4    Schmid, C.5
  • 46
    • 70450216856
    • Max-margin hidden conditional random fields for human action recognition
    • Y. Wang and G. Mori. Max-margin hidden conditional random fields for human action recognition. In CVPR, 2009.
    • (2009) CVPR
    • Wang, Y.1    Mori, G.2
  • 47
    • 84913586072
    • Exploring inter-feature and inter-class relationships with deep neural networks for video classification
    • Z. Wu, Y.-G. Jiang, J. Wang, J. Pu, and X. Xue. Exploring inter-feature and inter-class relationships with deep neural networks for video classification. In ACM Multimedia, 2014.
    • (2014) ACM Multimedia
    • Wu, Z.1    Jiang, Y.-G.2    Wang, J.3    Pu, J.4    Xue, X.5
  • 48
    • 84898834622
    • Feature weighting via optimal thresholding for video analysis
    • Z. Xu, Y. Yang, I. Tsang, N. Sebe, and A. Hauptmann. Feature weighting via optimal thresholding for video analysis. In ICCV, 2013.
    • (2013) ICCV
    • Xu, Z.1    Yang, Y.2    Tsang, I.3    Sebe, N.4    Hauptmann, A.5
  • 49
    • 84962874851
    • Efficient online learning for multitask feature selection
    • H. Yang, M. R. Lyu, and I. King. Efficient online learning for multitask feature selection. ACM SIGKDD, 2013.
    • (2013) ACM SIGKDD
    • Yang, H.1    Lyu, M.R.2    King, I.3
  • 50
    • 84866712367
    • Robust late fusion with rank minimization
    • G. Ye, D. Liu, I.-H. Jhuo, and S.-F. Chang. Robust late fusion with rank minimization. In CVPR, 2012.
    • (2012) CVPR
    • Ye, G.1    Liu, D.2    Jhuo, I.-H.3    Chang, S.-F.4
  • 52
    • 80054879214
    • Knowledge based activity recognition with dynamic Bayesian network
    • Z. Zeng and Q. Ji. Knowledge based activity recognition with dynamic Bayesian network. In ECCV, 2010.
    • (2010) ECCV
    • Zeng, Z.1    Ji, Q.2
  • 53
    • 84962833704
    • Exploiting image-trained cnn architectures for unconstrained video classification
    • S. Zha, F. Luisier, W. Andrews, N. Srivastava, and R. Salakhutdinov. Exploiting image-trained cnn architectures for unconstrained video classification. CoRR, 2015.
    • (2015) CoRR
    • Zha, S.1    Luisier, F.2    Andrews, W.3    Srivastava, N.4    Salakhutdinov, R.5
  • 54
    • 33846580425
    • Local features and kernels for classification of texture and object categories: A comprehensive study
    • J. Zhang, M. Marszałek, S. Lazebnik, and C. Schmid. Local features and kernels for classification of texture and object categories: A comprehensive study. IJCV, 2007.
    • (2007) IJCV
    • Zhang, J.1    Marszalek, M.2    Lazebnik, S.3    Schmid, C.4


* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.