SCOPUS Information Retrieval Platform

MM 2014 - Proceedings of the 2014 ACM Conference on Multimedia

2014, Pages 97-106

3D Human activity recognition with reconfigurable convolutional neural networks

(5) Wang, Keze (a); Wang, Xiaolong (a); Lin, Liang (a); Wang, Meng (b); Zuo, Wangmeng (c)

a SUN YAT SEN UNIVERSITY (China)

b HEFEI UNIVERSITY OF TECHNOLOGY (China)

c HARBIN INSTITUTE OF TECHNOLOGY (China)

Author keywords

3D activity; Deep learning; Structured model; Video parsing

Indexed keywords

BACKPROPAGATION; CONVOLUTION; DEEP LEARNING; IMAGE SEGMENTATION; ITERATIVE METHODS; PATTERN RECOGNITION;

AUTOMATIC ACTIVITY RECOGNITION; HUMAN ACTIVITY RECOGNITION; MULTIMEDIA PROCESSING; NETWORK CONFIGURATION; OPTIMIZATION METHOD; STATE-OF-THE-ART METHODS; STRUCTURED MODEL; VIDEO PARSING;

CONVOLUTIONAL NEURAL NETWORKS;

EID: 84913584483 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1145/2647868.2654912 Document Type: Conference Paper

Times cited : (88)

References (41)

1
- 84866674206
- Sum-product networks for modeling activities with stochastic structure
- M. R. Amer and S. Todorovic. Sum-product networks for modeling activities with stochastic structure. In CVPR, pages 1314-1321, 2012.
- (2012) CVPR , pp. 1314-1321
- Amer, M.R.¹ Todorovic, S.²

2
- 84856661125
- Learning spatiotemporal graphs of human activities
- W. Brendel and S. Todorovic. Learning spatiotemporal graphs of human activities. In ICCV, pages 778-785, 2011.
- (2011) ICCV , pp. 778-785
- Brendel, W.¹ Todorovic, S.²

3
- 84875494948
- A survey of video datasets for human action and activity recognition
- J. M. Chaquet, E. J. Carmona, and A. Fernandez-Caballero. A survey of video datasets for human action and activity recognition. Computer Vision and Image Understanding, 117(6):633-659, 2013.
- (2013) Computer Vision and Image Understanding , vol.117 , Issue.6 , pp. 633-659
- Chaquet, J.M.¹ Carmona, E.J.² Fernandez-Caballero, A.³

4
- 84455205109
- Human group activity analysis with fusion of motion and appearance information
- Z. Cheng, L. Qin, Q. Huang, S. Jiang, S. Yan, and Q. Tian. Human group activity analysis with fusion of motion and appearance information. In ACM Multimedia, pages 1401-1404, 2011.
- (2011) ACM Multimedia , pp. 1401-1404
- Cheng, Z.¹ Qin, L.² Huang, Q.³ Jiang, S.⁴ Yan, S.⁵ Tian, Q.⁶

5
- 84911400494
- Rich feature hierarchies for accurate object detection and semantic segmentation
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
- (2014) CVPR
- Girshick, R.¹ Donahue, J.² Darrell, T.³ Malik, J.⁴

6
- 84887418625
- Human activities recognition using depth images
- R. Gupta, A. Y. Chia, D. Rajan, E. S. Ng, and E. H. Lung. Human activities recognition using depth images. In ACM Multimedia, pages 283-292, 2013.
- (2013) ACM Multimedia , pp. 283-292
- Gupta, R.¹ Chia, A.Y.² Rajan, D.³ Ng, E.S.⁴ Lung, E.H.⁵

7
- 33746600649
- Reducing the dimensionality of data with neural networks
- G. E. Hinton and R. R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504-507, 2006.
- (2006) Science , vol.313 , Issue.5786 , pp. 504-507
- Hinton, G.E.¹ Salakhutdinov, R.R.²

8
- 84870183903
- 3d convolutional neural networks for human action recognition
- S. Ji, W. Xu, M. Yang, and K. Yu. 3d convolutional neural networks for human action recognition. IEEE Trans. Pattern Anal. Mach. Intell., 35(1):221-231, 2013.
- (2013) IEEE Trans. Pattern Anal. Mach. Intell. , vol.35 , Issue.1 , pp. 221-231
- Ji, S.¹ Xu, W.² Yang, M.³ Yu, K.⁴

9
- 84880311243
- Learning human activities and object affordances from rgb-d videos
- H. S. Koppula, R. Gupta, and A. Saxena. Learning human activities and object affordances from rgb-d videos. International Journal of Robotics Research (IJRR), 32(8):951-970, 2013.
- (2013) International Journal of Robotics Research (IJRR) , vol.32 , Issue.8 , pp. 951-970
- Koppula, H.S.¹ Gupta, R.² Saxena, A.³

10
- 84926464781
- Learning spatio-temporal structure from rgb-d videos for human activity detection and anticipation
- H. S. Koppula and A. Saxena. Learning spatio-temporal structure from rgb-d videos for human activity detection and anticipation. In ICML, pages 792-800, 2013.
- (2013) ICML , pp. 792-800
- Koppula, H.S.¹ Saxena, A.²

11
- 84876231242
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097-1105, 2012.
- (2012) Advances in Neural Information Processing Systems , pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

12
- 80052874098
- Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis
- Q. V. Le, W. Y. Zou, S. Y. Yeung, and A. Y. Ng. Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis. In CVPR, pages 3361-3368, 2011.
- (2011) CVPR , pp. 3361-3368
- Le, Q.V.¹ Zou, W.Y.² Yeung, S.Y.³ Ng, A.Y.⁴

13
- 0000494466
- Handwritten digit recognition with a back-propagation network
- Y. Le Cun, B. Boser, J. Denker, D. Henderson, R. Howard, W. Hubbard, and L. D. Jackel. Handwritten digit recognition with a back-propagation network. In Advances in neural information processing systems, 1990.
- (1990) Advances in Neural Information Processing Systems
- Le Cun, Y.¹ Boser, B.² Denker, J.³ Henderson, D.⁴ Howard, R.⁵ Hubbard, W.⁶ Jackel, L.D.⁷

14
- 84887476984
- Learning latent spatio-temporal compositional model for human action recognition
- X. Liang, L. Lin, and L. Cao. Learning latent spatio-temporal compositional model for human action recognition. In ACM Multimedia, pages 263-272, 2013.
- (2013) ACM Multimedia , pp. 263-272
- Liang, X.¹ Lin, L.² Cao, L.³

15
- 56049121516
- Semantic event representation and recognition using syntactic attribute graph grammar
- L. Lin, H. Gong, L. Li, and L. Wang. Semantic event representation and recognition using syntactic attribute graph grammar. Pattern Recognition Letters, 30(2):180-186, 2009.
- (2009) Pattern Recognition Letters , vol.30 , Issue.2 , pp. 180-186
- Lin, L.¹ Gong, H.² Li, L.³ Wang, L.⁴

16
- 62349137210
- A stochastic graph grammar for compositional object representation and recognition
- L. Lin, T. Wu, J. Porway, and Z. Xu. A stochastic graph grammar for compositional object representation and recognition. Pattern Recognition, 42(7):1297-1307, 2009.
- (2009) Pattern Recognition , vol.42 , Issue.7 , pp. 1297-1307
- Lin, L.¹ Wu, T.² Porway, J.³ Xu, Z.⁴

17
- 84898796864
- A deep sum-product architecture for robust facial attributes analysis
- P. Luo, X. Wang, and X. Tang. A deep sum-product architecture for robust facial attributes analysis. In ICCV, pages 2864-2871, 2013.
- (2013) ICCV , pp. 2864-2871
- Luo, P.¹ Wang, X.² Tang, X.³

18
- 84898770979
- Pedestrian parsing via deep decompositional neural network
- P. Luo, X. Wang, and X. Tang. Pedestrian parsing via deep decompositional neural network. In ICCV, pages 2648-2655, 2013.
- (2013) ICCV , pp. 2648-2655
- Luo, P.¹ Wang, X.² Tang, X.³

19
- 84881515103
- Integrating multi-stage depth-induced contextual information for human action recognition and localization
- B. Ni, Y. Pei, Z. Liang, L. Lin, and P. Moulin. Integrating multi-stage depth-induced contextual information for human action recognition and localization. In International Conference and Workshops on Automatic Face and Gesture Recognition, pages 1-8, 2013.
- (2013) International Conference and Workshops on Automatic Face and Gesture Recognition , pp. 1-8
- Ni, B.¹ Pei, Y.² Liang, Z.³ Lin, L.⁴ Moulin, P.⁵

20
- 84887375927
- Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences
- O. Oreifej and Z. Liu. Hon4d: Histogram of oriented 4d normals for activity recognition from depth sequences. In CVPR, pages 716-723, 2013.
- (2013) CVPR , pp. 716-723
- Oreifej, O.¹ Liu, Z.²

21
- 84866717619
- A combined pose, object, and feature model for action understanding
- B. Packer, K. Saenko, and D. Koller. A combined pose, object, and feature model for action understanding. In CVPR, pages 1378-1385, 2012.
- (2012) CVPR , pp. 1378-1385
- Packer, B.¹ Saenko, K.² Koller, D.³

22
- 84856646751
- Parsing video events with goal inference and intent prediction
- M. Pei, Y. Jia, and S. Zhu. Parsing video events with goal inference and intent prediction. In ICCV, pages 487-494, 2011.
- (2011) ICCV , pp. 487-494
- Pei, M.¹ Jia, Y.² Zhu, S.³

23
- 84866718894
- Action bank: A high-level representation of activity in video
- S. Sadanand and J. J. Corso. Action bank: A high-level representation of activity in video. In CVPR, pages 1234-1241, 2012.
- (2012) CVPR , pp. 1234-1241
- Sadanand, S.¹ Corso, J.J.²

24
- 37849037402
- A 3-dimensional sift descriptor and its application to action recognition
- P. Scovanner, S. Ali, and M. Shah. A 3-dimensional sift descriptor and its application to action recognition. In ACM Multimedia, pages 357-360, 2007.
- (2007) ACM Multimedia , pp. 357-360
- Scovanner, P.¹ Ali, S.² Shah, M.³

25
- 84864487638
- Unstructured human activity detection from rgbd images
- J. Sung, C. Ponce, B. Selman, and A. Saxena. Unstructured human activity detection from rgbd images. In ICRA, pages 842-849, 2012.
- (2012) ICRA , pp. 842-849
- Sung, J.¹ Ponce, C.² Selman, B.³ Saxena, A.⁴

26
- 84866658784
- Learning latent temporal structure for complex event detection
- K. Tang, L. Fei-Fei, and D. Koller. Learning latent temporal structure for complex event detection. In CVPR, pages 1250-1257, 2012.
- (2012) CVPR , pp. 1250-1257
- Tang, K.¹ Fei-Fei, L.² Koller, D.³

27
- 78149336740
- Convolutional learning of spatio-temporal features
- G. W. Taylor, R. Fergus, Y. Le Cun, and C. Bregler. Convolutional learning of spatio-temporal features. In ECCV, pages 140-153, 2010.
- (2010) ECCV , pp. 140-153
- Taylor, G.W.¹ Fergus, R.² Le Cun, Y.³ Bregler, C.⁴

28
- 84901405262
- Joint video and text parsing for understanding events and answering queries
- K. Tu, M. Meng, M. W. Lee, T. Choi, and S. Zhu. Joint video and text parsing for understanding events and answering queries. IEEE Transactions on Multimedia, 21(2):42-70, 2014.
- (2014) IEEE Transactions on Multimedia , vol.21 , Issue.2 , pp. 42-70
- Tu, K.¹ Meng, M.² Lee, M.W.³ Choi, T.⁴ Zhu, S.⁵

29
- 84887346790
- An approach to pose-based action recognition
- C. Wang, Y. Wang, and A. L. Yuille. An approach to pose-based action recognition. In CVPR, pages 915-922, 2013.
- (2013) CVPR , pp. 915-922
- Wang, C.¹ Wang, Y.² Yuille, A.L.³

30
- 84866672692
- Mining actionlet ensemble for action recognition with depth cameras
- J. Wang, Z. Liu, Y. Wu, and J. Yuan. Mining actionlet ensemble for action recognition with depth cameras. In CVPR, pages 1290-1297, 2012.
- (2012) CVPR , pp. 1290-1297
- Wang, J.¹ Liu, Z.² Wu, Y.³ Yuan, J.⁴

31
- 84898794902
- Learning maximum margin temporal warping for action recognition
- J. Wang and Y. Wu. Learning maximum margin temporal warping for action recognition. In ICCV, pages 2688-2695, 2013.
- (2013) ICCV , pp. 2688-2695
- Wang, J.¹ Wu, Y.²

32
- 84887381206
- Incorporating structural alternatives and sharing into hierarchy for multiclass object recognition and detection
- X. Wang, L. Lin, L. Huang, and S. Yan. Incorporating structural alternatives and sharing into hierarchy for multiclass object recognition and detection. In CVPR, pages 3334-3341, 2013.
- (2013) CVPR , pp. 3334-3341
- Wang, X.¹ Lin, L.² Huang, L.³ Yan, S.⁴

33
- 79957467077
- Hidden part models for human action recognition: Probabilistic vs. max-margin
- Y. Wang and G. Mori. Hidden part models for human action recognition: Probabilistic vs. max-margin. IEEE Trans. Pattern Anal. Mach. Intell., 33(7):1310-1323, 2011.
- (2011) IEEE Trans. Pattern Anal. Mach. Intell. , vol.33 , Issue.7 , pp. 1310-1323
- Wang, Y.¹ Mori, G.²

34
- 84887419657
- Online multimodal deep similarity learning with application to image retrieval
- P. Wu, S. Hoi, H. Xia, P. Zhao, D. Wang, and C. Miao. Online multimodal deep similarity learning with application to image retrieval. In ACM Multimedia, pages 153-162, 2013.
- (2013) ACM Multimedia , pp. 153-162
- Wu, P.¹ Hoi, S.² Xia, H.³ Zhao, P.⁴ Wang, D.⁵ Miao, C.⁶

35
- 84887324355
- Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera
- L. Xia and J. Aggarwal. Spatio-temporal depth cuboid similarity feature for activity recognition using depth camera. In CVPR, pages 2834-2841, 2013.
- (2013) CVPR , pp. 2834-2841
- Xia, L.¹ Aggarwal, J.²

36
- 84865033379
- View invariant human action recognition using histograms of 3d joints
- L. Xia, C. Chen, and J. K. Aggarwal. View invariant human action recognition using histograms of 3d joints. In CVPRW, pages 20-27, 2012.
- (2012) CVPRW , pp. 20-27
- Xia, L.¹ Chen, C.² Aggarwal, J.K.³

37
- 84871394796
- Recognizing actions using depth motion maps-based histograms of oriented gradients
- X. Yang, C. Zhang, and Y. Tian. Recognizing actions using depth motion maps-based histograms of oriented gradients. In ACM Multimedia, pages 1057-1060, 2012.
- (2012) ACM Multimedia , pp. 1057-1060
- Yang, X.¹ Zhang, C.² Tian, Y.³

38
- 80052889296
- Learning image representations from the pixel level via hierarchical sparse coding
- K. Yu, Y. Lin, and J. Lafferty. Learning image representations from the pixel level via hierarchical sparse coding. In CVPR, pages 1713-1720, 2011.
- (2011) CVPR , pp. 1713-1720
- Yu, K.¹ Lin, Y.² Lafferty, J.³

39
- 84887474318
- Exploring discriminative pose sub-patterns for effective action classification
- X. Zhao, Y. Liu, and Y. Fu. Exploring discriminative pose sub-patterns for effective action classification. In ACM Multimedia, pages 273-282, 2013.
- (2013) ACM Multimedia , pp. 273-282
- Zhao, X.¹ Liu, Y.² Fu, Y.³

40
- 70350676914
- Sift-bag kernel for video event analysis
- X. Zhou, X. Zhuang, S. Yan, S. F. Chang, M. H. Johnson, and T. S. Huang. Sift-bag kernel for video event analysis. In ACM Multimedia, pages 229-238, 2009.
- (2009) ACM Multimedia , pp. 229-238
- Zhou, X.¹ Zhuang, X.² Yan, S.³ Chang, S.F.⁴ Johnson, M.H.⁵ Huang, T.S.⁶

41
- 34548726226
- A stochastic grammar of images
- S. Zhu and D. Mumford. A stochastic grammar of images. Foundations and Trends in Computer Graphics and Vision, 2(4):259-362, 2007.
- (2007) Foundations and Trends in Computer Graphics and Vision , vol.2 , Issue.4 , pp. 259-362
- Zhu, S.¹ Mumford, D.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.