SCOPUS Information Search Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, Issue , 2016, Pages 2678-2687

End-to-End Learning of Action Detection from Frame Glimpses in Videos

(4) Yeung, Serena (a); Russakovsky, Olga (a, b); Mori, Greg (c); Fei-Fei, Li (a)

a STANFORD UNIVERSITY (United States)

b CARNEGIE MELLON UNIVERSITY (United States)

c SIMON FRASER UNIVERSITY (Canada)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; RECURRENT NEURAL NETWORKS;

DECISION POLICY; END TO END; NON-DIFFERENTIABLE; STATE OF THE ART; VIDEO FRAME;

PATTERN RECOGNITION;

EID: 84986253505 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.293 Document Type: Conference Paper

Times cited : (684)

References (50)

1
- 84955518079
- arXiv preprint arXiv:1412.7755
- J. Ba, V. Mnih, and K. Kavukcuoglu. Multiple object recognition with visual attention. arXiv preprint arXiv:1412.7755, 2014.
- (2014) Multiple Object Recognition with Visual Attention
- Ba, J.¹ Mnih, V.² Kavukcuoglu, K.³

2
- 33745891801
- Actions as space-time shapes
- IEEE
- M. Blank, L. Gorelick, E. Shechtman, M. Irani, and R. Basri. Actions as space-time shapes. In Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on, volume 2, pages 1395-1402. IEEE, 2005.
- (2005) Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on , vol.2 , pp. 1395-1402
- Blank, M.¹ Gorelick, L.² Shechtman, E.³ Irani, M.⁴ Basri, R.⁵

3
- 84959216468
- ActivityNet: A large-scale video benchmark for human activity understanding
- F. Caba Heilbron, V. Escorcia, B. Ghanem, and J. Carlos Niebles. ActivityNet: A large-scale video benchmark for human activity understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 961-970, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 961-970
- Caba Heilbron, F.¹ Escorcia, V.² Ghanem, B.³ Carlos Niebles, J.⁴

4
- 85009912425
- arXiv preprint arXiv:1411.4389
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389, 2014.
- (2014) Long-term Recurrent Convolutional Networks for Visual Recognition and Description
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

5
- 84911443425
- Scalable object detection using deep neural networks
- IEEE
- D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2155-2162. IEEE, 2014.
- (2014) Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on , pp. 2155-2162
- Erhan, D.¹ Szegedy, C.² Toshev, A.³ Anguelov, D.⁴

6
- 84955316677
- arXiv preprint arXiv:1504.08083
- R. Girshick. Fast R-CNN. arXiv preprint arXiv:1504.08083, 2015.
- (2015) Fast R-CNN
- Girshick, R.¹

7
- 84977644905
- arXiv preprint arXiv:1411.6031
- G. Gkioxari and J. Malik. Finding action tubes. arXiv preprint arXiv:1411.6031, 2014.
- (2014) Finding Action Tubes
- Gkioxari, G.¹ Malik, J.²

8
- 70450202741
- Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
- IEEE
- A. Gupta, P. Srinivasan, J. Shi, and L. S. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, pages 2012-2019. IEEE, 2009.
- (2009) Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on , pp. 2012-2019
- Gupta, A.¹ Srinivasan, P.² Shi, J.³ Davis, L.S.⁴

9
- 84911453664
- Action localization with tubelets from motion
- IEEE
- M. Jain, J. Van Gemert, H. Jégou, P. Bouthemy, and C. G. Snoek. Action localization with tubelets from motion. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 740-747. IEEE, 2014.
- (2014) Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on , pp. 740-747
- Jain, M.¹ Van Gemert, J.² Jégou, H.³ Bouthemy, P.⁴ Snoek, C.G.⁵

10
- 84898819791
- Towards understanding action recognition
- IEEE
- H. Jhuang, J. Gall, S. Zuffi, C. Schmid, and M. J. Black. Towards understanding action recognition. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 3192-3199. IEEE, 2013.
- (2013) Computer Vision (ICCV), 2013 IEEE International Conference on , pp. 3192-3199
- Jhuang, H.¹ Gall, J.² Zuffi, S.³ Schmid, C.⁴ Black, M.J.⁵

11
- 84905052261
- Y.-G. Jiang, J. Liu, A. Roshan Zamir, G. Toderici, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes. http://crcv.ucf.edu/THUMOS14/, 2014.
- (2014) THUMOS Challenge: Action Recognition with A Large Number of Classes
- Jiang, Y.-G.¹ Liu, J.² Roshan Zamir, A.³ Toderici, G.⁴ Laptev, I.⁵ Shah, M.⁶ Sukthankar, R.⁷

12
- 84911441074
- Efficient feature extraction, encoding, and classification for action recognition
- IEEE
- V. Kantorov and I. Laptev. Efficient feature extraction, encoding, and classification for action recognition. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 2593-2600. IEEE, 2014.
- (2014) Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on , pp. 2593-2600
- Kantorov, V.¹ Laptev, I.²

13
- 84986316707
- S. Karaman, L. Seidenari, and A. Del Bimbo. Fast saliency based pooling of fisher encoded dense trajectories.
- Fast Saliency Based Pooling of Fisher Encoded Dense Trajectories
- Karaman, S.¹ Seidenari, L.² Del Bimbo, A.³

14
- 50649103739
- Event detection in crowded videos
- Y. Ke, R. Sukthankar, and M. Hebert. Event detection in crowded videos. In ICCV, 2007.
- (2007) ICCV
- Ke, Y.¹ Sukthankar, R.² Hebert, M.³

15
- 84887386994
- Multi-agent event detection: Localization and role assignment
- S. Kwak, B. Han, and J. H. Han. Multi-agent event detection: Localization and role assignment. In CVPR, 2013.
- (2013) CVPR
- Kwak, S.¹ Han, B.² Han, J.H.³

16
- 84863083227
- Discriminative figure-centric models for joint action localization and recognition
- IEEE
- T. Lan, Y. Wang, and G. Mori. Discriminative figure-centric models for joint action localization and recognition. In Computer Vision (ICCV), 2011 IEEE International Conference on, pages 2003-2010. IEEE, 2011.
- (2011) Computer Vision (ICCV), 2011 IEEE International Conference on , pp. 2003-2010
- Lan, T.¹ Wang, Y.² Mori, G.³

17
- 51949083365
- Learning realistic human actions from movies
- IEEE
- I. Laptev, M. Marszałek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, pages 1-8. IEEE, 2008.
- (2008) Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on , pp. 1-8
- Laptev, I.¹ Marszałek, M.² Schmid, C.³ Rozenfeld, B.⁴

18
- 84865583235
- Incremental activity modeling in multiple disjoint cameras
- C. C. Loy, T. Xiang, and S. Gong. Incremental activity modeling in multiple disjoint cameras. TPAMI, 34(9):1799-1813, 2012.
- (2012) TPAMI , vol.34 , Issue.9 , pp. 1799-1813
- Loy, C.C.¹ Xiang, T.² Gong, S.³

19
- 84937959846
- Recurrent models of visual attention
- V. Mnih, N. Heess, A. Graves, et al. Recurrent models of visual attention. In Advances in Neural Information Processing Systems, pages 2204-2212, 2014.
- (2014) Advances in Neural Information Processing Systems , pp. 2204-2212
- Mnih, V.¹ Heess, N.² Graves, A.³

20
- 38049003892
- D. Moore and I. Essa. Recognizing multitasked activities from video using stochastic context-free grammar. 2002.
- (2002) Recognizing Multitasked Activities from Video Using Stochastic Context-free Grammar
- Moore, D.¹ Essa, I.²

21
- 84911397627
- Multiple granularity analysis for fine-grained action detection
- IEEE
- B. Ni, V. R. Paramathayalan, and P. Moulin. Multiple granularity analysis for fine-grained action detection. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on, pages 756-763. IEEE, 2014.
- (2014) Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on , pp. 756-763
- Ni, B.¹ Paramathayalan, V.R.² Moulin, P.³

22
- 84973924620
- D. Oneata, J. Verbeek, and C. Schmid. The LEAR submission at THUMOS 2014. 2014.
- (2014) The LEAR Submission at THUMOS 2014
- Oneata, D.¹ Verbeek, J.² Schmid, C.³

23
- 84911384466
- Parsing videos of actions with segmental grammars
- H. Pirsiavash and D. Ramanan. Parsing videos of actions with segmental grammars. In Computer Vision and Pattern Recognition (CVPR), 2014.
- (2014) Computer Vision and Pattern Recognition (CVPR)
- Pirsiavash, H.¹ Ramanan, D.²

24
- 77949275097
- A survey on vision-based human action recognition
- R. Poppe. A survey on vision-based human action recognition. IVC, 28:976-990, 2010.
- (2010) IVC , vol.28 , pp. 976-990
- Poppe, R.¹

25
- 84961917629
- arXiv preprint arXiv:1506.02640
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640, 2015.
- (2015) You only Look Once: Unified, Real-time Object Detection
- Redmon, J.¹ Divvala, S.² Girshick, R.³ Farhadi, A.⁴

26
- 84955283951
- arXiv preprint arXiv:1506.01497
- S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. arXiv preprint arXiv:1506.01497, 2015.
- (2015) Faster R-CNN: Towards Real-time Object Detection with Region Proposal Networks
- Ren, S.¹ He, K.² Girshick, R.³ Sun, J.⁴

27
- 84866710901
- A database for fine grained activity detection of cooking activities
- IEEE
- M. Rohrbach, S. Amin, M. Andriluka, and B. Schiele. A database for fine grained activity detection of cooking activities. In Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, pages 1194-1201. IEEE, 2012.
- (2012) Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on , pp. 1194-1201
- Rohrbach, M.¹ Amin, S.² Andriluka, M.³ Schiele, B.⁴

28
- 84945944033
- ImageNet large scale visual recognition challenge
- O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, et al. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, pages 1-42, 2014.
- (2014) International Journal of Computer Vision , pp. 1-42
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰

29
- 84906347546
- arXiv preprint arXiv:1312.6229
- P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, and Y. LeCun. OverFeat: Integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229, 2013.
- (2013) OverFeat: Integrated Recognition, Localization and Detection Using Convolutional Networks
- Sermanet, P.¹ Eigen, D.² Zhang, X.³ Mathieu, M.⁴ Fergus, R.⁵ LeCun, Y.⁶

30
- 84973884051
- arXiv preprint arXiv:1412.7054
- P. Sermanet, A. Frome, and E. Real. Attention for fine-grained categorization. arXiv preprint arXiv:1412.7054, 2014.
- (2014) Attention for Fine-grained Categorization
- Sermanet, P.¹ Frome, A.² Real, E.³

31
- 33845574026
- Learning temporal sequence model from partially labeled data
- IEEE
- Y. Shi, A. Bobick, and I. Essa. Learning temporal sequence model from partially labeled data. In Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, volume 2, pages 1631-1638. IEEE, 2006.
- (2006) Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on , vol.2 , pp. 1631-1638
- Shi, Y.¹ Bobick, A.² Essa, I.³

32
- 84959200790
- Joint inference of groups, events and human roles in aerial videos
- T. Shu, D. Xie, B. Rothrock, S. Todorovic, and S.-C. Zhu. Joint inference of groups, events and human roles in aerial videos. In CVPR, 2015.
- (2015) CVPR
- Shu, T.¹ Xie, D.² Rothrock, B.³ Todorovic, S.⁴ Zhu, S.-C.⁵

33
- 84925410541
- arXiv preprint arXiv:1409.1556
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- (2014) Very Deep Convolutional Networks for Large-scale Image Recognition
- Simonyan, K.¹ Zisserman, A.²

34
- 84986254317
- arXiv preprint arXiv:1504.00983
- C. Sun, S. Shetty, R. Sukthankar, and R. Nevatia. Temporal localization of fine-grained actions in videos by domain transfer from web images. arXiv preprint arXiv:1504.00983, 2015.
- (2015) Temporal Localization of Fine-grained Actions in Videos by Domain Transfer from Web Images
- Sun, C.¹ Shetty, S.² Sukthankar, R.³ Nevatia, R.⁴

35
- 84962336509
- arXiv preprint arXiv:1412.1441
- C. Szegedy, S. Reed, D. Erhan, and D. Anguelov. Scalable, high-quality object detection. arXiv preprint arXiv:1412.1441, 2014.
- (2014) Scalable, High-quality Object Detection
- Szegedy, C.¹ Reed, S.² Erhan, D.³ Anguelov, D.⁴

36
- 84887356306
- Spatiotemporal deformable part models for action detection
- IEEE
- Y. Tian, R. Sukthankar, and M. Shah. Spatiotemporal deformable part models for action detection. In Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on, pages 2642-2649. IEEE, 2013.
- (2013) Computer Vision and Pattern Recognition (CVPR), 2013 IEEE Conference on , pp. 2642-2649
- Tian, Y.¹ Sukthankar, R.² Shah, M.³

37
- 84943546021
- T. Tieleman and G. E. Hinton. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. 2012.
- (2012) Lecture 6.5-rmsprop: Divide the Gradient by A Running Average of Its Recent Magnitude
- Tieleman, T.¹ Hinton, G.E.²

38
- 84898805910
- Action recognition with improved trajectories
- IEEE
- H. Wang and C. Schmid. Action recognition with improved trajectories. In Computer Vision (ICCV), 2013 IEEE International Conference on, pages 3551-3558. IEEE, 2013.
- (2013) Computer Vision (ICCV), 2013 IEEE International Conference on , pp. 3551-3558
- Wang, H.¹ Schmid, C.²

39
- 84986274451
- L. Wang, Y. Qiao, and X. Tang. Action recognition and detection by combining motion and appearance features.
- Action Recognition and Detection by Combining Motion and Appearance Features
- Wang, L.¹ Qiao, Y.² Tang, X.³

40
- 78751648503
- A survey of vision-based methods for action representation, segmentation and recognition
- D. Weinland, R. Ronfard, and E. Boyer. A survey of vision-based methods for action representation, segmentation and recognition. In Computer Vision and Image Understanding, Vol. 115, Issue 2, pp. 224-241, 2010.
- (2010) Computer Vision and Image Understanding , vol.115 , Issue.2 , pp. 224-241
- Weinland, D.¹ Ronfard, R.² Boyer, E.³

41
- 84978774824
- arXiv preprint arXiv:1506.01929
- P. Weinzaepfel, Z. Harchaoui, and C. Schmid. Learning to track for spatio-temporal action localization. arXiv preprint arXiv:1506.01929, 2015.
- (2015) Learning to Track for Spatio-temporal Action Localization
- Weinzaepfel, P.¹ Harchaoui, Z.² Schmid, C.³

42
- 0000337576
- Simple statistical gradient-following algorithms for connectionist reinforcement learning
- R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine learning, 8(3-4):229-256, 1992.
- (1992) Machine Learning , vol.8 , Issue.3-4 , pp. 229-256
- Williams, R.J.¹

43
- 85009857480
- arXiv preprint arXiv:1502.03044
- K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015.
- (2015) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

44
- 77955995201
- A hough transform-based voting framework for action recognition
- IEEE
- A. Yao, J. Gall, and L. Van Gool. A hough transform-based voting framework for action recognition. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 2061-2068. IEEE, 2010.
- (2010) Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on , pp. 2061-2068
- Yao, A.¹ Gall, J.² Van Gool, L.³

45
- 84959191147
- Fast action proposals for human action detection and search
- G. Yu and J. Yuan. Fast action proposals for human action detection and search. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1302-1311, 2015.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 1302-1311
- Yu, G.¹ Yuan, J.²

46
- 84986274231
- J. Yuan, Y. Pei, B. Ni, P. Moulin, and A. Kassim. ADSC submission at THUMOS challenge 2015.
- (2015) ADSC Submission at THUMOS Challenge
- Yuan, J.¹ Pei, Y.² Ni, B.³ Moulin, P.⁴ Kassim, A.⁵

47
- 84943750581
- arXiv preprint arXiv:1505.00521
- W. Zaremba and I. Sutskever. Reinforcement learning neural turing machines. arXiv preprint arXiv:1505.00521, 2015.
- (2015) Reinforcement Learning Neural Turing Machines
- Zaremba, W.¹ Sutskever, I.²

48
- 84973898486
- arXiv preprint arXiv:1503.04144
- S. Zha, F. Luisier, W. Andrews, N. Srivastava, and R. Salakhutdinov. Exploiting image-trained CNN architectures for unconstrained video classification. arXiv preprint arXiv:1503.04144, 2015.
- (2015) Exploiting Image-trained CNN Architectures for Unconstrained Video Classification
- Zha, S.¹ Luisier, F.² Andrews, W.³ Srivastava, N.⁴ Salakhutdinov, R.⁵

49
- 5044228350
- Detecting unusual activity in video
- IEEE
- H. Zhong, J. Shi, and M. Visontai. Detecting unusual activity in video. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 2, pages II-819. IEEE, 2004.
- (2004) Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on , vol.2 , pp. II-819
- Zhong, H.¹ Shi, J.² Visontai, M.³

50
- 84959314189
- arXiv preprint arXiv:1506.06724
- Y. Zhu, R. Kiros, R. Zemel, R. Salakhutdinov, R. Urtasun, A. Torralba, and S. Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. arXiv preprint arXiv:1506.06724, 2015.
- (2015) Aligning Books and Movies: Towards Story-like Visual Explanations by Watching Movies and Reading Books
- Zhu, Y.¹ Kiros, R.² Zemel, R.³ Salakhutdinov, R.⁴ Urtasun, R.⁵ Torralba, A.⁶ Fidler, S.⁷

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS DB.