SCOPUS Information Search Platform

Proceedings of the IEEE International Conference on Computer Vision

Volume 2015 International Conference on Computer Vision, ICCV 2015, 2015, Pages 4480-4488

Unsupervised semantic parsing of video collections

(4) Sener, Ozan a,b; Zamir, Amir R a; Savarese, Silvio a; Saxena, Ashutosh b,c

a STANFORD UNIVERSITY (United States)

b Department of Obstetrics Gynecology (United States)

c Brain of Things Inc (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; VISUAL LANGUAGES;

GENERATIVE MODEL; HUMAN COMMUNICATIONS; SEMANTIC PARSING; STORYLINES; TEXTUAL DESCRIPTION; USER-GENERATED VIDEO; VIDEO COLLECTIONS; VIDEO SEGMENTS;

SEMANTICS;

EID: 84973879618 PISSN: 15505499 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCV.2015.509 Document Type: Conference Paper

Times cited : (119)

References (63)

1
- 84973883234
- Supplementary material for the paper. http://cvgl.stanford.edu/watchandlearn/.
- Supplementary Material for the Paper

2
- 84892408456
- Wikihow-how to do anything. http://www.wikihow.com.
- Wikihow-how to Do Anything

3
- 84885996388
- arXiv preprint arXiv:1204.2742
- A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, et al. Video in sentences out. ArXiv preprint arXiv:1204.2742, 2012.
- (2012) Video in Sentences Out
- Barbu, A.¹ Bridge, A.² Burchill, Z.³ Coroian, D.⁴ Dickinson, S.⁵ Fidler, S.⁶ Michaux, A.⁷ Mussman, S.⁸ Narayanaswamy, S.⁹ Salvi, D.¹⁰

4
- 0041876117
- Matching words and pictures
- K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 3.
- JMLR , vol.3
- Barnard, K.¹ Duygulu, P.² Forsyth, D.³ De Freitas, N.⁴ Blei, D.M.⁵ Jordan, M.I.⁶

5
- 84856318104
- Robotic roommates making pancakes
- M. Beetz, U. Klank, I. Kresse, A. Maldonado, L. Mosenlechner, D. Pangercic, T. Ruhr, and M. Tenorth. Robotic roommates making pancakes. In Humanoids, 2011.
- (2011) Humanoids
- Beetz, M.¹ Klank, U.² Kresse, I.³ Maldonado, A.⁴ Mosenlechner, L.⁵ Pangercic, D.⁶ Ruhr, T.⁷ Tenorth, M.⁸

6
- 84943800045
- Weakly supervised action labeling in videos under ordering constraints
- P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Weakly supervised action labeling in videos under ordering constraints. In ECCV, 2014.
- (2014) ECCV
- Bojanowski, P.¹ Lajugie, R.² Bach, F.³ Laptev, I.⁴ Ponce, J.⁵ Schmid, C.⁶ Sivic, J.⁷

7
- 84937626832
- Bakebot: Baking cookies with the pr2
- M. Bollini, J. Barry, and D. Rus. Bakebot: Baking cookies with the pr2. In The PR2 Workshop, IROS, 2011.
- (2011) PR2 Workshop, IROS
- Bollini, M.¹ Barry, J.² Rus, D.³

8
- 77956008665
- Constrained parametric min-cuts for automatic object segmentation
- J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. In CVPR, 2010.
- (2010) CVPR
- Carreira, J.¹ Sminchisescu, C.²

9
- 84887345951
- A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
- P. Das, C. Xu, R. F. Doell, and J. J. Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR, 2013.
- (2013) CVPR
- Das, P.¹ Xu, C.² Doell, R.F.³ Corso, J.J.⁴

10
- 85081863350
- Automatic annotation of human actions in video
- O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce. Automatic annotation of human actions in video. In ICCV, 2009.
- (2009) ICCV
- Duchenne, O.¹ Laptev, I.² Sivic, J.³ Bach, F.⁴ Ponce, J.⁵

11
- 0344983342
- Recognizing action at a distance
- A. A. Efros, A. C. Berg, G. Mori, and J. Malik. Recognizing action at a distance. In ICCV, 2003.
- (2003) ICCV
- Efros, A.A.¹ Berg, A.C.² Mori, G.³ Malik, J.⁴

12
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV 2010. 2010.
- (2010) ECCV 2010.
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

13
- 84973881026
- A sentence is worth a thousand pixels
- S. Fidler, A. Sharma, and R. Urtasun. A sentence is worth a thousand pixels. In CVPR. IEEE, 2013.
- (2013) CVPR. IEEE
- Fidler, S.¹ Sharma, A.² Urtasun, R.³

14
- 84908309028
- Joint modeling of multiple related time series via the beta process with application to motion capture segmentation
- E. Fox, M. Hughes, E. Sudderth, and M. Jordan. Joint modeling of multiple related time series via the beta process with application to motion capture segmentation. Annals of Applied Statistics, 8(3):1281-1313, 2014.
- (2014) Annals of Applied Statistics , vol.8 , Issue.3 , pp. 1281-1313
- Fox, E.¹ Hughes, M.² Sudderth, E.³ Jordan, M.⁴

15
- 70349653329
- Generating photo manipulation tutorials by demonstration
- F. Grabler, M. Agrawala, W. Li, M. Dontcheva, and T. Igarashi. Generating photo manipulation tutorials by demonstration. TOG, 28(3):66, 2009.
- (2009) TOG , vol.28 , Issue.3 , pp. 66
- Grabler, F.¹ Agrawala, M.² Li, W.³ Dontcheva, M.⁴ Igarashi, T.⁵

16
- 33645039209
- T. Griffiths and Z. Ghahramani. Infinite latent feature models and the indian buffet process. 2005.
- (2005) Infinite Latent Feature Models and the Indian Buffet Process.
- Griffiths, T.¹ Ghahramani, Z.²

17
- 70450202741
- Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
- A. Gupta, P. Srinivasan, J. Shi, and L. S. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In CVPR, 2009.
- (2009) CVPR
- Gupta, A.¹ Srinivasan, P.² Shi, J.³ Davis, L.S.⁴

18
- 80052873938
- Joint segmentation and classification of human actions in video
- M. Hoai, Z.-Z. Lan, and F. De la Torre. Joint segmentation and classification of human actions in video. In CVPR, 2011.
- (2011) CVPR
- Hoai, M.¹ Lan, Z.-Z.² De La Torre, F.³

19
- 84887398298
- Better exploiting motion for better action recognition
- M. Jain, H. Jegou, and P. Bouthemy. Better exploiting motion for better action recognition. In CVPR, 2013.
- (2013) CVPR
- Jain, M.¹ Jegou, H.² Bouthemy, P.³

20
- 84986296521
- M. Jain, J. van Gemert, and C. G. Snoek. University of amsterdam at thumos challenge 2014.
- (2014) University of Amsterdam at Thumos Challenge
- Jain, M.¹ Van Gemert, J.² Snoek, C.G.³

21
- 84905052261
- Y.-G. Jiang, J. Liu, A. Roshan Zamir, G. Toderici, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes. 2014.
- (2014) THUMOS Challenge: Action Recognition with A Large Number of Classes.
- Jiang, Y.-G.¹ Liu, J.² Roshan Zamir, A.³ Toderici, G.⁴ Laptev, I.⁵ Shah, M.⁶ Sukthankar, R.⁷

22
- 84911441074
- Efficient feature extraction, encoding and classification for action recognition
- V. Kantorov and I. Laptev. Efficient feature extraction, encoding and classification for action recognition. In Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014, 2014.
- (2014) Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2014
- Kantorov, V.¹ Laptev, I.²

23
- 84942676733
- ArXiv e-prints, Dec.
- A. Karpathy and L. Fei-Fei. Deep Visual-Semantic Alignments for Generating Image Descriptions. ArXiv e-prints, Dec. 2014.
- (2014) Deep Visual-Semantic Alignments for Generating Image Descriptions
- Karpathy, A.¹ Fei-Fei, L.²

24
- 84887392384
- Large-scale video summarization using web-image priors
- A. Khosla, R. Hamid, C.-J. Lin, and N. Sundaresan. Large-scale video summarization using web-image priors. In CVPR, 2013.
- (2013) CVPR
- Khosla, A.¹ Hamid, R.² Lin, C.-J.³ Sundaresan, N.⁴

25
- 84911405209
- Joint summarization of large-scale collections of web images and videos for storyline reconstruction
- G. Kim, L. Sigal, and E. P. Xing. Joint summarization of large-scale collections of web images and videos for storyline reconstruction. In CVPR, 2014.
- (2014) CVPR
- Kim, G.¹ Sigal, L.² Xing, E.P.³

26
- 84911385330
- Reconstructing storyline graphs for image recommendation from web community photos
- G. Kim and E. P. Xing. Reconstructing storyline graphs for image recommendation from web community photos. In CVPR, 2014.
- (2014) CVPR
- Kim, G.¹ Xing, E.P.²

27
- 84919921461
- Multimodal neural language models
- R. Kiros, R. Salakhutdinov, and R. Zemel. Multimodal neural language models. In ICML, 2014.
- (2014) ICML
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.³

28
- 84911370987
- What are you talking about? Text-to-image coreference
- C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? text-to-image coreference. In CVPR, 2014.
- (2014) CVPR
- Kong, C.¹ Lin, D.² Bansal, M.³ Urtasun, R.⁴ Fidler, S.⁵

29
- 84876231242
- Imagenet classification with deep convolutional neural networks
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
- (2012) NIPS
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

30
- 84856682691
- A large video database for human motion recognition
- H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. A large video database for human motion recognition. In ICCV, 2011.
- (2011) ICCV
- Kuehne, H.¹ Jhuang, H.² Garrote, E.³ Poggio, T.⁴ Serre, T.⁵

31
- 84973919511
- Learning action primitives for multi-level video event understanding
- T. Lan, L. Chen, Z. Deng, G.-T. Zhou, and G. Mori. Learning action primitives for multi-level video event understanding. In Workshop on Visual Surveillance and Re-Identification, 2014.
- (2014) Workshop on Visual Surveillance and Re-Identification
- Lan, T.¹ Chen, L.² Deng, Z.³ Zhou, G.-T.⁴ Mori, G.⁵

32
- 84947609310
- A hierarchical representation for future action prediction
- T. Lan, T.-C. Chen, and S. Savarese. A hierarchical representation for future action prediction. In ECCV, 2014.
- (2014) ECCV
- Lan, T.¹ Chen, T.-C.² Savarese, S.³

33
- 51949083365
- Learning realistic human actions from movies
- I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008.
- (2008) CVPR
- Laptev, I.¹ Marszalek, M.² Schmid, C.³ Rozenfeld, B.⁴

34
- 84973914602
- Retrieving actions in movies
- I. Laptev and P. Pérez. Retrieving actions in movies. In ICCV, 2007.
- (2007) ICCV
- Laptev, I.¹ Pérez, P.²

35
- 84866723224
- Discovering important people and objects for egocentric video summarization
- Y. J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In CVPR, 2012.
- (2012) CVPR
- Lee, Y.J.¹ Ghosh, J.² Grauman, K.³

36
- 84863045576
- Key-segments for video object segmentation
- Y. J. Lee, J. Kim, and K. Grauman. Key-segments for video object segmentation. In ICCV, 2011.
- (2011) ICCV
- Lee, Y.J.¹ Kim, J.² Grauman, K.³

37
- 24044470614
- Clustering of time series data - a survey
- T. W. Liao. Clustering of time series data - a survey. Pattern Recognition, 38(11):1857-1874, 2005.
- (2005) Pattern Recognition , vol.38 , Issue.11 , pp. 1857-1874
- Liao, T.W.¹

38
- 84887342438
- Story-driven summarization for egocentric video
- Z. Lu and K. Grauman. Story-driven summarization for egocentric video. In CVPR, 2013.
- (2013) CVPR
- Lu, Z.¹ Grauman, K.²

39
- 84994119551
- ArXiv e-prints, Mar.
- J. Malmaud, J. Huang, V. Rathod, N. Johnston, A. Rabinovich, and K. Murphy. What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision. ArXiv e-prints, Mar. 2015.
- (2015) What's Cookin'? Interpreting Cooking Videos Using Text, Speech and Vision
- Malmaud, J.¹ Huang, J.² Rathod, V.³ Johnston, N.⁴ Rabinovich, A.⁵ Murphy, K.⁶

40
- 84959922155
- Cooking with semantics
- J. Malmaud, E. J. Wagner, N. Chang, and K. Murphy. Cooking with semantics. ACL, 2014.
- (2014) ACL
- Malmaud, J.¹ Wagner, E.J.² Chang, N.³ Murphy, K.⁴

41
- 84959182849
- Improving video activity recognition using object recognition and text mining
- T. S. Motwani and R. J. Mooney. Improving video activity recognition using object recognition and text mining. In ECAI, 2012.
- (2012) ECAI
- Motwani, T.S.¹ Mooney, R.J.²

42
- 80052874353
- Modeling temporal structure of decomposable motion segments for activity classification
- J. C. Niebles, C.-W. Chen, and L. Fei-Fei. Modeling temporal structure of decomposable motion segments for activity classification. In ECCV, 2010.
- (2010) ECCV
- Niebles, J.C.¹ Chen, C.-W.² Fei-Fei, L.³

43
- 84973873629
- Single-cluster spectral graph partitioning for robotics applications
- E. Olson, M. Walter, S. J. Teller, and J. J. Leonard. Single-cluster spectral graph partitioning for robotics applications. In RSS, 2005.
- (2005) RSS
- Olson, E.¹ Walter, M.² Teller, S.J.³ Leonard, J.J.⁴

44
- 84973924620
- D. Oneata, J. Verbeek, and C. Schmid. The lear submission at thumos 2014. 2014.
- (2014) The Lear Submission at Thumos 2014
- Oneata, D.¹ Verbeek, J.² Schmid, C.³

45
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

46
- 0000851002
- A factorization approach to grouping
- P. Perona and W. Freeman. A factorization approach to grouping. In ECCV. 1998.
- (1998) ECCV
- Perona, P.¹ Freeman, W.²

47
- 84911384466
- Parsing videos of actions with segmental grammars
- H. Pirsiavash and D. Ramanan. Parsing videos of actions with segmental grammars. In CVPR, 2014.
- (2014) CVPR
- Pirsiavash, H.¹ Ramanan, D.²

48
- 84946510388
- Category-specific video summarization
- D. Potapov, M. Douze, Z. Harchaoui, and C. Schmid. Category-specific video summarization. In ECCV, 2014.
- (2014) ECCV
- Potapov, D.¹ Douze, M.² Harchaoui, Z.³ Schmid, C.⁴

49
- 0024610919
- A tutorial on hidden markov models and selected applications in speech recognition
- L. R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. In PROCEEDINGS OF THE IEEE, pages 257-286, 1989.
- (1989) PROCEEDINGS of the IEEE , pp. 257-286
- Rabiner, L.R.¹

50
- 85014875102
- Automatically extracting highlights for tv baseball programs
- Y. Rui, A. Gupta, and A. Acero. Automatically extracting highlights for tv baseball programs. In ACM MM, 2000.
- (2000) ACM MM
- Rui, Y.¹ Gupta, A.² Acero, A.³

51
- 77953187842
- Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities
- M. Ryoo and J. Aggarwal. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In ICCV, 2009.
- (2009) ICCV
- Ryoo, M.¹ Aggarwal, J.²

52
- 84938237430
- Tech Report, Aug
- A. Saxena, A. Jain, O. Sener, A. Jami, D. K. Misra, and H. S. Koppula. Robo brain: Large-scale knowledge engine for robots. Tech Report, Aug 2014.
- (2014) Robo Brain: Large-scale Knowledge Engine for Robots
- Saxena, A.¹ Jain, A.² Sener, O.³ Jami, A.⁴ Misra, D.K.⁵ Koppula, H.S.⁶

53
- 84963788609
- A mathematical theory of communication
- C. E. Shannon. A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1):3-55, 2001.
- (2001) ACM SIGMOBILE Mobile Computing and Communications Review , vol.5 , Issue.1 , pp. 3-55
- Shannon, C.E.¹

54
- 77955998009
- Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
- R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, pages 966-973, 2010.
- (2010) CVPR , pp. 966-973
- Socher, R.¹ Fei-Fei, L.²

55
- 84964474107
- Grounded compositional semantics for finding and describing images with sentences
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2:207-218, 2014.
- (2014) TACL , vol.2 , pp. 207-218
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

56
- 84904972001
- UCF101: A dataset of 101 human actions classes from videos in the wild
- K. Soomro, A. Roshan Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. In CRCV-TR-12-01, 2012.
- (2012) CRCV-TR-12-01
- Soomro, K.¹ Roshan Zamir, A.² Shah, M.³

57
- 84911429593
- Discover: Discovering important segments for classification of video events and recounting
- C. Sun and R. Nevatia. Discover: Discovering important segments for classification of video events and recounting. In CVPR, 2014.
- (2014) CVPR
- Sun, C.¹ Nevatia, R.²

58
- 77955807439
- Understanding and executing instructions for everyday manipulation tasks from the world wide web
- M. Tenorth, D. Nyga, and M. Beetz. Understanding and executing instructions for everyday manipulation tasks from the world wide web. In ICRA, 2010.
- (2010) ICRA
- Tenorth, M.¹ Nyga, D.² Beetz, M.³

59
- 33847733529
- Video abstraction: A systematic review and classification
- B. T. Truong and S. Venkatesh. Video abstraction: A systematic review and classification. ACM TOMM, 3(1):3, 2007.
- (2007) ACM TOMM , vol.3 , Issue.1 , pp. 3
- Truong, B.T.¹ Venkatesh, S.²

60
- 77955988492
- Modeling mutual context of object and human pose in human-object interaction activities
- B. Yao and L. Fei-Fei. Modeling mutual context of object and human pose in human-object interaction activities. In CVPR, 2010.
- (2010) CVPR
- Yao, B.¹ Fei-Fei, L.²

61
- 84897743886
- Grounded language learning from video described with sentences
- H. Yu and J. M. Siskind. Grounded language learning from video described with sentences. In ACL, 2013.
- (2013) ACL
- Yu, H.¹ Siskind, J.M.²

62
- 84887338442
- Bringing semantics into focus using visual abstraction
- C. L. Zitnick and D. Parikh. Bringing semantics into focus using visual abstraction. In CVPR, 2013.
- (2013) CVPR
- Zitnick, C.L.¹ Parikh, D.²

63
- 84973919484
- Learning the visual interpretation of sentences
- C. L. Zitnick, D. Parikh, and L. Vanderwende. Learning the visual interpretation of sentences. In CVPR, 2013.
- (2013) CVPR
- Zitnick, C.L.¹ Parikh, D.² Vanderwende, L.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.