-
3
-
-
84885996388
-
-
arXiv preprint arXiv:1204.2742
-
A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, et al. Video in sentences out. arXiv preprint arXiv:1204.2742, 2012.
-
(2012)
Video in Sentences Out
-
-
Barbu, A.1
Bridge, A.2
Burchill, Z.3
Coroian, D.4
Dickinson, S.5
Fidler, S.6
Michaux, A.7
Mussman, S.8
Narayanaswamy, S.9
Salvi, D.10
-
4
-
-
0041876117
-
Matching words and pictures
-
K. Barnard, P. Duygulu, D. Forsyth, N. De Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 3.
-
JMLR
, vol.3
-
-
Barnard, K.1
Duygulu, P.2
Forsyth, D.3
De Freitas, N.4
Blei, D.M.5
Jordan, M.I.6
-
5
-
-
84856318104
-
Robotic roommates making pancakes
-
M. Beetz, U. Klank, I. Kresse, A. Maldonado, L. Mosenlechner, D. Pangercic, T. Ruhr, and M. Tenorth. Robotic roommates making pancakes. In Humanoids, 2011.
-
(2011)
Humanoids
-
-
Beetz, M.1
Klank, U.2
Kresse, I.3
Maldonado, A.4
Mosenlechner, L.5
Pangercic, D.6
Ruhr, T.7
Tenorth, M.8
-
6
-
-
84943800045
-
Weakly supervised action labeling in videos under ordering constraints
-
P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Weakly supervised action labeling in videos under ordering constraints. In ECCV, 2014.
-
(2014)
ECCV
-
-
Bojanowski, P.1
Lajugie, R.2
Bach, F.3
Laptev, I.4
Ponce, J.5
Schmid, C.6
Sivic, J.7
-
8
-
-
77956008665
-
Constrained parametric min-cuts for automatic object segmentation
-
J. Carreira and C. Sminchisescu. Constrained parametric min-cuts for automatic object segmentation. In CVPR, 2010.
-
(2010)
CVPR
-
-
Carreira, J.1
Sminchisescu, C.2
-
9
-
-
84887345951
-
A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
-
P. Das, C. Xu, R. F. Doell, and J. J. Corso. A thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR, 2013.
-
(2013)
CVPR
-
-
Das, P.1
Xu, C.2
Doell, R.F.3
Corso, J.J.4
-
10
-
-
85081863350
-
Automatic annotation of human actions in video
-
O. Duchenne, I. Laptev, J. Sivic, F. Bach, and J. Ponce. Automatic annotation of human actions in video. In ICCV, 2009.
-
(2009)
ICCV
-
-
Duchenne, O.1
Laptev, I.2
Sivic, J.3
Bach, F.4
Ponce, J.5
-
12
-
-
80052017343
-
Every picture tells a story: Generating sentences from images
-
A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
-
(2010)
ECCV
-
-
Farhadi, A.1
Hejrati, M.2
Sadeghi, M.A.3
Young, P.4
Rashtchian, C.5
Hockenmaier, J.6
Forsyth, D.7
-
14
-
-
84908309028
-
Joint modeling of multiple related time series via the beta process with application to motion capture segmentation
-
E. Fox, M. Hughes, E. Sudderth, and M. Jordan. Joint modeling of multiple related time series via the beta process with application to motion capture segmentation. Annals of Applied Statistics, 8(3):1281-1313, 2014.
-
(2014)
Annals of Applied Statistics
, vol.8
, Issue.3
, pp. 1281-1313
-
-
Fox, E.1
Hughes, M.2
Sudderth, E.3
Jordan, M.4
-
15
-
-
70349653329
-
Generating photo manipulation tutorials by demonstration
-
F. Grabler, M. Agrawala, W. Li, M. Dontcheva, and T. Igarashi. Generating photo manipulation tutorials by demonstration. TOG, 28(3):66, 2009.
-
(2009)
TOG
, vol.28
, Issue.3
, pp. 66
-
-
Grabler, F.1
Agrawala, M.2
Li, W.3
Dontcheva, M.4
Igarashi, T.5
-
17
-
-
70450202741
-
Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos
-
A. Gupta, P. Srinivasan, J. Shi, and L. S. Davis. Understanding videos, constructing plots learning a visually grounded storyline model from annotated videos. In CVPR, 2009.
-
(2009)
CVPR
-
-
Gupta, A.1
Srinivasan, P.2
Shi, J.3
Davis, L.S.4
-
18
-
-
80052873938
-
Joint segmentation and classification of human actions in video
-
M. Hoai, Z.-Z. Lan, and F. De la Torre. Joint segmentation and classification of human actions in video. In CVPR, 2011.
-
(2011)
CVPR
-
-
Hoai, M.1
Lan, Z.-Z.2
De La Torre, F.3
-
19
-
-
84887398298
-
Better exploiting motion for better action recognition
-
M. Jain, H. Jegou, and P. Bouthemy. Better exploiting motion for better action recognition. In CVPR, 2013.
-
(2013)
CVPR
-
-
Jain, M.1
Jegou, H.2
Bouthemy, P.3
-
21
-
-
84905052261
-
-
Y.-G. Jiang, J. Liu, A. Roshan Zamir, G. Toderici, I. Laptev, M. Shah, and R. Sukthankar. THUMOS challenge: Action recognition with a large number of classes. 2014.
-
(2014)
THUMOS Challenge: Action Recognition with A Large Number of Classes.
-
-
Jiang, Y.-G.1
Liu, J.2
Roshan Zamir, A.3
Toderici, G.4
Laptev, I.5
Shah, M.6
Sukthankar, R.7
-
25
-
-
84911405209
-
Joint summarization of large-scale collections of web images and videos for storyline reconstruction
-
G. Kim, L. Sigal, and E. P. Xing. Joint summarization of large-scale collections of web images and videos for storyline reconstruction. In CVPR, 2014.
-
(2014)
CVPR
-
-
Kim, G.1
Sigal, L.2
Xing, E.P.3
-
26
-
-
84911385330
-
Reconstructing storyline graphs for image recommendation from web community photos
-
G. Kim and E. P. Xing. Reconstructing storyline graphs for image recommendation from web community photos. In CVPR, 2014.
-
(2014)
CVPR
-
-
Kim, G.1
Xing, E.P.2
-
28
-
-
84911370987
-
What are you talking about? Text-to-image coreference
-
C. Kong, D. Lin, M. Bansal, R. Urtasun, and S. Fidler. What are you talking about? text-to-image coreference. In CVPR, 2014.
-
(2014)
CVPR
-
-
Kong, C.1
Lin, D.2
Bansal, M.3
Urtasun, R.4
Fidler, S.5
-
29
-
-
84876231242
-
Imagenet classification with deep convolutional neural networks
-
A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
-
(2012)
NIPS
-
-
Krizhevsky, A.1
Sutskever, I.2
Hinton, G.E.3
-
30
-
-
84856682691
-
A large video database for human motion recognition
-
H. Kuehne, H. Jhuang, E. Garrote, T. Poggio, and T. Serre. A large video database for human motion recognition. In ICCV, 2011.
-
(2011)
ICCV
-
-
Kuehne, H.1
Jhuang, H.2
Garrote, E.3
Poggio, T.4
Serre, T.5
-
31
-
-
84973919511
-
Learning action primitives for multi-level video event understanding
-
T. Lan, L. Chen, Z. Deng, G.-T. Zhou, and G. Mori. Learning action primitives for multi-level video event understanding. In Workshop on Visual Surveillance and Re-Identification, 2014.
-
(2014)
Workshop on Visual Surveillance and Re-Identification
-
-
Lan, T.1
Chen, L.2
Deng, Z.3
Zhou, G.-T.4
Mori, G.5
-
32
-
-
84947609310
-
A hierarchical representation for future action prediction
-
T. Lan, T.-C. Chen, and S. Savarese. A hierarchical representation for future action prediction. In ECCV, 2014.
-
(2014)
ECCV
-
-
Lan, T.1
Chen, T.-C.2
Savarese, S.3
-
34
-
-
84973914602
-
Retrieving actions in movies
-
I. Laptev and P. Pérez. Retrieving actions in movies. In ICCV, 2007.
-
(2007)
ICCV
-
-
Laptev, I.1
Pérez, P.2
-
35
-
-
84866723224
-
Discovering important people and objects for egocentric video summarization
-
Y. J. Lee, J. Ghosh, and K. Grauman. Discovering important people and objects for egocentric video summarization. In CVPR, 2012.
-
(2012)
CVPR
-
-
Lee, Y.J.1
Ghosh, J.2
Grauman, K.3
-
36
-
-
84863045576
-
Key-segments for video object segmentation
-
Y. J. Lee, J. Kim, and K. Grauman. Key-segments for video object segmentation. In ICCV, 2011.
-
(2011)
ICCV
-
-
Lee, Y.J.1
Kim, J.2
Grauman, K.3
-
37
-
-
24044470614
-
Clustering of time series data - a survey
-
T. W. Liao. Clustering of time series data - a survey. Pattern Recognition, 38(11):1857-1874, 2005.
-
(2005)
Pattern Recognition
, vol.38
, Issue.11
, pp. 1857-1874
-
-
Liao, T.W.1
-
38
-
-
84887342438
-
Story-driven summarization for egocentric video
-
Z. Lu and K. Grauman. Story-driven summarization for egocentric video. In CVPR, 2013.
-
(2013)
CVPR
-
-
Lu, Z.1
Grauman, K.2
-
39
-
-
84994119551
-
-
ArXiv e-prints, Mar.
-
J. Malmaud, J. Huang, V. Rathod, N. Johnston, A. Rabinovich, and K. Murphy. What's Cookin'? Interpreting Cooking Videos using Text, Speech and Vision. ArXiv e-prints, Mar. 2015.
-
(2015)
What's Cookin'? Interpreting Cooking Videos Using Text, Speech and Vision
-
-
Malmaud, J.1
Huang, J.2
Rathod, V.3
Johnston, N.4
Rabinovich, A.5
Murphy, K.6
-
41
-
-
84959182849
-
Improving video activity recognition using object recognition and text mining
-
T. S. Motwani and R. J. Mooney. Improving video activity recognition using object recognition and text mining. In ECAI, 2012.
-
(2012)
ECAI
-
-
Motwani, T.S.1
Mooney, R.J.2
-
42
-
-
80052874353
-
Modeling temporal structure of decomposable motion segments for activity classification
-
J. C. Niebles, C.-W. Chen, and L. Fei-Fei. Modeling temporal structure of decomposable motion segments for activity classification. In ECCV, 2010.
-
(2010)
ECCV
-
-
Niebles, J.C.1
Chen, C.-W.2
Fei-Fei, L.3
-
43
-
-
84973873629
-
Single-cluster spectral graph partitioning for robotics applications
-
E. Olson, M. Walter, S. J. Teller, and J. J. Leonard. Single-cluster spectral graph partitioning for robotics applications. In RSS, 2005.
-
(2005)
RSS
-
-
Olson, E.1
Walter, M.2
Teller, S.J.3
Leonard, J.J.4
-
45
-
-
85162522202
-
Im2text: Describing images using 1 million captioned photographs
-
V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.
-
(2011)
NIPS
-
-
Ordonez, V.1
Kulkarni, G.2
Berg, T.L.3
-
46
-
-
0000851002
-
A factorization approach to grouping
-
P. Perona and W. Freeman. A factorization approach to grouping. In ECCV. 1998.
-
(1998)
ECCV
-
-
Perona, P.1
Freeman, W.2
-
47
-
-
84911384466
-
Parsing videos of actions with segmental grammars
-
H. Pirsiavash and D. Ramanan. Parsing videos of actions with segmental grammars. In CVPR, 2014.
-
(2014)
CVPR
-
-
Pirsiavash, H.1
Ramanan, D.2
-
49
-
-
0024610919
-
A tutorial on hidden markov models and selected applications in speech recognition
-
L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, pages 257-286, 1989.
-
(1989)
Proceedings of the IEEE
, pp. 257-286
-
-
Rabiner, L.R.1
-
50
-
-
85014875102
-
Automatically extracting highlights for tv baseball programs
-
Y. Rui, A. Gupta, and A. Acero. Automatically extracting highlights for tv baseball programs. In ACM MM, 2000.
-
(2000)
ACM MM
-
-
Rui, Y.1
Gupta, A.2
Acero, A.3
-
51
-
-
77953187842
-
Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities
-
M. Ryoo and J. Aggarwal. Spatio-temporal relationship match: Video structure comparison for recognition of complex human activities. In ICCV, 2009.
-
(2009)
ICCV
-
-
Ryoo, M.1
Aggarwal, J.2
-
52
-
-
84938237430
-
-
Tech Report, Aug
-
A. Saxena, A. Jain, O. Sener, A. Jami, D. K. Misra, and H. S. Koppula. Robo brain: Large-scale knowledge engine for robots. Tech Report, Aug 2014.
-
(2014)
Robo Brain: Large-scale Knowledge Engine for Robots
-
-
Saxena, A.1
Jain, A.2
Sener, O.3
Jami, A.4
Misra, D.K.5
Koppula, H.S.6
-
54
-
-
77955998009
-
Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
-
R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, pages 966-973, 2010.
-
(2010)
CVPR
, pp. 966-973
-
-
Socher, R.1
Fei-Fei, L.2
-
55
-
-
84964474107
-
Grounded compositional semantics for finding and describing images with sentences
-
R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2:207-218, 2014.
-
(2014)
TACL
, vol.2
, pp. 207-218
-
-
Socher, R.1
Karpathy, A.2
Le, Q.V.3
Manning, C.D.4
Ng, A.Y.5
-
56
-
-
84904972001
-
UCF101: A dataset of 101 human actions classes from videos in the wild
-
K. Soomro, A. Roshan Zamir, and M. Shah. UCF101: A dataset of 101 human actions classes from videos in the wild. In CRCV-TR-12-01, 2012.
-
(2012)
CRCV-TR-12-01
-
-
Soomro, K.1
Roshan Zamir, A.2
Shah, M.3
-
57
-
-
84911429593
-
Discover: Discovering important segments for classification of video events and recounting
-
C. Sun and R. Nevatia. Discover: Discovering important segments for classification of video events and recounting. In CVPR, 2014.
-
(2014)
CVPR
-
-
Sun, C.1
Nevatia, R.2
-
58
-
-
77955807439
-
Understanding and executing instructions for everyday manipulation tasks from the world wide web
-
M. Tenorth, D. Nyga, and M. Beetz. Understanding and executing instructions for everyday manipulation tasks from the world wide web. In ICRA, 2010.
-
(2010)
ICRA
-
-
Tenorth, M.1
Nyga, D.2
Beetz, M.3
-
59
-
-
33847733529
-
Video abstraction: A systematic review and classification
-
B. T. Truong and S. Venkatesh. Video abstraction: A systematic review and classification. ACM TOMM, 3(1):3, 2007.
-
(2007)
ACM TOMM
, vol.3
, Issue.1
, pp. 3
-
-
Truong, B.T.1
Venkatesh, S.2
-
60
-
-
77955988492
-
Modeling mutual context of object and human pose in human-object interaction activities
-
B. Yao and L. Fei-Fei. Modeling mutual context of object and human pose in human-object interaction activities. In CVPR, 2010.
-
(2010)
CVPR
-
-
Yao, B.1
Fei-Fei, L.2
-
61
-
-
84897743886
-
Grounded language learning from video described with sentences
-
H. Yu and J. M. Siskind. Grounded language learning from video described with sentences. In ACL, 2013.
-
(2013)
ACL
-
-
Yu, H.1
Siskind, J.M.2
-
62
-
-
84887338442
-
Bringing semantics into focus using visual abstraction
-
C. L. Zitnick and D. Parikh. Bringing semantics into focus using visual abstraction. In CVPR, 2013.
-
(2013)
CVPR
-
-
Zitnick, C.L.1
Parikh, D.2
-
63
-
-
84973919484
-
Learning the visual interpretation of sentences
-
C. L. Zitnick, D. Parikh, and L. Vanderwende. Learning the visual interpretation of sentences. In CVPR, 2013.
-
(2013)
CVPR
-
-
Zitnick, C.L.1
Parikh, D.2
Vanderwende, L.3