SCOPUS Information Retrieval Platform

Proceedings of the IEEE International Conference on Computer Vision

Volume 2015 International Conference on Computer Vision, ICCV 2015, 2015, Pages 4462-4470

Weakly-supervised alignment of video with text

(7) Bojanowski, P. (a); Lajugie, R. (a); Grave, E. (b); Bach, F. (a); Laptev, I. (a); Ponce, J. (c); Schmid, C. (a)

a INRIA (France)

b Och Spine at New York Presbyterian Hospitals (United States)

c ECOLE NORMALE SUPÉRIEURE (France)

Author keywords

[No Author keywords available]

Indexed keywords

COMBINATORIAL OPTIMIZATION; INTEGER PROGRAMMING; QUADRATIC PROGRAMMING; RELAXATION PROCESSES; VISUAL LANGUAGES;

ASSIGNMENT PROBLEMS; CONDITIONAL GRADIENT; INTEGER SOLUTIONS; MANUAL ANNOTATION; NATURAL LANGUAGES; QUADRATIC PROGRAMS; ROUNDING PROCEDURES; TEXTUAL DESCRIPTION;

COMPUTER VISION;

EID: 84973883674 PISSN: 15505499 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCV.2015.507 Document Type: Conference Paper

Times cited : (157)

References (47)

1
- 84898797429
- Monte carlo tree search for scheduling activity recognition
- M. R. Amer, S. Todorovic, A. Fern, and S.-C. Zhu. Monte carlo tree search for scheduling activity recognition. In ICCV, 2013.
- (2013) ICCV
- Amer, M.R.¹ Todorovic, S.² Fern, A.³ Zhu, S.-C.⁴

2
- 84900675076
- Diffrac: A discriminative and flexible framework for clustering
- F. Bach and Z. Harchaoui. Diffrac: A discriminative and flexible framework for clustering. In NIPS, 2007.
- (2007) NIPS
- Bach, F.¹ Harchaoui, Z.²

3
- 0041876117
- Matching words and pictures
- K. Barnard, P. Duygulu, D. A. Forsyth, N. de Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003.
- (2003) JMLR
- Barnard, K.¹ Duygulu, P.² Forsyth, D.A.³ De Freitas, N.⁴ Blei, D.M.⁵ Jordan, M.I.⁶

4
- 70350686154
- The wacky wide web: A collection of very large linguistically processed web-crawled corpora
- M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The wacky wide web: A collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 2009.
- (2009) Language Resources and Evaluation
- Baroni, M.¹ Bernardini, S.² Ferraresi, A.³ Zanchetta, E.⁴

5
- 0003713964
- Nonlinear Programming
- D. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.
- (1999) Athena Scientific
- Bertsekas, D.¹

6
- 84898792367
- Finding actors and actions in movies
- P. Bojanowski, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Finding actors and actions in movies. In ICCV, 2013.
- (2013) ICCV
- Bojanowski, P.¹ Bach, F.² Laptev, I.³ Ponce, J.⁴ Schmid, C.⁵ Sivic, J.⁶

7
- 84943800045
- Weakly supervised action labeling in videos under ordering constraints
- P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic. Weakly supervised action labeling in videos under ordering constraints. In ECCV, 2014.
- (2014) ECCV
- Bojanowski, P.¹ Lajugie, R.² Bach, F.³ Laptev, I.⁴ Ponce, J.⁵ Schmid, C.⁶ Sivic, J.⁷

8
- 84944046597
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. arXiv preprint arXiv:1411.4389, 2014.
- (2014) arXiv preprint arXiv:1411.4389
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

9
- 0038401728
- Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary
- P. Duygulu, K. Barnard, J. F. G. d. Freitas, and D. A. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In ECCV, 2002.
- (2002) ECCV
- Duygulu, P.¹ Barnard, K.² Freitas, J.F.G.D.³ Forsyth, D.A.⁴

10
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, S. M. M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. A. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
- (2010) ECCV
- Farhadi, A.¹ Hejrati, S.M.M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.A.⁷

11
- 0001971618
- An algorithm for quadratic programming
- M. Frank and P. Wolfe. An algorithm for quadratic programming. Naval Research Logistics Quarterly, 1956.
- (1956) Naval Research Logistics Quarterly
- Frank, M.¹ Wolfe, P.²

12
- 84898958665
- Devise: A deep visual-semantic embedding model
- A. Frome, G. S. Corrado, J. Shlens, S. Bengio, J. Dean, M. Ranzato, and T. Mikolov. Devise: A deep visual-semantic embedding model. In NIPS, 2013.
- (2013) NIPS
- Frome, A.¹ Corrado, G.S.² Shlens, J.³ Bengio, S.⁴ Dean, J.⁵ Ranzato, M.⁶ Mikolov, T.⁷

13
- 80052915321
- Actom sequence models for efficient action detection
- A. Gaidon, Z. Harchaoui, and C. Schmid. Actom sequence models for efficient action detection. In CVPR, 2011.
- (2011) CVPR
- Gaidon, A.¹ Harchaoui, Z.² Schmid, C.³

14
- 84894905366
- A multi-view embedding space for modeling internet images, tags, and their semantics
- Y. Gong, Q. Ke, M. Isard, and S. Lazebnik. A multi-view embedding space for modeling internet images, tags, and their semantics. IJCV, 2014.
- (2014) IJCV
- Gong, Y.¹ Ke, Q.² Isard, M.³ Lazebnik, S.⁴

15
- 84959394156
- A markovian approach to distributional semantics with application to semantic compositionality
- E. Grave, G. Obozinski, and F. Bach. A markovian approach to distributional semantics with application to semantic compositionality. In COLING, 2014.
- (2014) COLING
- Grave, E.¹ Obozinski, G.² Bach, F.³

16
- 84898930423
- Convex relaxations of latent variable training
- Y. Guo and D. Schuurmans. Convex relaxations of latent variable training. In NIPS, 2007.
- (2007) NIPS
- Guo, Y.¹ Schuurmans, D.²

17
- 10044285992
- Canonical correlation analysis: An overview with application to learning methods
- D. Hardoon, S. Szedmak, and J. Shawe-Taylor. Canonical correlation analysis: An overview with application to learning methods. Neural computation, 16(12):2639-2664, 2004.
- (2004) Neural Computation , vol.16 , Issue.12 , pp. 2639-2664
- Hardoon, D.¹ Szedmak, S.² Shawe-Taylor, J.³

18
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, pages 853-899, 2013.
- (2013) JAIR , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

19
- 0000107975
- Relations between two sets of variates
- H. Hotelling. Relations between two sets of variates. Biometrika, 28:321-377, 1936.
- (1936) Biometrika , vol.28 , pp. 321-377
- Hotelling, H.¹

20
- 77955990943
- Discriminative clustering for image co-segmentation
- A. Joulin, F. Bach, and J. Ponce. Discriminative clustering for image co-segmentation. In CVPR, 2010.
- (2010) CVPR
- Joulin, A.¹ Bach, F.² Ponce, J.³

21
- 84866640434
- Multi-class cosegmentation
- A. Joulin, F. Bach, and J. Ponce. Multi-class cosegmentation. In CVPR, 2012.
- (2012) CVPR
- Joulin, A.¹ Bach, F.² Ponce, J.³

22
- 84943738421
- Efficient image and video co-localization with frank-wolfe algorithm
- A. Joulin, K. Tang, and L. Fei-Fei. Efficient image and video co-localization with frank-wolfe algorithm. In ECCV, 2014.
- (2014) ECCV
- Joulin, A.¹ Tang, K.² Fei-Fei, L.³

23
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- A. Karpathy, A. Joulin, and L. Fei-Fei. Deep fragment embeddings for bidirectional image sentence mapping. In NIPS, 2014.
- (2014) NIPS
- Karpathy, A.¹ Joulin, A.² Fei-Fei, L.³

24
- 84915757230
- Combining per-frame and per-track cues for multi-person action recognition
- S. Khamis, V. I. Morariu, and L. S. Davis. Combining per-frame and per-track cues for multi-person action recognition. In ECCV, 2012.
- (2012) ECCV
- Khamis, S.¹ Morariu, V.I.² Davis, L.S.³

25
- 80052882471
- Scenario-based video event recognition by constraint flow
- S. Kwak, B. Han, and J. H. Han. Scenario-based video event recognition by constraint flow. In CVPR, 2011.
- (2011) CVPR
- Kwak, S.¹ Han, B.² Han, J.H.³

26
- 51949083365
- Learning realistic human actions from movies
- I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. Learning realistic human actions from movies. In CVPR, 2008.
- (2008) CVPR
- Laptev, I.¹ Marszalek, M.² Schmid, C.³ Rozenfeld, B.⁴

27
- 34948883502
- Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video
- B. Laxton, J. Lim, and D. J. Kriegman. Leveraging temporal, contextual and ordering constraints for recognizing complex activities in video. In CVPR, 2007.
- (2007) CVPR
- Laxton, B.¹ Lim, J.² Kriegman, D.J.³

28
- 84970028761
- Phrase-based image captioning
- R. Lebret, P. O. Pinheiro, and R. Collobert. Phrase-based image captioning. arXiv preprint arXiv:1502.03671, 2015.
- (2015) arXiv preprint arXiv:1502.03671
- Lebret, R.¹ Pinheiro, P.O.² Collobert, R.³

29
- 84959916685
- What's cookin'? Interpreting cooking videos using text, speech and vision
- J. Malmaud, J. Huang, V. Rathod, N. Johnston, A. Rabinovich, and K. Murphy. What's cookin'? interpreting cooking videos using text, speech and vision. NAACL, 2015.
- (2015) NAACL
- Malmaud, J.¹ Huang, J.² Rathod, V.³ Johnston, N.⁴ Rabinovich, A.⁵ Murphy, K.⁶

30
- 85117622017
- The stanford corenlp natural language processing toolkit
- C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky. The Stanford CoreNLP natural language processing toolkit. In ACL (Demo.), 2014.
- (2014) ACL (Demo.)
- Manning, C.D.¹ Surdeanu, M.² Bauer, J.³ Finkel, J.⁴ Bethard, S.J.⁵ McClosky, D.⁶

31
- 70450177757
- Actions in context
- M. Marszalek, I. Laptev, and C. Schmid. Actions in context. In CVPR, 2009.
- (2009) CVPR
- Marszalek, M.¹ Laptev, I.² Schmid, C.³

32
- 84898956512
- Distributed representations of words and phrases and their compositionality
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In NIPS, 2013.
- (2013) NIPS
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.S.⁴ Dean, J.⁵

33
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

34
- 84964997141
- Fundamentals of Speech Recognition
- L. R. Rabiner and B.-H. Juang. Fundamentals of Speech Recognition. Prentice Hall, 1993.
- (1993) Prentice Hall
- Rabiner, L.R.¹ Juang, B.-H.²

35
- 84884994717
- Addressing big data time series: Mining trillions of time series subsequences under dynamic time warping
- T. Rakthanmanon, B. Campana, A. Mueen, G. Batista, B. Westover, Q. Zhu, J. Zakaria, and E. Keogh. Addressing big data time series: Mining trillions of time series subsequences under dynamic time warping. ACM Trans. on Knowledge Discovery from Data (TKDD), 7(3):10, 2013.
- (2013) ACM Trans. on Knowledge Discovery from Data (TKDD) , vol.7 , Issue.3 , pp. 10
- Rakthanmanon, T.¹ Campana, B.² Mueen, A.³ Batista, G.⁴ Westover, B.⁵ Zhu, Q.⁶ Zakaria, J.⁷ Keogh, E.⁸

36
- 84943782750
- Linking people with "their" names using coreference resolution
- V. Ramanathan, A. Joulin, P. Liang, and L. Fei-Fei. Linking people with "their" names using coreference resolution. In ECCV, 2014.
- (2014) ECCV
- Ramanathan, V.¹ Joulin, A.² Liang, P.³ Fei-Fei, L.⁴

37
- 84898785648
- Grounding action descriptions in videos
- M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele, and M. Pinkal. Grounding action descriptions in videos. TACL, 1:25-36, 2013.
- (2013) TACL , vol.1 , pp. 25-36
- Regneri, M.¹ Rohrbach, M.² Wetzel, D.³ Thater, S.⁴ Schiele, B.⁵ Pinkal, M.⁶

38
- 84898775239
- Translating video content to natural language descriptions
- M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In ICCV, 2013.
- (2013) ICCV
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

39
- 33845588233
- Recognition of composite human activities through context-free grammar based representation
- M. S. Ryoo and J. K. Aggarwal. Recognition of composite human activities through context-free grammar based representation. In CVPR, 2006.
- (2006) CVPR
- Ryoo, M.S.¹ Aggarwal, J.K.²

40
- 0017930815
- Dynamic programming algorithm optimization for spoken word recognition
- H. Sakoe and S. Chiba. Dynamic programming algorithm optimization for spoken word recognition. Acoustics, Speech and Signal Processing, 1978.
- (1978) Acoustics, Speech and Signal Processing
- Sakoe, H.¹ Chiba, S.²

41
- 80052901415
- Modeling the temporal extent of actions
- S. Satkin and M. Hebert. Modeling the temporal extent of actions. In ECCV, 2010.
- (2010) ECCV
- Satkin, S.¹ Hebert, M.²

42
- 77955998009
- Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
- R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010.
- (2010) CVPR
- Socher, R.¹ Fei-Fei, L.²

43
- 84964474107
- Grounded compositional semantics for finding and describing images with sentences
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. TACL, 2014.
- (2014) TACL
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

44
- 84866659479
- "Knock! knock! who is it?" probabilistic person identification in tv-series
- M. Tapaswi, M. Bäuml, and R. Stiefelhagen. "knock! knock! who is it?" probabilistic person identification in tv-series. In CVPR, 2012.
- (2012) CVPR
- Tapaswi, M.¹ Bäuml, M.² Stiefelhagen, R.³

45
- 84959255361
- Book2movie: Aligning video scenes with book chapters
- M. Tapaswi, M. Bäuml, and R. Stiefelhagen. Book2movie: Aligning video scenes with book chapters. In CVPR, 2015.
- (2015) CVPR
- Tapaswi, M.¹ Bäuml, M.² Stiefelhagen, R.³

46
- 84898805910
- Action recognition with improved trajectories
- H. Wang and C. Schmid. Action recognition with improved trajectories. In ICCV, 2013.
- (2013) ICCV
- Wang, H.¹ Schmid, C.²

47
- 78149328370
- Canonical time warping for alignment of human behavior
- F. Zhou and F. De La Torre. Canonical time warping for alignment of human behavior. NIPS, 2009.
- (2009) NIPS
- Zhou, F.¹ De La Torre, F.²

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.