SCOPUS Information Search Platform

Proceedings of the IEEE International Conference on Computer Vision

Volume , Issue , 2013, Pages 433-440

Translating video content to natural language descriptions

Authors (6): Rohrbach, Marcus (a); Qiu, Wei (a, b); Titov, Ivan (c); Thater, Stefan (b); Pinkal, Manfred (b); Schiele, Bernt (a)

a MAX PLANCK INSTITUTE FOR INFORMATICS (Germany)

b SAARLAND UNIVERSITY (Germany)

c UNIVERSITY OF AMSTERDAM (Netherlands)

Author keywords

[No Author keywords available]

Indexed keywords

SEMANTICS; VISION;

AUTOMATIC EVALUATION; IMAGE DESCRIPTIONS; MACHINE TRANSLATIONS; NATURAL LANGUAGES; SEMANTIC REPRESENTATION; STATISTICAL MACHINE TRANSLATION; TEXTUAL DESCRIPTION; VISUAL PERCEPTION;

NATURAL LANGUAGE PROCESSING SYSTEMS;

EID: 84898775239 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCV.2013.61 Document Type: Conference Paper

Times cited : (382)

References (27)

1
- 80052886947
- Generating image descriptions using dependency relational patterns
- A. Aker and R. J. Gaizauskas. Generating image descriptions using dependency relational patterns. In ACL, 2010.
- (2010) ACL
- Aker, A.¹ Gaizauskas, R.J.²

2
- 84885996388
- Video in sentences out
- A. Barbu, A. Bridge, Z. Burchill, D. Coroian, S. Dickinson, S. Fidler, A. Michaux, S. Mussman, S. Narayanaswamy, D. Salvi, L. Schmidt, J. Shangguan, J. M. Siskind, J. Waggoner, S. Wang, J. Wei, Y. Yin, and Z. Zhang. Video in sentences out. In UAI, 2012.
- (2012) UAI
- Barbu, A.¹ Bridge, A.² Burchill, Z.³ Coroian, D.⁴ Dickinson, S.⁵ Fidler, S.⁶ Michaux, A.⁷ Mussman, S.⁸ Narayanaswamy, S.⁹ Salvi, D.¹⁰ Schmidt, L.¹¹ Shangguan, J.¹² Siskind, J.M.¹³ Waggoner, J.¹⁴ Wang, S.¹⁵ Wei, J.¹⁶ Yin, Y.¹⁷ Zhang, Z.¹⁸

3
- 84887345951
- Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
- J. Corso, C. Xu, P. Das, R. F. Doell, and P. Rosebrough. Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching. In CVPR, 2013.
- (2013) CVPR
- Corso, J.¹ Xu, C.² Das, P.³ Doell, R.F.⁴ Rosebrough, P.⁵

4
- 0038401728
- Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary
- P. Duygulu, K. Barnard, N. de Freitas, and D. A. Forsyth. Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary. In ECCV, 2002.
- (2002) ECCV
- Duygulu, P.¹ Barnard, K.² De Freitas, N.³ Forsyth, D.A.⁴

5
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010.
- (2010) ECCV
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

6
- 84867209146
- IRSTLM: An open source toolkit for handling large scale language models
- M. Federico, N. Bertoldi, and M. Cettolo. IRSTLM: an open source toolkit for handling large scale language models. In Interspeech. ISCA, 2008.
- (2008) Interspeech. ISCA
- Federico, M.¹ Bertoldi, N.² Cettolo, M.³

7
- 84898793348
- How many words is a picture worth? Automatic caption generation for news images
- Y. Feng and M. Lapata. How many words is a picture worth? Automatic caption generation for news images. ACL'10.
- (2010) ACL
- Feng, Y.¹ Lapata, M.²

8
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- S. Guadarrama, N. Krishnamoorthy, G. Malkarnenkar, R. Mooney, T. Darrell, and K. Saenko. Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In ICCV, 2013.
- (2013) ICCV
- Guadarrama, S.¹ Krishnamoorthy, N.² Malkarnenkar, G.³ Mooney, R.⁴ Darrell, T.⁵ Saenko, K.⁶

9
- 70450202741
- Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos
- A. Gupta, P. Srinivasan, J. B. Shi, and L. Davis. Understanding videos, constructing plots: Learning a visually grounded storyline model from annotated videos. In CVPR, 2009.
- (2009) CVPR
- Gupta, A.¹ Srinivasan, P.² Shi, J.B.³ Davis, L.⁴

10
- 84877964523
- Automated textual descriptions for a wide range of video events with 48 human actions
- P. Hanckmann, K. Schutte, and G. J. Burghouts. Automated textual descriptions for a wide range of video events with 48 human actions. In ECCV Workshops, 2012.
- (2012) ECCV Workshops
- Hanckmann, P.¹ Schutte, K.² Burghouts, G.J.³

11
- 84863029475
- Human focused video description
- M. U. G. Khan, L. Zhang, and Y. Gotoh. Human focused video description. In ICCV Workshops, 2011.
- (2011) ICCV Workshops
- Khan, M.U.G.¹ Zhang, L.² Gotoh, Y.³

12
- 49449108990
- Statistical Machine Translation
- P. Koehn. Statistical Machine Translation. Cambridge University Press, 2010.
- (2010) Cambridge University Press
- Koehn, P.¹

13
- 85110867932
- Moses: Open source toolkit for statistical machine translation
- P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. Moses: Open source toolkit for statistical machine translation. In ACL demo, 2007.
- (2007) ACL Demo
- Koehn, P.¹ Hoang, H.² Birch, A.³ Callison-Burch, C.⁴ Federico, M.⁵ Bertoldi, N.⁶ Cowan, B.⁷ Shen, W.⁸ Moran, C.⁹ Zens, R.¹⁰ Dyer, C.¹¹ Bojar, O.¹² Constantin, A.¹³ Herbst, E.¹⁴

14
- 0036843382
- Natural language description of human activities from video images based on concept hierarchy of actions
- A. Kojima, T. Tamura, and K. Fukunaga. Natural language description of human activities from video images based on concept hierarchy of actions. IJCV, 2002.
- (2002) IJCV
- Kojima, A.¹ Tamura, T.² Fukunaga, K.³

15
- 80052901011
- Baby talk: Understanding and generating simple image descriptions
- G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating simple image descriptions. In CVPR, 2011.
- (2011) CVPR
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.C.⁶ Berg, T.L.⁷

16
- 84878189119
- Collective generation of natural image descriptions
- P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In ACL, 2012.
- (2012) ACL
- Kuznetsova, P.¹ Ordonez, V.² Berg, A.C.³ Berg, T.L.⁴ Choi, Y.⁵

17
- 49449119085
- Statistical machine translation
- A. Lopez. Statistical machine translation. ACM, 2008.
- (2008) ACM
- Lopez, A.¹

18
- 85044322278
- Midge: Generating image descriptions from computer vision detections
- M. Mitchell, J. Dodge, A. Goyal, K. Yamaguchi, K. Stratos, X. Han, A. Mensch, A. C. Berg, T. L. Berg, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012.
- (2012) EACL
- Mitchell, M.¹ Dodge, J.² Goyal, A.³ Yamaguchi, K.⁴ Stratos, K.⁵ Han, X.⁶ Mensch, A.⁷ Berg, A.C.⁸ Berg, T.L.⁹ Daumé III, H.¹⁰

19
- 0042879653
- A systematic comparison of various statistical alignment models
- F. J. Och and H. Ney. A systematic comparison of various statistical alignment models. CL, 2003.
- (2003) CL
- Och, F.J.¹ Ney, H.²

20
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In NIPS, 2011.
- (2011) NIPS
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

21
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: a method for automatic evaluation of machine translation. In ACL, 2002.
- (2002) ACL
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

22
- 84898775557
- Video event understanding using natural language descriptions
- V. Ramanathan, P. Liang, and L. Fei-Fei. Video event understanding using natural language descriptions. In ICCV, 2013.
- (2013) ICCV
- Ramanathan, V.¹ Liang, P.² Fei-Fei, L.³

23
- 84898785648
- Grounding action descriptions in videos
- M. Regneri, M. Rohrbach, D. Wetzel, S. Thater, B. Schiele, and M. Pinkal. Grounding action descriptions in videos. TACL, 2013.
- (2013) TACL
- Regneri, M.¹ Rohrbach, M.² Wetzel, D.³ Thater, S.⁴ Schiele, B.⁵ Pinkal, M.⁶

24
- 84887351648
- Script data for attribute-based recognition of composite activities
- M. Rohrbach, M. Regneri, M. Andriluka, S. Amin, M. Pinkal, and B. Schiele. Script data for attribute-based recognition of composite activities. In ECCV, 2012.
- (2012) ECCV
- Rohrbach, M.¹ Regneri, M.² Andriluka, M.³ Amin, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

25
- 84872942522
- UGM: Matlab code for undirected graphical models
- M. Schmidt. UGM: Matlab code for undirected graphical models. di.ens.fr/~mschmidt/Software/UGM.html, 2013.
- (2013) UGM: Matlab Code for Undirected Graphical Models
- Schmidt, M.¹

26
- 84455192418
- Towards textually describing complex video contents with audio-visual concept classifiers
- C. C. Tan, Y.-G. Jiang, and C.-W. Ngo. Towards textually describing complex video contents with audio-visual concept classifiers. In ACM Multimedia, 2011.
- (2011) ACM Multimedia
- Tan, C.C.¹ Jiang, Y.-G.² Ngo, C.-W.³

27
- 84876945537
- Dense trajectories and motion boundary descriptors for action recognition
- H. Wang, A. Kläser, C. Schmid, and C. Liu. Dense trajectories and motion boundary descriptors for action recognition. IJCV, 2013.
- (2013) IJCV
- Wang, H.¹ Kläser, A.² Schmid, C.³ Liu, C.⁴

* This information was extracted and analyzed by KISTI from Elsevier's SCOPUS database.