SCOPUS Information Retrieval Platform

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume 9358, 2015, Pages 209-221

The long-short story of movie description

(3) Rohrbach, Anna (a); Rohrbach, Marcus (b); Schiele, Bernt (a)

a MAX PLANCK INSTITUTE FOR INFORMATICS (Germany)

b UNIVERSITY OF CALIFORNIA (United States)

Author keywords

[No Author keywords available]

Indexed keywords

HUMAN ROBOT INTERACTION; MOTION PICTURES; ROBOTS;

BLIND PEOPLE; COMPARE AND ANALYZE; IMAGE CAPTIONING; IMAGE DESCRIPTIONS; LONG SHORT-TERM MEMORY; RECURRENT NETWORKS;

PATTERN RECOGNITION;

EID: 84952308628
PISSN: 03029743
EISSN: 16113349
Source Type: Book Series
DOI: 10.1007/978-3-319-24947-6_17
Document Type: Conference Paper

Times cited : (84)

References (40)

1
- 84885996388
- Video in sentences out
- Barbu, A., Bridge, A., Burchill, Z., Coroian, D., Dickinson, S., Fidler, S., Michaux, A., Mussman, S., Narayanaswamy, S., Salvi, D., Schmidt, L., Shangguan, J., Siskind, J.M., Waggoner, J., Wang, S., Wei, J., Yin, Y., Zhang, Z.: Video in sentences out. In: UAI (2012)
- (2012) UAI
- Barbu, A.¹ Bridge, A.² Burchill, Z.³ Coroian, D.⁴ Dickinson, S.⁵ Fidler, S.⁶ Michaux, A.⁷ Mussman, S.⁸ Narayanaswamy, S.⁹ Salvi, D.¹⁰ Schmidt, L.¹¹ Shangguan, J.¹² Siskind, J.M.¹³ Waggoner, J.¹⁴ Wang, S.¹⁵ Wei, J.¹⁶ Yin, Y.¹⁷ Zhang, Z.¹⁸

2
- 84859089502
- Collecting highly parallel data for paraphrase evaluation
- Chen, D., Dolan, W.: Collecting highly parallel data for paraphrase evaluation. In: ACL (2011)
- (2011) ACL
- Chen, D.¹ Dolan, W.²

3
- 84952349295
- arXiv:1504.00325
- Chen, X., Fang, H., Lin, T., Vedantam, R., Gupta, S., Dollár, P., Zitnick, C.L.: Microsoft coco captions: data collection and evaluation server (2015). arXiv:1504.00325
- (2015) Microsoft Coco Captions: Data Collection and Evaluation Server
- Chen, X.¹ Fang, H.² Lin, T.³ Vedantam, R.⁴ Gupta, S.⁵ Dollár, P.⁶ Zitnick, C.L.⁷

4
- 84887345951
- Thousand frames in just a few words: Lingual description of videos through latent topics and sparse object stitching
- Das, P., Xu, C., Doell, R., Corso, J.: Thousand frames in just a few words: lingual description of videos through latent topics and sparse object stitching. In: CVPR (2013)
- (2013) CVPR
- Das, P.¹ Xu, C.² Doell, R.³ Corso, J.⁴

5
- 84952349296
- arXiv:1505.01809
- Devlin, J., Cheng, H., Fang, H., Gupta, S., Deng, L., He, X., Zweig, G., Mitchell, M.: Language models for image captioning: the quirks and what works (2015). arXiv:1505.01809
- (2015) Language Models for Image Captioning: The Quirks and What Works
- Devlin, J.¹ Cheng, H.² Fang, H.³ Gupta, S.⁴ Deng, L.⁵ He, X.⁶ Zweig, G.⁷ Mitchell, M.⁸

6
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- Donahue, J., Hendricks, L.A., Guadarrama, S., Rohrbach, M., Venugopalan, S., Saenko, K., Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description. In: CVPR (2015)
- (2015) CVPR
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

7
- 84906929591
- Image description using visual dependency representations
- Elliott, D., Keller, F.: Image description using visual dependency representations. In: EMNLP, pp. 1292-1302 (2013)
- (2013) EMNLP , pp. 1292-1302
- Elliott, D.¹ Keller, F.²

8
- 84959250180
- From captions to visual concepts and back
- Fang, H., Gupta, S., Iandola, F.N., Srivastava, R., Deng, L., Dollar, P., Gao, J., He, X., Mitchell, M., Platt, J.C., Zitnick, C.L., Zweig, G.: From captions to visual concepts and back. In: CVPR (2015)
- (2015) CVPR
- Fang, H.¹ Gupta, S.² Iandola, F.N.³ Srivastava, R.⁴ Deng, L.⁵ Dollar, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰ Zitnick, C.L.¹¹ Zweig, G.¹²

9
- 78149311145
- Every picture tells a story: Generating sentences from images
- In: Daniilidis, K., Maragos, P., Paragios, N. (eds.), Springer, Heidelberg
- Farhadi, A., Hejrati, M., Sadeghi, M.A., Young, P., Rashtchian, C., Hockenmaier, J., Forsyth, D.: Every picture tells a story: generating sentences from images. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part IV. LNCS, vol. 6314, pp. 15-29. Springer, Heidelberg (2010)
- (2010) ECCV 2010, Part IV. LNCS , vol.6314 , pp. 15-29
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

10
- 0004289791
- The MIT Press, Cambridge
- Fellbaum, C.: WordNet: An Electronic Lexical Database. The MIT Press, Cambridge (1998)
- (1998) Wordnet: An Electronic Lexical Database
- Fellbaum, C.¹

11
- 84898773262
- Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition
- Guadarrama, S., Krishnamoorthy, N., Malkarnenkar, G., Venugopalan, S., Mooney, R., Darrell, T., Saenko, K.: Youtube2text: Recognizing and describing arbitrary activities using semantic hierarchies and zero-shot recognition. In: ICCV (2013)
- (2013) ICCV
- Guadarrama, S.¹ Krishnamoorthy, N.² Malkarnenkar, G.³ Venugopalan, S.⁴ Mooney, R.⁵ Darrell, T.⁶ Saenko, K.⁷

12
- 84867720412
- arXiv:1207.0580
- Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors (2012). arXiv:1207.0580
- (2012) Improving Neural Networks by Preventing Co-Adaptation of Feature Detectors
- Hinton, G.E.¹ Srivastava, N.² Krizhevsky, A.³ Sutskever, I.⁴ Salakhutdinov, R.R.⁵

13
- 0031573117
- Long short-term memory
- Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735-1780 (1997)
- (1997) Neural Comput , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

14
- 84924803045
- LSDA: Large scale detection through adaptation
- Hoffman, J., Guadarrama, S., Tzeng, E., Donahue, J., Girshick, R., Darrell, T., Saenko, K.: LSDA: large scale detection through adaptation. In: NIPS (2014)
- (2014) NIPS
- Hoffman, J.¹ Guadarrama, S.² Tzeng, E.³ Donahue, J.⁴ Girshick, R.⁵ Darrell, T.⁶ Saenko, K.⁷

15
- 84913555165
- arXiv:1408.5093
- Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: Convolutional architecture for fast feature embedding (2014). arXiv:1408.5093
- (2014) Caffe: Convolutional Architecture for Fast Feature Embedding
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

16
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: CVPR (2015)
- (2015) CVPR
- Karpathy, A.¹ Fei-Fei, L.²

17
- 84952349298
- Unifying visual-semantic embeddings with multimodal neural language models
- Kiros, R., Salakhutdinov, R., Zemel, R.S.: Unifying visual-semantic embeddings with multimodal neural language models. TACL (2015)
- (2015) TACL
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

18
- 0036843382
- Natural language description of human activities from video images based on concept hierarchy of actions
- Kojima, A., Tamura, T., Fukunaga, K.: Natural language description of human activities from video images based on concept hierarchy of actions. IJCV 50(2), 171-184 (2002)
- (2002) IJCV , vol.50 , Issue.2 , pp. 171-184
- Kojima, A.¹ Tamura, T.² Fukunaga, K.³

19
- 80052901011
- Baby talk: Understanding and generating simple image descriptions
- Kulkarni, G., Premraj, V., Dhar, S., Li, S., Choi, Y., Berg, A.C., Berg, T.L.: Baby talk: understanding and generating simple image descriptions. In: CVPR (2011)
- (2011) CVPR
- Kulkarni, G.¹ Premraj, V.² Dhar, S.³ Li, S.⁴ Choi, Y.⁵ Berg, A.C.⁶ Berg, T.L.⁷

20
- 84934873221
- Treetalk: Composition and compression of trees for image descriptions
- Kuznetsova, P., Ordonez, V., Berg, T.L., Choi, Y.: Treetalk: composition and compression of trees for image descriptions. In: TACL (2014)
- (2014) TACL
- Kuznetsova, P.¹ Ordonez, V.² Berg, T.L.³ Choi, Y.⁴

21
- 85107661995
- Meteor universal: Language specific translation evaluation for any target language
- Denkowski, M., Lavie, A.: Meteor universal: language specific translation evaluation for any target language. In: ACL 2014, p. 376 (2014)
- (2014) ACL 2014 , pp. 376
- Denkowski, M.¹ Lavie, A.²

22
- 85083950512
- Deep captioning with multimodal recurrent neural networks (M-RNN)
- Mao, J., Xu, W., Yang, Y., Wang, J., Huang, Z., Yuille, A.: Deep captioning with multimodal recurrent neural networks (m-RNN). In: ICLR (2015)
- (2015) ICLR
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

23
- 85034832841
- Midge: Generating image descriptions from computer vision detections
- Mitchell, M., Dodge, J., Goyal, A., Yamaguchi, K., Stratos, K., Han, X., Mensch, A., Berg, A.C., Berg, T.L., Daume, H.: Midge: generating image descriptions from computer vision detections. In: EACL (2012)
- (2012) EACL
- Mitchell, M.¹ Dodge, J.² Goyal, A.³ Yamaguchi, K.⁴ Stratos, K.⁵ Han, X.⁶ Mensch, A.⁷ Berg, A.C.⁸ Berg, T.L.⁹ Daume, H.¹⁰

24
- 84952349300
- arXiv:1505.01861
- Pan, Y., Mei, T., Yao, T., Li, H., Rui, Y.: Jointly modeling embedding and translation to bridge video and language (2015). arXiv:1505.01861
- (2015) Jointly Modeling Embedding and Translation to Bridge Video and Language
- Pan, Y.¹ Mei, T.² Yao, T.³ Li, H.⁴ Rui, Y.⁵

25
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- Papineni, K., Roukos, S., Ward, T., Zhu, W.J.: BLEU: a method for automatic evaluation of machine translation. In: ACL (2002)
- (2002) ACL
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.J.⁴

26
- 84908670256
- Coherent multi-sentence video description with variable level of detail
- In: Jiang, X., Hornegger, J., Koch, R. (eds.), Springer, Heidelberg
- Rohrbach, A., Rohrbach, M., Qiu, W., Friedrich, A., Pinkal, M., Schiele, B.: Coherent multi-sentence video description with variable level of detail. In: Jiang, X., Hornegger, J., Koch, R. (eds.) GCPR 2014. LNCS, vol. 8753, pp. 184-195. Springer, Heidelberg (2014)
- (2014) GCPR 2014. LNCS , vol.8753 , pp. 184-195
- Rohrbach, A.¹ Rohrbach, M.² Qiu, W.³ Friedrich, A.⁴ Pinkal, M.⁵ Schiele, B.⁶

27
- 84973887740
- arXiv:1506.01698
- Rohrbach, A., Rohrbach, M., Schiele, B.: The long-short story of movie description (2015). arXiv:1506.01698
- (2015) The Long-Short Story of Movie Description
- Rohrbach, A.¹ Rohrbach, M.² Schiele, B.³

28
- 84959211977
- A dataset for movie description
- Rohrbach, A., Rohrbach, M., Tandon, N., Schiele, B.: A dataset for movie description. In: CVPR (2015)
- (2015) CVPR
- Rohrbach, A.¹ Rohrbach, M.² Tandon, N.³ Schiele, B.⁴

29
- 84898775239
- Translating video content to natural language descriptions
- Rohrbach, M., Qiu, W., Titov, I., Thater, S., Pinkal, M., Schiele, B.: Translating video content to natural language descriptions. In: ICCV (2013)
- (2013) ICCV
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

30
- 84959932469
- Integrating language and vision to generate natural language descriptions of videos in the wild
- Thomason, J., Venugopalan, S., Guadarrama, S., Saenko, K., Mooney, R.J.: Integrating language and vision to generate natural language descriptions of videos in the wild. In: COLING (2014)
- (2014) COLING
- Thomason, J.¹ Venugopalan, S.² Guadarrama, S.³ Saenko, K.⁴ Mooney, R.J.⁵

31
- 84952349304
- arXiv:1503.01070v1
- Torabi, A., Pal, C., Larochelle, H., Courville, A.: Using descriptive video services to create a large data source for video annotation research (2015). arXiv:1503.01070v1
- Torabi, A.¹ Pal, C.² Larochelle, H.³ Courville, A.⁴

32
- 84956980995
- Cider: Consensus-based image description evaluation
- Vedantam, R., Zitnick, C.L., Parikh, D.: Cider: Consensus-based image description evaluation. In: CVPR (2015)
- (2015) CVPR
- Vedantam, R.¹ Zitnick, C.L.² Parikh, D.³

33
- 84952349305
- arXiv:1505.00487
- Venugopalan, S., Rohrbach, M., Donahue, J., Mooney, R., Darrell, T., Saenko, K.: Sequence to sequence - video to text (2015). arXiv:1505.00487
- (2015) Sequence to Sequence - Video to Text
- Venugopalan, S.¹ Rohrbach, M.² Donahue, J.³ Mooney, R.⁴ Darrell, T.⁵ Saenko, K.⁶

34
- 84959876769
- Translating videos to natural language using deep recurrent neural networks
- Venugopalan, S., Xu, H., Donahue, J., Rohrbach, M., Mooney, R., Saenko, K.: Translating videos to natural language using deep recurrent neural networks. In: NAACL (2015)
- (2015) NAACL
- Venugopalan, S.¹ Xu, H.² Donahue, J.³ Rohrbach, M.⁴ Mooney, R.⁵ Saenko, K.⁶

35
- 84946747440
- Show and tell: A neural image caption generator
- Vinyals, O., Toshev, A., Bengio, S., Erhan, D.: Show and tell: A neural image caption generator. In: CVPR (2015)
- (2015) CVPR
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

36
- 84898805910
- Action recognition with improved trajectories
- Wang, H., Schmid, C.: Action recognition with improved trajectories. In: ICCV (2013)
- (2013) ICCV
- Wang, H.¹ Schmid, C.²

37
- 84952349307
- Jointly modeling deep video and compositional text to bridge vision and language in a unified framework
- Xu, R., Xiong, C., Chen, W., Corso, J.J.: Jointly modeling deep video and compositional text to bridge vision and language in a unified framework. In: AAAI (2015)
- (2015) AAAI
- Xu, R.¹ Xiong, C.² Chen, W.³ Corso, J.J.⁴

38
- 84952349308
- arXiv:1502.08029v4
- Yao, L., Torabi, A., Cho, K., Ballas, N., Pal, C., Larochelle, H., Courville, A.: Describing videos by exploiting temporal structure (2015). arXiv:1502.08029v4
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

39
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL 2, 67-78 (2014)
- (2014) TACL , vol.2 , pp. 67-78
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

40
- 84937964578
- Learning Deep Features for Scene Recognition using Places Database
- Zhou, B., Lapedriza, A., Xiao, J., Torralba, A., Oliva, A.: Learning Deep Features for Scene Recognition using Places Database. In: NIPS (2014)
- (2014) NIPS
- Zhou, B.¹ Lapedriza, A.² Xiao, J.³ Torralba, A.⁴ Oliva, A.⁵

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.