[1] D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In ICLR, 2015. 3
[2] S. Bengio, O. Vinyals, N. Jaitly, and N. Shazeer. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS, pages 1171-1179, 2015. 1, 2
[3] D. L. Chen and W. B. Dolan. Collecting highly parallel data for paraphrase evaluation. In ACL, pages 190-200, 2011. 8
[4] X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollár, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv preprint arXiv:1504.00325, 2015. 5
[5] X. Chen and C. Lawrence Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, pages 2422-2431, 2015. 1, 2
[6] K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In EMNLP, 2014. 3
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248-255, 2009. 6
[8] M. Denkowski and A. Lavie. Meteor universal: Language specific translation evaluation for any target language. In ACL, 2014. 6
[9] J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, pages 2625-2634, 2015. 1, 2
[10] H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, pages 1473-1482, 2015. 1, 2, 3
[11] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, pages 15-29, 2010. 2
[12] C. Gan, T. Yang, and B. Gong. Learning attributes equals multi-source domain generalization. In CVPR, pages 87-97, 2016. 2
[13] Z. Gan, C. Gan, X. He, Y. Pu, K. Tran, J. Gao, L. Carin, and L. Deng. Semantic compositional networks for visual captioning. In CVPR, 2017. 3
[14] S. Gella and M. Mitchell. Residual multiple instance learning for visually impaired image descriptions. In NIPS Women in Machine Learning Workshop, 2016. 3
[15] R. Girshick. Fast R-CNN. In ICCV, pages 1440-1448, 2015. 2
[16] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, pages 580-587, 2014. 2
[17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016. 2, 6
[18] L. A. Hendricks, S. Venugopalan, M. Rohrbach, R. Mooney, K. Saenko, and T. Darrell. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR, 2016. 3
[20] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research, 47:853-899, 2013. 2, 5, 6
[21] X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars. Guiding the long-short term memory model for image caption generation. In ICCV, pages 2407-2415, 2015. 2
[22] A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, pages 3128-3137, 2015. 1, 2
[23] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei. Large-scale video classification with convolutional neural networks. In CVPR, pages 1725-1732, 2014. 8
[24] D. Kingma and J. Ba. Adam: A method for stochastic optimization. In ICLR, 2015. 6
[25] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016. 3
[26] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, 2012. 2
[27] G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. BabyTalk: Understanding and generating simple image descriptions. In CVPR, pages 1601-1608, 2011. 2
[28] P. Kuznetsova, V. Ordonez, T. L. Berg, and Y. Choi. TreeTalk: Composition and compression of trees for image descriptions. TACL, 2:351-362, 2014. 2
[29] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In ACL, 2011. 2
[31] M.-T. Luong, Q. V. Le, I. Sutskever, O. Vinyals, and L. Kaiser. Multi-task sequence to sequence learning. In ICLR, 2016. 2, 3, 5, 6
[32] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015. 1, 2, 4
[33] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV, 2015. 3
[34] A. Mathews, L. Xie, and X. He. SentiCap: Generating image descriptions with sentiments. In AAAI, 2016. 2, 3
[35] M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. Berg, K. Yamaguchi, T. Berg, K. Stratos, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, pages 747-756, 2012. 2
[36] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2Text: Describing images using 1 million captioned photographs. In NIPS, pages 1143-1151, 2011. 2
[37] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002. 6
[38] Y. Pu, Z. Gan, R. Henao, X. Yuan, C. Li, A. Stevens, and L. Carin. Variational autoencoder for deep learning of images, labels and captions. In NIPS, pages 2352-2360, 2016. 3
[39] S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016. 2
[40] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. 2
[41] C. Sun, C. Gan, and R. Nevatia. Automatic concept discovery from parallel text and visual corpora. In ICCV, pages 2596-2604, 2015. 2
[42] I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In NIPS, pages 3104-3112, 2014. 2, 3
[43] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, et al. Going deeper with convolutions. In CVPR, pages 1-9, 2015. 2
[45] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri. Learning spatiotemporal features with 3D convolutional networks. In ICCV, pages 4489-4497, 2015. 8
[46] K. Tran, X. He, L. Zhang, J. Sun, C. Carapcea, C. Thrasher, C. Buehler, and C. Sienkiewicz. Rich image captioning in the wild. arXiv preprint arXiv:1603.09016, 2016. 1, 2, 3, 6, 7
[47] R. Vedantam, C. Lawrence Zitnick, and D. Parikh. CIDEr: Consensus-based image description evaluation. In CVPR, pages 4566-4575, 2015. 6
[49] S. Venugopalan, H. Xu, J. Donahue, M. Rohrbach, R. Mooney, and K. Saenko. Translating videos to natural language using deep recurrent neural networks. In NAACL, 2015. 8
[50] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, pages 3156-3164, 2015. 1, 2, 3, 4, 5, 6
[51] L. Wei, Q. Huang, D. Ceylan, E. Vouga, and H. Li. Dense-cap: Fully convolutional localization networks for dense captioning. 2015. 3
[52] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, pages 2048-2057, 2015. 1, 2, 4
[53] Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, pages 444-454, 2011. 2
[54] Z. Yang, Y. Yuan, Y. Wu, R. Salakhutdinov, and W. W. Cohen. Encode, review, and decode: Reviewer module for caption generation. In NIPS, 2016. 1, 2
[55] Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In CVPR, 2016. 1, 2
[56] P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. TACL, 2014. 5