SCOPUS Information Retrieval Platform

Proceedings of the IEEE International Conference on Computer Vision

Volume 2017-October, 2017, Pages 1270-1279

Scene Graph Generation from Objects, Phrases and Region Captions

Authors (5): Li, Yikang (a); Ouyang, Wanli (a, b); Zhou, Bolei (c); Wang, Kun (a); Wang, Xiaogang (a)

a CHINESE UNIVERSITY OF HONG KONG (Hong Kong)

b UNIVERSITY OF SYDNEY (Australia)

c MASSACHUSETTS INSTITUTE OF TECHNOLOGY (United States)

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; GRAPHIC METHODS; NEURAL NETWORKS; SEMANTICS;

CONTEXT INFORMATION; JOINT LEARNING; LANGUAGE DESCRIPTION; NOVEL NEURAL NETWORK; SCENE DESCRIPTION; SCENE UNDERSTANDING; SEMANTIC LEVELS; STATE-OF-ART METHODS;

OBJECT DETECTION;

EID: 85041915815 PISSN: 15505499 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/ICCV.2017.142 Document Type: Conference Paper

Times cited: 615

References (51)

1
- 84973890960
- VQA: Visual question answering
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh. VQA: Visual question answering. In ICCV, 2015
- (2015) ICCV
- Antol, S.; Agrawal, A.; Lu, J.; Mitchell, M.; Batra, D.; Zitnick, C.L.; Parikh, D.

2
- 85116156579
- METEOR: An automatic metric for MT evaluation with improved correlation with human judgments
- S. Banerjee and A. Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, 2005
- (2005) Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization
- Banerjee, S.; Lavie, A.

3
- 0041876117
- Matching words and pictures
- K. Barnard, P. Duygulu, D. Forsyth, N. de Freitas, D. M. Blei, and M. I. Jordan. Matching words and pictures. JMLR, 2003
- (2003) JMLR
- Barnard, K.; Duygulu, P.; Forsyth, D.; de Freitas, N.; Blei, D.M.; Jordan, M.I.

4
- 85107362379
- NLTK: The natural language toolkit
- S. Bird. NLTK: The natural language toolkit. In ACL, 2006
- (2006) ACL
- Bird, S.

5
- 84944115859
- Learning a recurrent visual representation for image caption generation
- X. Chen and C. L. Zitnick. Learning a recurrent visual representation for image caption generation. arXiv preprint arXiv:1411.5654, 2014
- (2014) arXiv preprint arXiv:1411.5654
- Chen, X.; Zitnick, C.L.

6
- 84887394346
- Understanding indoor scenes using 3D geometric phrases
- W. Choi, Y.-W. Chao, C. Pantofaru, and S. Savarese. Understanding indoor scenes using 3D geometric phrases. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 33-40, 2013
- (2013) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 33-40
- Choi, W.; Chao, Y.-W.; Pantofaru, C.; Savarese, S.

7
- 85041894385
- Towards diverse and natural image descriptions via a conditional GAN
- B. Dai, D. Lin, R. Urtasun, and S. Fidler. Towards diverse and natural image descriptions via a conditional GAN. arXiv preprint arXiv:1703.06029, 2017
- (2017) arXiv preprint arXiv:1703.06029
- Dai, B.; Lin, D.; Urtasun, R.; Fidler, S.

8
- 85041892861
- Detecting visual relationships with deep relational networks
- B. Dai, Y. Zhang, and D. Lin. Detecting visual relationships with deep relational networks. CVPR, 2017
- (2017) CVPR
- Dai, B.; Zhang, Y.; Lin, D.

9
- 84877748784
- Detecting actions, poses, and objects with relational phraselets
- C. Desai and D. Ramanan. Detecting actions, poses, and objects with relational phraselets. In ECCV, 2012
- (2012) ECCV
- Desai, C.; Ramanan, D.

10
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015
- (2015) CVPR
- Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T.

11
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- J. Donahue, L. Anne Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015
- (2015) CVPR
- Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T.

12
- 84959250180
- From captions to visual concepts and back
- H. Fang, S. Gupta, F. Iandola, R. K. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. C. Platt, et al. From captions to visual concepts and back. In CVPR, 2015
- (2015) CVPR
- Fang, H.; Gupta, S.; Iandola, F.; Srivastava, R.K.; Deng, L.; Dollár, P.; Gao, J.; He, X.; Mitchell, M.; Platt, J.C.

13
- 80052017343
- Every picture tells a story: Generating sentences from images
- A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, 2010
- (2010) ECCV
- Farhadi, A.; Hejrati, M.; Sadeghi, M.A.; Young, P.; Rashtchian, C.; Hockenmaier, J.; Forsyth, D.

14
- 85029359197
- Fast R-CNN
- R. Girshick. Fast R-CNN. In ICCV, 2015
- (2015) ICCV
- Girshick, R.

15
- 84911400494
- Rich feature hierarchies for accurate object detection and semantic segmentation
- R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014
- (2014) CVPR
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J.

16
- 70450155469
- Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers
- A. Gupta and L. S. Davis. Beyond nouns: Exploiting prepositions and comparative adjectives for learning visual classifiers. In ECCV, 2008
- (2008) ECCV
- Gupta, A.; Davis, L.S.

17
- 0001138328
- Algorithm AS 136: A k-means clustering algorithm
- J. A. Hartigan and M. A. Wong. Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society. Series C (Applied Statistics), 1979
- (1979) Journal of the Royal Statistical Society. Series C (Applied Statistics)
- Hartigan, J.A.; Wong, M.A.

18
- 84959229874
- Spatial pyramid pooling in deep convolutional networks for visual recognition
- K. He, X. Zhang, S. Ren, and J. Sun. Spatial pyramid pooling in deep convolutional networks for visual recognition. In CVPR, 2014
- (2014) CVPR
- He, K.; Zhang, X.; Ren, S.; Sun, J.

19
- 84856653718
- Learning cross-modality similarity for multinomial data
- Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. In ICCV, 2011
- (2011) ICCV
- Jia, Y.; Salzmann, M.; Darrell, T.

20
- 84986278097
- DenseCap: Fully convolutional localization networks for dense captioning
- J. Johnson, A. Karpathy, and L. Fei-Fei. DenseCap: Fully convolutional localization networks for dense captioning. arXiv preprint arXiv:1511.07571, 2015
- (2015) arXiv preprint arXiv:1511.07571
- Johnson, J.; Karpathy, A.; Fei-Fei, L.

21
- 80053435765
- Learning with whom to share in multi-task feature learning
- Z. Kang, K. Grauman, and F. Sha. Learning with whom to share in multi-task feature learning. In ICML, 2011
- (2011) ICML
- Kang, Z.; Grauman, K.; Sha, F.

22
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In CVPR, 2015
- (2015) CVPR
- Karpathy, A.; Fei-Fei, L.

23
- 84978730111
- Visual Genome: Connecting language and vision using crowdsourced dense image annotations
- R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016
- (2016) arXiv preprint arXiv:1602.07332
- Krishna, R.; Zhu, Y.; Groth, O.; Johnson, J.; Hata, K.; Kravitz, J.; Chen, S.; Kalantidis, Y.; Li, L.-J.; Shamma, D.A.

24
- 84887601544
- BabyTalk: Understanding and generating simple image descriptions
- G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. BabyTalk: Understanding and generating simple image descriptions. TPAMI, 2013
- (2013) TPAMI
- Kulkarni, G.; Premraj, V.; Ordonez, V.; Dhar, S.; Li, S.; Choi, Y.; Berg, A.C.; Berg, T.L.

25
- 77955997860
- Efficiently selecting regions for scene understanding
- M. P. Kumar and D. Koller. Efficiently selecting regions for scene understanding. In CVPR, 2010
- (2010) CVPR
- Kumar, M.P.; Koller, D.

26
- 84907331257
- Generalizing image captions for image-text parallel corpus
- P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Generalizing image captions for image-text parallel corpus. In ACL, 2013
- (2013) ACL
- Kuznetsova, P.; Ordonez, V.; Berg, A.C.; Berg, T.L.; Choi, Y.

27
- 85041893972
- Person search with natural language description
- S. Li, T. Xiao, H. Li, B. Zhou, D. Yue, and X. Wang. Person search with natural language description. In CVPR, 2017
- (2017) CVPR
- Li, S.; Xiao, T.; Li, H.; Zhou, B.; Yue, D.; Wang, X.

28
- 85041906062
- ViP-CNN: Visual phrase guided convolutional neural network
- Y. Li, W. Ouyang, X. Wang, and X. Tang. ViP-CNN: Visual phrase guided convolutional neural network. CVPR, 2017
- (2017) CVPR
- Li, Y.; Ouyang, W.; Wang, X.; Tang, X.

29
- 85041926899
- Deep variation-structured reinforcement learning for visual relationship and attribute detection
- X. Liang, L. Lee, and E. P. Xing. Deep variation-structured reinforcement learning for visual relationship and attribute detection. CVPR, 2017
- (2017) CVPR
- Liang, X.; Lee, L.; Xing, E.P.

30
- 85011035819
- SSD: Single shot multibox detector
- W. Liu, D. Anguelov, D. Erhan, C. Szegedy, and S. Reed. SSD: Single shot multibox detector. arXiv preprint arXiv:1512.02325, 2015
- (2015) arXiv preprint arXiv:1512.02325
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.

31
- 85035234967
- Visual relationship detection with language priors
- C. Lu, R. Krishna, M. Bernstein, and L. Fei-Fei. Visual relationship detection with language priors. In ECCV, 2016
- (2016) ECCV
- Lu, C.; Krishna, R.; Bernstein, M.; Fei-Fei, L.

32
- 85030238250
- Phrase localization and visual relationship detection with comprehensive linguistic cues
- B. A. Plummer, A. Mallya, C. M. Cervantes, J. Hockenmaier, and S. Lazebnik. Phrase localization and visual relationship detection with comprehensive linguistic cues. arXiv preprint arXiv:1611.06641, 2016
- (2016) arXiv preprint arXiv:1611.06641
- Plummer, B.A.; Mallya, A.; Cervantes, C.M.; Hockenmaier, J.; Lazebnik, S.

33
- 84961917629
- You only look once: Unified, real-time object detection
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. arXiv preprint arXiv:1506.02640, 2015
- (2015) arXiv preprint arXiv:1506.02640
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A.

34
- 84960980241
- Faster R-CNN: Towards real-time object detection with region proposal networks
- S. Ren, K. He, R. Girshick, and J. Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015
- (2015) NIPS
- Ren, S.; He, K.; Girshick, R.; Sun, J.

35
- 33845596932
- Using multiple segmentations to discover objects and their extent in image collections
- B. C. Russell, W. T. Freeman, A. A. Efros, J. Sivic, and A. Zisserman. Using multiple segmentations to discover objects and their extent in image collections. In CVPR, 2006
- (2006) CVPR
- Russell, B.C.; Freeman, W.T.; Efros, A.A.; Sivic, J.; Zisserman, A.

36
- 80052889458
- Recognition using visual phrases
- M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011
- (2011) CVPR
- Sadeghi, M.A.; Farhadi, A.

37
- 84925410541
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014
- (2014) arXiv preprint arXiv:1409.1556
- Simonyan, K.; Zisserman, A.

38
- 84925410541
- Very deep convolutional networks for large-scale image recognition
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014
- (2014) arXiv preprint arXiv:1409.1556
- Simonyan, K.; Zisserman, A.

39
- 77955998009
- Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora
- R. Socher and L. Fei-Fei. Connecting modalities: Semi-supervised segmentation and annotation of images using unaligned text corpora. In CVPR, 2010
- (2010) CVPR
- Socher, R.; Fei-Fei, L.

40
- 85041900712
- Scene graph generation by iterative message passing
- D. Xu, Y. Zhu, C. B. Choy, and L. Fei-Fei. Scene graph generation by iterative message passing. arXiv preprint arXiv:1701.02426, 2017
- (2017) arXiv preprint arXiv:1701.02426
- Xu, D.; Zhu, Y.; Choy, C.B.; Fei-Fei, L.

41
- 84939821074
- Show, attend and tell: Neural image caption generation with visual attention
- K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. S. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. arXiv preprint arXiv:1502.03044, 2015
- (2015) arXiv preprint arXiv:1502.03044
- Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhutdinov, R.; Zemel, R.S.; Bengio, Y.

42
- 33846487387
- Multitask learning for classification with dirichlet process priors
- Y. Xue, X. Liao, L. Carin, and B. Krishnapuram. Multitask learning for classification with dirichlet process priors. Journal of Machine Learning Research, 2007
- (2007) Journal of Machine Learning Research
- Xue, Y.; Liao, X.; Carin, L.; Krishnapuram, B.

43
- 85041906381
- Multi-level attention networks for visual question answering
- D. Yu, J. Fu, T. Mei, and Y. Rui. Multi-level attention networks for visual question answering. In CVPR, 2017
- (2017) CVPR
- Yu, D.; Fu, J.; Mei, T.; Rui, Y.

44
- 85029388674
- Visual translation embedding network for visual relation detection
- H. Zhang, Z. Kyaw, S.-F. Chang, and T.-S. Chua. Visual translation embedding network for visual relation detection. In CVPR, 2017
- (2017) CVPR
- Zhang, H.; Kyaw, Z.; Chang, S.-F.; Chua, T.-S.

45
- 85035223616
- PPR-FCN: Weakly supervised visual relation detection via parallel pairwise R-FCN
- H. Zhang, Z. Kyaw, J. Yu, and S.-F. Chang. PPR-FCN: Weakly supervised visual relation detection via parallel pairwise R-FCN. In ICCV, 2017
- (2017) ICCV
- Zhang, H.; Kyaw, Z.; Yu, J.; Chang, S.-F.

46
- 84907044969
- A convex formulation for learning task relationships in multi-task learning
- Y. Zhang and D.-Y. Yeung. A convex formulation for learning task relationships in multi-task learning. arXiv preprint arXiv:1203.3536, 2012
- (2012) arXiv preprint arXiv:1203.3536
- Zhang, Y.; Yeung, D.-Y.

47
- 85162027638
- Probabilistic multi-task feature selection
- Y. Zhang, D.-Y. Yeung, and Q. Xu. Probabilistic multi-task feature selection. In NIPS, 2010
- (2010) NIPS
- Zhang, Y.; Yeung, D.-Y.; Xu, Q.

48
- 85009935878
- Facial landmark detection by deep multi-task learning
- Z. Zhang, P. Luo, C. C. Loy, and X. Tang. Facial landmark detection by deep multi-task learning. In ECCV, 2014
- (2014) ECCV
- Zhang, Z.; Luo, P.; Loy, C.C.; Tang, X.

49
- 84986301525
- Simple baseline for visual question answering
- B. Zhou, Y. Tian, S. Sukhbaatar, A. Szlam, and R. Fergus. Simple baseline for visual question answering. arXiv preprint arXiv:1512.02167, 2015
- (2015) arXiv preprint arXiv:1512.02167
- Zhou, B.; Tian, Y.; Sukhbaatar, S.; Szlam, A.; Fergus, R.

50
- 85029782049
- Towards context-aware interaction recognition
- B. Zhuang, L. Liu, C. Shen, and I. Reid. Towards context-aware interaction recognition. ICCV, 2017
- (2017) ICCV
- Zhuang, B.; Liu, L.; Shen, C.; Reid, I.

51
- 85041920577
- Care about you: Towards large-scale human-centric visual relationship detection
- B. Zhuang, Q. Wu, C. Shen, I. Reid, and A. van den Hengel. Care about you: Towards large-scale human-centric visual relationship detection. arXiv preprint arXiv:1705.09892, 2017
- (2017) arXiv preprint arXiv:1705.09892
- Zhuang, B.; Wu, Q.; Shen, C.; Reid, I.; van den Hengel, A.

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.