SCOPUS Information Search Platform

EMNLP 2017 - Conference on Empirical Methods in Natural Language Processing, Proceedings

Volume , Issue , 2017, Pages 177-187

OBJ2TEXT: Generating visually descriptive language from object layouts

(2) Yin, Xuwang (a); Ordonez, Vicente (a)

(a) University of Virginia (United States)

Author keywords

[No Author keywords available]

Indexed keywords

ENCODING (SYMBOLS); IMAGE ENHANCEMENT; NATURAL LANGUAGE PROCESSING SYSTEMS; OBJECT DETECTION;

IMAGE CAPTIONING; INPUT SEQUENCE; LANGUAGE MODEL; OBJECT DETECTORS; OBJECT LAYOUTS; SEQUENCE MODELING; SPATIAL RELATIONSHIPS; STATE OF THE ART;

LONG SHORT-TERM MEMORY;

EID: 85045896626 PISSN: None EISSN: None Source Type: Conference Proceeding
DOI: 10.18653/v1/d17-1017 Document Type: Conference Paper

Times cited: 54

References (42)

1
- 85083953689
- Neural machine translation by jointly learning to align and translate
- Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In International Conference on Learning Representations (ICLR).
- (2015) International Conference on Learning Representations (ICLR)
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

2
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In IEEE conference on computer vision and pattern recognition (CVPR), pages 2625–2634.
- (2015) IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 2625-2634
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

3
- 84906929591
- Image description using visual dependency representations
- Desmond Elliott and Frank Keller. 2013. Image description using visual dependency representations. In EMNLP, volume 13, pages 1292–1302.
- (2013) EMNLP , vol.13 , pp. 1292-1302
- Elliott, D.¹ Keller, F.²

4
- 85073148740
- Who is mistaken?
- Benjamin Eysenbach, Carl Vondrick, and Antonio Torralba. 2016. Who is mistaken? arXiv preprint arXiv:1612.01175.
- (2016) Who Is Mistaken?
- Eysenbach, B.¹ Vondrick, C.² Torralba, A.³

5
- 84959250180
- From captions to visual concepts and back
- Hao Fang, Saurabh Gupta, Forrest Iandola, Rupesh K Srivastava, Li Deng, Piotr Dollár, Jianfeng Gao, Xiaodong He, Margaret Mitchell, John C Platt, et al. 2015. From captions to visual concepts and back. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1473–1482.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 1473-1482
- Fang, H.¹ Gupta, S.² Iandola, F.³ Srivastava, R.K.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.C.¹⁰

6
- 78149311145
- Every picture tells a story: Generating sentences from images
- Springer
- Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In European conference on computer vision, pages 15–29. Springer.
- (2010) European Conference on Computer Vision , pp. 15-29
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

7
- 84911394491
- Predicting object dynamics in scenes
- David F Fouhey and C Lawrence Zitnick. 2014. Predicting object dynamics in scenes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2019–2026.
- (2014) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 2019-2026
- Fouhey, D.F.¹ Lawrence Zitnick, C.²

8
- 0031573117
- Long short-term memory
- Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural computation, 9(8):1735–1780.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

9
- 84986305787
- Natural language object retrieval
- Ronghang Hu, Huazhe Xu, Marcus Rohrbach, Jiashi Feng, Kate Saenko, and Trevor Darrell. 2016. Natural language object retrieval. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4555–4564.
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 4555-4564
- Hu, R.¹ Xu, H.² Rohrbach, M.³ Feng, J.⁴ Saenko, K.⁵ Darrell, T.⁶

10
- 85012016847
- Summarizing source code using a neural attention model
- Berlin, Germany. Association for Computational Linguistics
- Srinivasan Iyer, Ioannis Konstas, Alvin Cheung, and Luke Zettlemoyer. 2016. Summarizing source code using a neural attention model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2073–2083, Berlin, Germany. Association for Computational Linguistics.
- (2016) Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pp. 2073-2083
- Iyer, S.¹ Konstas, I.² Cheung, A.³ Zettlemoyer, L.⁴

11
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3128–3137.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 3128-3137
- Karpathy, A.¹ Fei-Fei, L.²

12
- 85073158456
- Andrej Karpathy et al. 2016. Neuraltalk2. https://github.com/karpathy/neuraltalk2/.
- (2016) Neuraltalk2
- Karpathy, A.¹

13
- 84926222565
- Unsupervised concept-to-text generation with hypergraphs
- Ioannis Konstas and Mirella Lapata. 2012. Unsupervised concept-to-text generation with hypergraphs. In Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), pages 752–761.
- (2012) Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) , pp. 752-761
- Konstas, I.¹ Lapata, M.²

14
- 85011596790
- Visual genome: Connecting language and vision using crowdsourced dense image annotations
- Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A Shamma, et al. 2017. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1):32–73.
- (2017) International Journal of Computer Vision , vol.123 , Issue.1 , pp. 32-73
- Krishna, R.¹ Zhu, Y.² Groth, O.³ Johnson, J.⁴ Hata, K.⁵ Kravitz, J.⁶ Chen, S.⁷ Kalantidis, Y.⁸ Li, L.-J.⁹ Shamma, D.A.¹⁰

15
- 84876231242
- Imagenet classification with deep convolutional neural networks
- Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. Imagenet classification with deep convolutional neural networks. In Neural Information Processing Systems (NIPS), pages 1097–1105.
- (2012) Neural Information Processing Systems (NIPS) , pp. 1097-1105
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

16
- 0020885283
- Design of a knowledge-based report generator
- Association for Computational Linguistics
- Karen Kukich. 1983. Design of a knowledge-based report generator. In Proceedings of the 21st annual meeting on Association for Computational Linguistics, pages 145–150. Association for Computational Linguistics.
- (1983) Proceedings of the 21st Annual Meeting on Association for Computational Linguistics , pp. 145-150
- Kukich, K.¹

17
- 84906493406
- Microsoft coco: Common objects in context
- Springer
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft coco: Common objects in context. In ECCV, pages 740–755. Springer.
- (2014) ECCV , pp. 740-755
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Lawrence Zitnick, C.⁸

18
- 85039172448
- Effective approaches to attention-based neural machine translation
- abs/1508.04025
- Minh-Thang Luong, Hieu Pham, and Christopher D. Manning. 2015. Effective approaches to attention-based neural machine translation. CoRR, abs/1508.04025.
- (2015) CoRR
- Luong, M.-T.¹ Pham, H.² Manning, C.D.³

19
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-rnn)
- Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep captioning with multimodal recurrent neural networks (m-rnn). ICLR.
- (2015) ICLR
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Huang, Z.⁵ Yuille, A.⁶

20
- 84906925144
- Nonparametric method for data-driven image captioning
- Rebecca Mason and Eugene Charniak. 2014. Nonparametric method for data-driven image captioning. In ACL (2), pages 592–598.
- (2014) ACL , Issue.2 , pp. 592-598
- Mason, R.¹ Charniak, E.²

21
- 84986247766
- Large scale retrieval and generation of image descriptions
- Vicente Ordonez, Xufeng Han, Polina Kuznetsova, Girish Kulkarni, Margaret Mitchell, Kota Yamaguchi, Karl Stratos, Amit Goyal, Jesse Dodge, Alyssa Mensch, Hal Daumé III, Alexander C. Berg, Yejin Choi, and Tamara L. Berg. 2015. Large scale retrieval and generation of image descriptions. International Journal of Computer Vision, pages 1–14.
- (2015) International Journal of Computer Vision , pp. 1-14
- Ordonez, V.¹ Han, X.² Kuznetsova, P.³ Kulkarni, G.⁴ Mitchell, M.⁵ Yamaguchi, K.⁶ Stratos, K.⁷ Goyal, A.⁸ Dodge, J.⁹ Mensch, A.¹⁰ Daume, H.¹¹ Berg, A.C.¹² Choi, Y.¹³ Berg, T.L.¹⁴

22
- 85162522202
- Im2Text: Describing images using 1 million captioned photographs
- Vicente Ordonez, Girish Kulkarni, and Tamara L Berg. 2011. Im2text: Describing images using 1 million captioned photographs. In Advances in Neural Information Processing Systems, pages 1143–1151.
- (2011) Advances in Neural Information Processing Systems , pp. 1143-1151
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

23
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- Association for Computational Linguistics
- Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting on association for computational linguistics, pages 311–318. Association for Computational Linguistics.
- (2002) Proceedings of the 40th Annual Meeting on Association for Computational Linguistics , pp. 311-318
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

24
- 84973856017
- Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models
- Bryan A Plummer, Liwei Wang, Chris M Cervantes, Juan C Caicedo, Julia Hockenmaier, and Svetlana Lazebnik. 2015. Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE international conference on computer vision, pages 2641–2649.
- (2015) Proceedings of the IEEE International Conference on Computer Vision , pp. 2641-2649
- Plummer, B.A.¹ Wang, L.² Cervantes, C.M.³ Caicedo, J.C.⁴ Hockenmaier, J.⁵ Lazebnik, S.⁶

25
- 84959897086
- Combining geometric, textual and visual features for predicting prepositions in image descriptions
- Association for Computational Linguistics
- Arnau Ramisa, JK Wang, Ying Lu, Emmanuel Dellandrea, Francesc Moreno-Noguer, and Robert Gaizauskas. 2015. Combining geometric, textual and visual features for predicting prepositions in image descriptions. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 214–220. Association for Computational Linguistics.
- (2015) Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 214-220
- Ramisa, A.¹ Wang, J.K.² Lu, Y.³ Dellandrea, E.⁴ Moreno-Noguer, F.⁵ Gaizauskas, R.⁶

26
- 85041900441
- YOLO9000: Better, faster, stronger
- Joseph Redmon and Ali Farhadi. 2017. YOLO9000: better, faster, stronger. In Computer Vision and Pattern Recognition (CVPR).
- (2017) Computer Vision and Pattern Recognition (CVPR)
- Redmon, J.¹ Farhadi, A.²

27
- 84990068682
- Grounding of textual phrases in images by reconstruction
- Springer
- Anna Rohrbach, Marcus Rohrbach, Ronghang Hu, Trevor Darrell, and Bernt Schiele. 2016. Grounding of textual phrases in images by reconstruction. In European Conference on Computer Vision, pages 817–834. Springer.
- (2016) European Conference on Computer Vision , pp. 817-834
- Rohrbach, A.¹ Rohrbach, M.² Hu, R.³ Darrell, T.⁴ Schiele, B.⁵

28
- 84947041871
- Imagenet large scale visual recognition challenge
- Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, et al. 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252.
- (2015) International Journal of Computer Vision , vol.115 , Issue.3 , pp. 211-252
- Russakovsky, O.¹ Deng, J.² Su, H.³ Krause, J.⁴ Satheesh, S.⁵ Ma, S.⁶ Huang, Z.⁷ Karpathy, A.⁸ Khosla, A.⁹ Bernstein, M.¹⁰

29
- 85072830763
- Word ordering without syntax
- Austin, Texas
- Allen Schmaltz, Alexander M. Rush, and Stuart Shieber. 2016. Word ordering without syntax. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 2319–2324, Austin, Texas.
- (2016) Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 2319-2324
- Schmaltz, A.¹ Rush, A.M.² Shieber, S.³

30
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. International Conference on Learning Representations (ICLR).
- (2015) International Conference on Learning Representations (ICLR)
- Simonyan, K.¹ Zisserman, A.²

31
- 85119692453
- Automatically extracting and representing collocations for language generation
- Frank A Smadja and Kathleen R McKeown. 1990. Automatically extracting and representing collocations for language generation. In Annual meeting of the Association for Computational Linguistics (ACL), pages 252–259.
- (1990) Annual Meeting of the Association for Computational Linguistics (ACL) , pp. 252-259
- Smadja, F.A.¹ McKeown, K.R.²

32
- 84956980995
- CiDer: Consensus-based image description evaluation
- Ramakrishna Vedantam, C Lawrence Zitnick, and Devi Parikh. 2015a. Cider: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4566–4575.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 4566-4575
- Vedantam, R.¹ Lawrence Zitnick, C.² Parikh, D.³

33
- 84973926486
- Learning common sense through visual abstraction
- Ramakrishna Vedantam, Xiao Lin, Tanmay Batra, C Lawrence Zitnick, and Devi Parikh. 2015b. Learning common sense through visual abstraction. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), pages 2542–2550.
- (2015) Proceedings of the IEEE International Conference on Computer Vision (ICCV) , pp. 2542-2550
- Vedantam, R.¹ Lin, X.² Batra, T.³ Lawrence Zitnick, C.⁴ Parikh, D.⁵

34
- 84946747440
- Show and tell: A neural image caption generator
- Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3156–3164.
- (2015) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 3156-3164
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

35
- 84959897734
- Semantically conditioned lstm-based natural language generation for spoken dialogue systems
- Lisbon, Portugal
- Tsung-Hsien Wen, Milica Gasic, Nikola Mrkšić, Pei-Hao Su, David Vandyke, and Steve Young. 2015. Semantically conditioned lstm-based natural language generation for spoken dialogue systems. In Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1711–1721, Lisbon, Portugal.
- (2015) Conference on Empirical Methods in Natural Language Processing (EMNLP) , pp. 1711-1721
- Wen, T.-H.¹ Gasic, M.² Mrkšić, N.³ Su, P.-H.⁴ Vandyke, D.⁵ Young, S.⁶

36
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhudinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In International Conference on Machine Learning, pages 2048–2057.
- (2015) International Conference on Machine Learning , pp. 2048-2057
- Xu, K.¹ Ba, J.² Kiros, R.³ Cho, K.⁴ Courville, A.⁵ Salakhudinov, R.⁶ Zemel, R.⁷ Bengio, Y.⁸

37
- 85046865379
- Oracle performance for visual captioning
- Li Yao, Nicolas Ballas, Kyunghyun Cho, John R. Smith, and Yoshua Bengio. 2016a. Oracle performance for visual captioning. In British Machine Vision Conference (BMVC).
- (2016) British Machine Vision Conference (BMVC)
- Yao, L.¹ Ballas, N.² Cho, K.³ Smith, J.R.⁴ Bengio, Y.⁵

38
- 85029380574
- Boosting image captioning with attributes
- Ting Yao, Yingwei Pan, Yehao Li, Zhaofan Qiu, and Tao Mei. 2016b. Boosting image captioning with attributes. arXiv preprint arXiv:1611.01646.
- (2016) Boosting Image Captioning with Attributes
- Yao, T.¹ Pan, Y.² Li, Y.³ Qiu, Z.⁴ Mei, T.⁵

39
- 84994129838
- Stating the obvious: Extracting visual common sense knowledge
- San Diego, California. Association for Computational Linguistics
- Mark Yatskar, Vicente Ordonez, and Ali Farhadi. 2016. Stating the obvious: Extracting visual common sense knowledge. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 193–198, San Diego, California. Association for Computational Linguistics.
- (2016) Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies , pp. 193-198
- Yatskar, M.¹ Ordonez, V.² Farhadi, A.³

40
- 84986317307
- Image captioning with semantic attention
- Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image captioning with semantic attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4651–4659.
- (2016) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pp. 4651-4659
- You, Q.¹ Jin, H.² Wang, Z.³ Fang, C.⁴ Luo, J.⁵

41
- 84887338442
- Bringing semantics into focus using visual abstraction
- C Lawrence Zitnick and Devi Parikh. 2013. Bringing semantics into focus using visual abstraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 3009–3016.
- (2013) Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pp. 3009-3016
- Lawrence Zitnick, C.¹ Parikh, D.²

42
- 84898772194
- Learning the visual interpretation of sentences
- C Lawrence Zitnick, Devi Parikh, and Lucy Vanderwende. 2013. Learning the visual interpretation of sentences. In Proceedings of the IEEE International Conference on Computer Vision, pages 1681–1688.
- (2013) Proceedings of the IEEE International Conference on Computer Vision , pp. 1681-1688
- Lawrence Zitnick, C.¹ Parikh, D.² Vanderwende, L.³

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.