[1] Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI, Vol. 16. 265-283.
[2] Peter Anderson, Basura Fernando, Mark Johnson, and Stephen Gould. 2017. Guided open vocabulary image captioning with constrained beam search. In EMNLP.
[3] Lisa Anne Hendricks, Subhashini Venugopalan, Marcus Rohrbach, Raymond Mooney, Kate Saenko, Trevor Darrell, Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, et al. 2016. Deep compositional captioning: Describing novel object categories without paired training data. In CVPR.
[4] Satanjeev Banerjee and Alon Lavie. 2005. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In ACL-W. 65-72.
[5] Samy Bengio, Oriol Vinyals, Navdeep Jaitly, and Noam Shazeer. 2015. Scheduled sampling for sequence prediction with recurrent neural networks. In NIPS. 1171-1179.
[6] Jeffrey Donahue, Lisa Anne Hendricks, Sergio Guadarrama, Marcus Rohrbach, Subhashini Venugopalan, Kate Saenko, and Trevor Darrell. 2015. Long-term recurrent convolutional networks for visual recognition and description. In CVPR. 2625-2634.
[7] Xuanyi Dong, Linchao Zhu, De Zhang, Yi Yang, and Fei Wu. 2018. Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering. In ACM on Multimedia.
[8] Ali Farhadi, Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth. 2010. Every picture tells a story: Generating sentences from images. In ECCV. 15-29.
[9] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In ICML. 1126-1135.
[11] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In CVPR.
[12] Lu Jiang, Shoou-I Yu, Deyu Meng, Teruko Mitamura, and Alexander G Hauptmann. 2015. Bridging the ultimate semantic gap: A semantic search engine for internet videos. In ICMR. 27-34.
[13] Lu Jiang, Shoou-I Yu, Deyu Meng, Yi Yang, Teruko Mitamura, and Alexander G Hauptmann. 2015. Fast and accurate content-based semantic search in 100m internet videos. In ACM on Multimedia. 49-58.
[14] Justin Johnson, Andrej Karpathy, and Li Fei-Fei. 2016. DenseCap: Fully convolutional localization networks for dense captioning. In CVPR. 4565-4574.
[15] Andrej Karpathy and Li Fei-Fei. 2015. Deep visual-semantic alignments for generating image descriptions. In CVPR. 3128-3137.
[16] Diederik P Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In ICLR.
[17] Ryan Kiros, Ruslan Salakhutdinov, and Rich Zemel. 2014. Multimodal neural language models. In ICML. 595-603.
[18] Girish Kulkarni, Visruth Premraj, Vicente Ordonez, Sagnik Dhar, Siming Li, Yejin Choi, Alexander C Berg, and Tamara L Berg. 2013. BabyTalk: Understanding and generating simple image descriptions. IEEE Transactions on Pattern Analysis and Machine Intelligence 35, 12 (2013), 2891-2903.
[19] Christoph H Lampert, Hannes Nickisch, and Stefan Harmeling. 2014. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 3 (2014), 453-465.
[20] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 2014. Microsoft COCO: Common objects in context. In ECCV. 740-755.
[22] Junhua Mao, Xu Wei, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan L Yuille. 2015. Learning like a child: Fast novel visual concept learning from sentence descriptions of images. In ICCV. 2533-2541.
[23] Junhua Mao, Wei Xu, Yi Yang, Jiang Wang, Zhiheng Huang, and Alan Yuille. 2015. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). In ICLR.
[24] George A Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J Miller. 1990. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography 3, 4 (1990), 235-244.
[25] Margaret Mitchell, Xufeng Han, Jesse Dodge, Alyssa Mensch, Amit Goyal, Alex Berg, Kota Yamaguchi, Tamara Berg, Karl Stratos, and Hal Daumé III. 2012. Midge: Generating Image Descriptions From Computer Vision Detections. In EACL. 747-756.
[26] Vicente Ordonez, Girish Kulkarni, and Tamara L Berg. 2011. Im2Text: Describing images using 1 million captioned photographs. In NIPS. 1143-1151.
[27] Marc'Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. 2016. Sequence level training with recurrent neural networks. In ICLR.
[28] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS. 91-99.
[29] Marcus Rohrbach, Michael Stark, and Bernt Schiele. 2011. Evaluating knowledge transfer and zero-shot learning in a large-scale setting. In CVPR. 1641-1648.
[31] Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In ICLR.
[32] Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, and Alexander A Alemi. 2017. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In AAAI.
[33] Hamed R Tavakoliy, Rakshith Shetty, Ali Borji, and Jorma Laaksonen. 2017. Paying Attention to Descriptions Generated by Image Captioning Models. In ICCV. 2506-2515.
[34] Subhashini Venugopalan, Lisa Anne Hendricks, Marcus Rohrbach, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2017. Captioning Images with Diverse Objects. In CVPR.
[35] Subhashini Venugopalan, Marcus Rohrbach, Jeffrey Donahue, Raymond Mooney, Trevor Darrell, and Kate Saenko. 2015. Sequence to sequence - video to text. In ICCV. 4534-4542.
[37] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2015. Show and tell: A neural image caption generator. In CVPR. 3156-3164.
[38] Oriol Vinyals, Alexander Toshev, Samy Bengio, and Dumitru Erhan. 2017. Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 4 (April 2017), 652-663.
[39] Yongqin Xian, Christoph H Lampert, Bernt Schiele, and Zeynep Akata. 2018. Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence (2018), 1-1. https://doi.org/10.1109/TPAMI.2018.2857768
[40] Kelvin Xu, Jimmy Ba, Ryan Kiros, Kyunghyun Cho, Aaron Courville, Ruslan Salakhutdinov, Rich Zemel, and Yoshua Bengio. 2015. Show, attend and tell: Neural image caption generation with visual attention. In ICML. 2048-2057.
[41] Ting Yao, Yingwei Pan, Yehao Li, and Tao Mei. 2017. Incorporating copying mechanism in image captioning for learning novel objects. In CVPR. 5263-5271.
[42] Quanzeng You, Hailin Jin, Zhaowen Wang, Chen Fang, and Jiebo Luo. 2016. Image captioning with semantic attention. In CVPR. 4651-4659.
[43] Linchao Zhu, Zhongwen Xu, Yi Yang, and Alexander G Hauptmann. 2017. Uncovering the Temporal Context for Video Question Answering. International Journal of Computer Vision 124, 3 (2017), 409-421. https://doi.org/10.1007/s11263-017-1033-7