[2] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual question answering. arXiv, 2015.
[3] L. Bahl, P. Brown, P. V. de Souza, and R. Mercer. Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In ICASSP, volume 11, pages 49-52, Apr. 1986.
[4] D. P. Barrett, S. A. Bronikowski, H. Yu, and J. M. Siskind. Robot language learning, generation, and comprehension. arXiv preprint arXiv:1508.06161, 2015.
[5] X. Chen and C. L. Zitnick. Mind's eye: A recurrent visual representation for image caption generation. In CVPR, 2015.
[6] M.-M. Cheng, S. Zheng, W.-Y. Lin, V. Vineet, P. Sturgess, N. Crook, N. J. Mitra, and P. Torr. ImageSpirit: Verbal guided image parsing. ACM Trans. Graphics, 2014.
[7] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248-255, 2009.
[8] J. Devlin, S. Gupta, R. Girshick, M. Mitchell, and C. L. Zitnick. Exploring nearest neighbor approaches for image captioning. arXiv preprint arXiv:1505.04467, 2015.
[9] J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In CVPR, 2015.
[10] D. Erhan, C. Szegedy, A. Toshev, and D. Anguelov. Scalable object detection using deep neural networks. In CVPR, pages 2155-2162, 2014.
[11] H. J. Escalante, C. A. Hernandez, J. A. Gonzalez, A. Lopez-Lopez, M. Montes, E. F. Morales, L. E. Sucar, L. Villasenor, and M. Grubinger. The segmented and annotated IAPR TC-12 benchmark. CVIU, 2010.
[12] H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. In CVPR, 2015.
[13] A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV, pages 15-29, 2010.
[14] N. FitzGerald, Y. Artzi, and L. S. Zettlemoyer. Learning distributions over logical forms for referring expression generation. In EMNLP, pages 1914-1925, 2013.
[15] H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are you talking to a machine? Dataset and methods for multilingual image question answering. In NIPS, 2015.
[16] D. Geman, S. Geman, N. Hallonquist, and L. Younes. Visual Turing test for computer vision systems. PNAS, 112(12):3618-3623, 2015.
[17] R. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich feature hierarchies for accurate object detection and semantic segmentation. In CVPR, 2014.
[18] D. Gkatzia, V. Rieser, P. Bartie, and W. Mackaness. From the virtual to the real world: Referring to objects in real-world spatial scenes. In EMNLP, 2015.
[19] D. Golland, P. Liang, and D. Klein. A game-theoretic approach to generating spatial descriptions. In EMNLP, pages 410-419, 2010.
[20] N. D. Goodman and D. Lassiter. Probabilistic semantics and pragmatics: Uncertainty in language and thought. Handbook of Contemporary Semantic Theory. Wiley-Blackwell, 2014.
[21] K. Greff, R. K. Srivastava, J. Koutník, B. R. Steunebrink, and J. Schmidhuber. LSTM: A search space odyssey. In ICML, 2015.
[22] H. P. Grice. Logic and conversation. 1970.
[23] M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, 47:853-899, 2013.
[24] R. Hu, H. Xu, M. Rohrbach, J. Feng, K. Saenko, and T. Darrell. Natural language object retrieval. In CVPR, 2016.
[27] S. Kazemzadeh, V. Ordonez, M. Matten, and T. L. Berg. ReferItGame: Referring to objects in photographs of natural scenes. In EMNLP, pages 787-798, 2014.
[31] E. Krahmer and K. van Deemter. Computational generation of referring expressions: A survey. Computational Linguistics, 38, 2012.
[32] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L.-J. Li, D. A. Shamma, et al. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. arXiv preprint arXiv:1602.07332, 2016.
[33] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In NIPS, pages 1097-1105, 2012.
[34] G. Kulkarni, V. Premraj, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. Baby talk: Understanding and generating image descriptions. In CVPR, 2011.
[35] A. Lavie and A. Agarwal. METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments. In Workshop on Statistical Machine Translation, pages 228-231, 2007.
[36] S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, pages 220-228, 2011.
[37] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In ECCV, 2014.
[38] M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In NIPS, pages 1682-1690, 2014.
[39] M. Malinowski, M. Rohrbach, and M. Fritz. Ask your neurons: A neural-based approach to answering questions about images. In NIPS, 2015.
[40] J. Mao, W. Xu, Y. Yang, J. Wang, Z. Huang, and A. Yuille. Deep captioning with multimodal recurrent neural networks (m-RNN). In ICLR, 2015.
[41] M. Mitchell, K. van Deemter, and E. Reiter. Natural reference to objects in a visual domain. In INLG, pages 95-104, 2010.
[42] M. Mitchell, K. van Deemter, and E. Reiter. Generating expressions that refer to visible objects. In HLT-NAACL, pages 1174-1184, 2013.
[43] V. Ordonez, G. Kulkarni, and T. L. Berg. Im2Text: Describing images using 1 million captioned photographs. In NIPS, 2011.
[44] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In ACL, pages 311-318, 2002.
[45] B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In ICCV, 2015.
[46] M. A. Sadeghi and A. Farhadi. Recognition using visual phrases. In CVPR, 2011.
[47] A. Sadovnik, Y.-I. Chiu, N. Snavely, S. Edelman, and T. Chen. Image description with a goal: Building efficient discriminating expressions for images. In CVPR, 2012.
[48] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In ICLR, 2015.
[49] R. Socher, Q. Le, C. Manning, and A. Ng. Grounded compositional semantics for finding and describing images with sentences. In TACL, 2014.
[50] K. van Deemter, I. van der Sluis, and A. Gatt. Building a semantically transparent corpus for the generation of referring expressions. In INLG, pages 130-132, 2006.
[52] J. Viethen and R. Dale. The use of spatial relations in referring expression generation. In INLG, pages 59-67. Association for Computational Linguistics, 2008.
[53] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In CVPR, 2015.
[54] T. Winograd. Understanding natural language. Cognitive Psychology, 3(1):1-191, 1972.
[55] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, attend and tell: Neural image caption generation with visual attention. In ICML, 2015.
[56] Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In EMNLP, pages 444-454, 2011.