SCOPUS Information Retrieval Platform

Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition

Volume 2016-December, 2016, Pages 203-212

What value do explicit high level concepts have in vision to language problems?

Authors (5): Wu, Qi (a); Shen, Chunhua (a); Liu, Lingqiao (a); Dick, Anthony (a); Van Den Hengel, Anton (a)

(a) University of Adelaide, Australia

Author keywords

[No Author keywords available]

Indexed keywords

COMPUTER VISION; NEURAL NETWORKS; PATTERN RECOGNITION; RECURRENT NEURAL NETWORKS; SEMANTICS;

BENCHMARK DATASETS; CONVOLUTIONAL NEURAL NETWORK; EXPLICIT REPRESENTATION; HIGH LEVEL SEMANTICS; HIGH-LEVEL INFORMATION; QUESTION ANSWERING; RECURRENT NEURAL NETWORK (RNNS); SEMANTIC INFORMATION;

HIGH LEVEL LANGUAGES;

EID: 84986301177 PISSN: 10636919 EISSN: None Source Type: Conference Proceeding
DOI: 10.1109/CVPR.2016.29 Document Type: Conference Paper

Times cited: 499

References (59)

1
- 80052886947
- Generating image descriptions using dependency relational patterns
- 2
- A. Aker and R. Gaizauskas. Generating image descriptions using dependency relational patterns. In Proc. Conf. Association for Computational Linguistics, 2010.
- (2010) Proc. Conf. Association for Computational Linguistics
- Aker, A.¹ Gaizauskas, R.²

2
- 84973890960
- VQA: Visual question answering
- 2, 3, 7
- S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. L. Zitnick, and D. Parikh. VQA: Visual Question Answering. In Proc. IEEE Int. Conf. Comp. Vis., 2015.
- (2015) Proc. IEEE Int. Conf. Comp. Vis.
- Antol, S.¹ Agrawal, A.² Lu, J.³ Mitchell, M.⁴ Batra, D.⁵ Zitnick, C.L.⁶ Parikh, D.⁷

3
- 85083953689
- Neural machine translation by jointly learning to align and translate
- 1
- D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In Proc. Int. Conf. Learn. Representations, 2015.
- (2015) Proc. Int. Conf. Learn. Representations
- Bahdanau, D.¹ Cho, K.² Bengio, Y.³

4
- 85116156579
- METEOR: An automatic metric for MT evaluation with improved correlation with human judgments
- 6
- S. Banerjee and A. Lavie. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL workshop on intrinsic and extrinsic evaluation measures for machine translation and/or summarization, 2005.
- (2005) Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation And/or Summarization
- Banerjee, S.¹ Lavie, A.²

5
- 84952349295
- Microsoft COCO captions: Data collection and evaluation server
- 6
- X. Chen, H. Fang, T.-Y. Lin, R. Vedantam, S. Gupta, P. Dollar, and C. L. Zitnick. Microsoft COCO captions: Data collection and evaluation server. arXiv:1504.00325, 2015.
- (2015) Microsoft COCO Captions: Data Collection and Evaluation Server
- Chen, X.¹ Fang, H.² Lin, T.-Y.³ Vedantam, R.⁴ Gupta, S.⁵ Dollar, P.⁶ Zitnick, C.L.⁷

6
- 84957029470
- Mind's Eye: A recurrent visual representation for image caption generation
- X. Chen and C. Lawrence Zitnick. Mind's Eye: A Recurrent Visual Representation for Image Caption Generation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., June 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Chen, X.¹ Lawrence Zitnick, C.²

7
- 84961291190
- Learning phrase representations using RNN encoder-decoder for statistical machine translation
- 1
- K. Cho, B. van Merrienboer, C. Gulcehre, F. Bougares, H. Schwenk, and Y. Bengio. Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proc. Conf. Empirical Methods in Natural Language Processing, 2014.
- (2014) Proc. Conf. Empirical Methods in Natural Language Processing
- Cho, K.¹ Van Merrienboer, B.² Gulcehre, C.³ Bougares, F.⁴ Schwenk, H.⁵ Bengio, Y.⁶

8
- 85198028989
- ImageNet: A large-scale hierarchical image database
- 4
- J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2009.
- (2009) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Deng, J.¹ Dong, W.² Socher, R.³ Li, L.-J.⁴ Li, K.⁵ Fei-Fei, L.⁶

9
- 84944096380
- Language models for image captioning: The quirks and what works
- 2
- J. Devlin, H. Cheng, H. Fang, S. Gupta, L. Deng, X. He, G. Zweig, and M. Mitchell. Language models for image captioning: The quirks and what works. In Proc. IEEE Int. Conf. Comp. Vis., 2015.
- (2015) Proc. IEEE Int. Conf. Comp. Vis.
- Devlin, J.¹ Cheng, H.² Fang, H.³ Gupta, S.⁴ Deng, L.⁵ He, X.⁶ Zweig, G.⁷ Mitchell, M.⁸

10
- 84959236502
- Long-term recurrent convolutional networks for visual recognition and description
- 1, 3, 5, 6, 7
- J. Donahue, L. A. Hendricks, S. Guadarrama, M. Rohrbach, S. Venugopalan, K. Saenko, and T. Darrell. Long-term recurrent convolutional networks for visual recognition and description. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Donahue, J.¹ Hendricks, L.A.² Guadarrama, S.³ Rohrbach, M.⁴ Venugopalan, S.⁵ Saenko, K.⁶ Darrell, T.⁷

11
- 84959250180
- From captions to visual concepts and back
- 2, 3, 4, 5, 6
- H. Fang, S. Gupta, F. Iandola, R. Srivastava, L. Deng, P. Dollár, J. Gao, X. He, M. Mitchell, J. Platt, et al. From captions to visual concepts and back. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Fang, H.¹ Gupta, S.² Iandola, F.³ Srivastava, R.⁴ Deng, L.⁵ Dollár, P.⁶ Gao, J.⁷ He, X.⁸ Mitchell, M.⁹ Platt, J.¹⁰

12
- 80052017343
- Every picture tells a story: Generating sentences from images
- 2, 3
- A. Farhadi, M. Hejrati, M. A. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. Every picture tells a story: Generating sentences from images. In Proc. Eur. Conf. Comp. Vis., 2010.
- (2010) Proc. Eur. Conf. Comp. Vis.
- Farhadi, A.¹ Hejrati, M.² Sadeghi, M.A.³ Young, P.⁴ Rashtchian, C.⁵ Hockenmaier, J.⁶ Forsyth, D.⁷

13
- 84965148420
- Are you talking to a machine? Dataset and methods for multilingual image question answering
- 2, 3, 5
- H. Gao, J. Mao, J. Zhou, Z. Huang, L. Wang, and W. Xu. Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering. In Proc. Advances in Neural Inf. Process. Syst., 2015.
- (2015) Proc. Advances in Neural Inf. Process. Syst.
- Gao, H.¹ Mao, J.² Zhou, J.³ Huang, Z.⁴ Wang, L.⁵ Xu, W.⁶

14
- 84862277874
- Understanding the difficulty of training deep feedforward neural networks
- 4
- X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In Proc. Int. Conf. Artificial Intell. & Stat., pages 249-256, 2010.
- (2010) Proc. Int. Conf. Artificial Intell. & Stat , pp. 249-256
- Glorot, X.¹ Bengio, Y.²

15
- 84959243872
- Improving image-sentence embeddings using large weakly annotated photo collections
- 2
- Y. Gong, L. Wang, M. Hodosh, J. Hockenmaier, and S. Lazebnik. Improving image-sentence embeddings using large weakly annotated photo collections. In Proc. Eur. Conf. Comp. Vis., 2014.
- (2014) Proc. Eur. Conf. Comp. Vis.
- Gong, Y.¹ Wang, L.² Hodosh, M.³ Hockenmaier, J.⁴ Lazebnik, S.⁵

16
- 0031573117
- Long short-term memory
- 4
- S. Hochreiter and J. Schmidhuber. Long short-term memory. Neural Computation, 9 (8): 1735-1780, 1997.
- (1997) Neural Computation , vol.9 , Issue.8 , pp. 1735-1780
- Hochreiter, S.¹ Schmidhuber, J.²

17
- 84883394520
- Framing image description as a ranking task: Data, models and evaluation metrics
- 2, 5
- M. Hodosh, P. Young, and J. Hockenmaier. Framing image description as a ranking task: Data, models and evaluation metrics. JAIR, pages 853-899, 2013.
- (2013) JAIR , pp. 853-899
- Hodosh, M.¹ Young, P.² Hockenmaier, J.³

18
- 84973917813
- Guiding long-short term memory for image caption generation
- 2, 6
- X. Jia, E. Gavves, B. Fernando, and T. Tuytelaars. Guiding Long-Short Term Memory for Image Caption Generation. In Proc. IEEE Int. Conf. Comp. Vis., 2015.
- (2015) Proc. IEEE Int. Conf. Comp. Vis.
- Jia, X.¹ Gavves, E.² Fernando, B.³ Tuytelaars, T.⁴

19
- 84856653718
- Learning cross-modality similarity for multinomial data
- 2
- Y. Jia, M. Salzmann, and T. Darrell. Learning cross-modality similarity for multinomial data. In Proc. IEEE Int. Conf. Comp. Vis., 2011.
- (2011) Proc. IEEE Int. Conf. Comp. Vis.
- Jia, Y.¹ Salzmann, M.² Darrell, T.³

20
- 85009867858
- Caffe: Convolutional architecture for fast feature embedding
- 6
- Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. Caffe: Convolutional Architecture for Fast Feature Embedding. arXiv:1408.5093, 2014.
- (2014) Caffe: Convolutional Architecture for Fast Feature Embedding
- Jia, Y.¹ Shelhamer, E.² Donahue, J.³ Karayev, S.⁴ Long, J.⁵ Girshick, R.⁶ Guadarrama, S.⁷ Darrell, T.⁸

21
- 84986312327
- Aligning where to see and what to tell: Image caption with region-based attention and scene factorization
- 6
- J. Jin, K. Fu, R. Cui, F. Sha, and C. Zhang. Aligning where to see and what to tell: image caption with region-based attention and scene factorization. arXiv:1506.06272, 2015.
- (2015) Aligning Where to See and What to Tell: Image Caption with Region-based Attention and Scene Factorization
- Jin, J.¹ Fu, K.² Cui, R.³ Sha, F.⁴ Zhang, C.⁵

22
- 84946734827
- Deep visual-semantic alignments for generating image descriptions
- 2, 4, 5, 6, 7
- A. Karpathy and L. Fei-Fei. Deep visual-semantic alignments for generating image descriptions. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Karpathy, A.¹ Fei-Fei, L.²

23
- 84937843643
- Deep fragment embeddings for bidirectional image sentence mapping
- 1, 2
- A. Karpathy, A. Joulin, and F. F. Li. Deep fragment embeddings for bidirectional image sentence mapping. In Proc. Advances in Neural Inf. Process. Syst., 2014.
- (2014) Proc. Advances in Neural Inf. Process. Syst.
- Karpathy, A.¹ Joulin, A.² Li, F.F.³

24
- 84952349298
- Unifying visual-semantic embeddings with multimodal neural language models
- 2
- R. Kiros, R. Salakhutdinov, and R. S. Zemel. Unifying visual-semantic embeddings with multimodal neural language models. In Proc. Conf. Association for Computational Linguistics, 2015.
- (2015) Proc. Conf. Association for Computational Linguistics
- Kiros, R.¹ Salakhutdinov, R.² Zemel, R.S.³

25
- 84876231242
- ImageNet classification with deep convolutional neural networks
- 1, 4
- A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Proc. Advances in Neural Inf. Process. Syst., 2012.
- (2012) Proc. Advances in Neural Inf. Process. Syst.
- Krizhevsky, A.¹ Sutskever, I.² Hinton, G.E.³

26
- 85009854844
- 2, 3
- G. Kulkarni, V. Premraj, V. Ordonez, S. Dhar, S. Li, Y. Choi, A. C. Berg, and T. L. Berg. IEEE Trans. Pattern Anal. Mach. Intell.
- IEEE Trans. Pattern Anal. Mach. Intell.
- Kulkarni, G.¹ Premraj, V.² Ordonez, V.³ Dhar, S.⁴ Li, S.⁵ Choi, Y.⁶ Berg, A.C.⁷ Berg, T.L.⁸

27
- 84878189119
- Collective generation of natural image descriptions
- 2
- P. Kuznetsova, V. Ordonez, A. C. Berg, T. L. Berg, and Y. Choi. Collective generation of natural image descriptions. In Proc. Conf. Association for Computational Linguistics, 2012.
- (2012) Proc. Conf. Association for Computational Linguistics
- Kuznetsova, P.¹ Ordonez, V.² Berg, A.C.³ Berg, T.L.⁴ Choi, Y.⁵

28
- 84934873221
- Treetalk: Composition and compression of trees for image descriptions
- 2
- P. Kuznetsova, V. Ordonez, T. L. Berg, and Y. Choi. Treetalk: Composition and compression of trees for image descriptions. In Proc. Conf. Association for Computational Linguistics, 2014.
- (2014) Proc. Conf. Association for Computational Linguistics
- Kuznetsova, P.¹ Ordonez, V.² Berg, T.L.³ Choi, Y.⁴

29
- 0032203257
- Gradient-based learning applied to document recognition
- 1
- Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proc. IEEE, 86 (11): 2278-2324, 1998.
- (1998) Proc. IEEE , vol.86 , Issue.11 , pp. 2278-2324
- LeCun, Y.¹ Bottou, L.² Bengio, Y.³ Haffner, P.⁴

30
- 84862279067
- Composing simple image descriptions using web-scale n-grams
- 2, 3
- S. Li, G. Kulkarni, T. L. Berg, A. C. Berg, and Y. Choi. Composing simple image descriptions using web-scale n-grams. In CoNLL, 2011.
- (2011) CoNLL
- Li, S.¹ Kulkarni, G.² Berg, T.L.³ Berg, A.C.⁴ Choi, Y.⁵

31
- 85009838903
- Microsoft COCO: Common objects in context
- 5
- T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Proc. Eur. Conf. Comp. Vis., 2014.
- (2014) Proc. Eur. Conf. Comp. Vis.
- Lin, T.-Y.¹ Maire, M.² Belongie, S.³ Hays, J.⁴ Perona, P.⁵ Ramanan, D.⁶ Dollár, P.⁷ Zitnick, C.L.⁸

32
- 85007153677
- Learning to answer questions from image using convolutional neural network
- 3, 7
- L. Ma, Z. Lu, and H. Li. Learning to Answer Questions From Image using Convolutional Neural Network. In AAAI, 2016.
- (2016) AAAI
- Ma, L.¹ Lu, Z.² Li, H.³

33
- 84937822746
- A multi-world approach to question answering about real-world scenes based on uncertain input
- 3
- M. Malinowski and M. Fritz. A multi-world approach to question answering about real-world scenes based on uncertain input. In Proc. Advances in Neural Inf. Process. Syst., pages 1682-1690, 2014.
- (2014) Proc. Advances in Neural Inf. Process. Syst , pp. 1682-1690
- Malinowski, M.¹ Fritz, M.²

34
- 84944790970
- Hard to cheat: A Turing test based on answering questions about images
- 3
- M. Malinowski and M. Fritz. Hard to Cheat: A Turing Test based on Answering Questions about Images. arXiv:1501.03302, 2015.
- (2015) Hard to Cheat: A Turing Test Based on Answering Questions about Images
- Malinowski, M.¹ Fritz, M.²

35
- 84973896625
- Ask your neurons: A neural-based approach to answering questions about images
- 2, 3, 5
- M. Malinowski, M. Rohrbach, and M. Fritz. Ask Your Neurons: A Neural-based Approach to Answering Questions about Images. In Proc. IEEE Int. Conf. Comp. Vis., 2015.
- (2015) Proc. IEEE Int. Conf. Comp. Vis.
- Malinowski, M.¹ Rohrbach, M.² Fritz, M.³

36
- 85083950512
- Deep captioning with multimodal recurrent neural networks (m-RNN)
- 1, 2, 4, 5, 6, 7
- J. Mao, W. Xu, Y. Yang, J. Wang, and A. Yuille. Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN). In Proc. Int. Conf. Learn. Representations, 2015.
- (2015) Proc. Int. Conf. Learn. Representations
- Mao, J.¹ Xu, W.² Yang, Y.³ Wang, J.⁴ Yuille, A.⁵

37
- 84898956512
- Distributed representations of words and phrases and their compositionality
- 8
- T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean. Distributed representations of words and phrases and their compositionality. In Proc. Advances in Neural Inf. Process. Syst., pages 3111-3119, 2013.
- (2013) Proc. Advances in Neural Inf. Process. Syst , pp. 3111-3119
- Mikolov, T.¹ Sutskever, I.² Chen, K.³ Corrado, G.S.⁴ Dean, J.⁵

38
- 84976702763
- WordNet: A lexical database for English
- 8
- G. A. Miller. WordNet: A lexical database for English. Communications of the ACM, 38 (11): 39-41, 1995.
- (1995) Communications of the ACM , vol.38 , Issue.11 , pp. 39-41
- Miller, G.A.¹

39
- 85034832841
- Midge: Generating image descriptions from computer vision detections
- 2
- M. Mitchell, X. Han, J. Dodge, A. Mensch, A. Goyal, A. Berg, K. Yamaguchi, T. Berg, K. Stratos, and H. Daumé III. Midge: Generating image descriptions from computer vision detections. In EACL, 2012.
- (2012) EACL
- Mitchell, M.¹ Han, X.² Dodge, J.³ Mensch, A.⁴ Goyal, A.⁵ Berg, A.⁶ Yamaguchi, K.⁷ Berg, T.⁸ Stratos, K.⁹ Daumé, H.¹⁰

40
- 85162522202
- Im2text: Describing images using 1 million captioned photographs
- 2
- V. Ordonez, G. Kulkarni, and T. L. Berg. Im2text: Describing images using 1 million captioned photographs. In Proc. Advances in Neural Inf. Process. Syst., 2011.
- (2011) Proc. Advances in Neural Inf. Process. Syst.
- Ordonez, V.¹ Kulkarni, G.² Berg, T.L.³

41
- 85133336275
- BLEU: A method for automatic evaluation of machine translation
- 5
- K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. BLEU: A method for automatic evaluation of machine translation. In Proc. Conf. Association for Computational Linguistics, 2002.
- (2002) Proc. Conf. Association for Computational Linguistics
- Papineni, K.¹ Roukos, S.² Ward, T.³ Zhu, W.-J.⁴

42
- 84973900209
- Multiscale combinatorial grouping for image segmentation and object proposal generation
- J. Pont-Tuset, P. Arbeláez, J. Barron, F. Marques, and J. Malik. Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation. arXiv:1503.00848, March 2015.
- (2015) Multiscale Combinatorial Grouping for Image Segmentation and Object Proposal Generation
- Pont-Tuset, J.¹ Arbeláez, P.² Barron, J.³ Marques, F.⁴ Malik, J.⁵

43
- 84962816362
- Image question answering: A visual semantic embedding model and a new dataset
- 2, 3, 5, 7
- M. Ren, R. Kiros, and R. Zemel. Image Question Answering: A Visual Semantic Embedding Model and a New Dataset. In Proc. Advances in Neural Inf. Process. Syst., 2015.
- (2015) Proc. Advances in Neural Inf. Process. Syst.
- Ren, M.¹ Kiros, R.² Zemel, R.³

44
- 84898775239
- Translating video content to natural language descriptions
- 3
- M. Rohrbach, W. Qiu, I. Titov, S. Thater, M. Pinkal, and B. Schiele. Translating video content to natural language descriptions. In Proc. IEEE Int. Conf. Comp. Vis., 2013.
- (2013) Proc. IEEE Int. Conf. Comp. Vis.
- Rohrbach, M.¹ Qiu, W.² Titov, I.³ Thater, S.⁴ Pinkal, M.⁵ Schiele, B.⁶

45
- 85083953063
- Very deep convolutional networks for large-scale image recognition
- 1, 4, 6
- K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. In Proc. Int. Conf. Learn. Representations, 2015.
- (2015) Proc. Int. Conf. Learn. Representations
- Simonyan, K.¹ Zisserman, A.²

46
- 84906925854
- Grounded compositional semantics for finding and describing images with sentences
- 2
- R. Socher, A. Karpathy, Q. V. Le, C. D. Manning, and A. Y. Ng. Grounded compositional semantics for finding and describing images with sentences. In Proc. Conf. Association for Computational Linguistics, 2014.
- (2014) Proc. Conf. Association for Computational Linguistics
- Socher, R.¹ Karpathy, A.² Le, Q.V.³ Manning, C.D.⁴ Ng, A.Y.⁵

47
- 84928547704
- Sequence to sequence learning with neural networks
- 1
- I. Sutskever, O. Vinyals, and Q. V. Le. Sequence to sequence learning with neural networks. In Proc. Advances in Neural Inf. Process. Syst., 2014.
- (2014) Proc. Advances in Neural Inf. Process. Syst.
- Sutskever, I.¹ Vinyals, O.² Le, Q.V.³

48
- 84937522268
- Going deeper with convolutions
- 1, 6
- C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. Going deeper with convolutions. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Szegedy, C.¹ Liu, W.² Jia, Y.³ Sermanet, P.⁴ Reed, S.⁵ Anguelov, D.⁶ Erhan, D.⁷ Vanhoucke, V.⁸ Rabinovich, A.⁹

49
- 84956980995
- CIDEr: Consensus-based image description evaluation
- 6
- R. Vedantam, C. L. Zitnick, and D. Parikh. CIDEr: Consensus-based Image Description Evaluation. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2015.
- (2015) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Vedantam, R.¹ Zitnick, C.L.² Parikh, D.³

50
- 84939821075
- Show and tell: A neural image caption generator
- 1, 2, 3, 4, 5, 6, 7
- O. Vinyals, A. Toshev, S. Bengio, and D. Erhan. Show and tell: A neural image caption generator. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., 2014.
- (2014) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- Vinyals, O.¹ Toshev, A.² Bengio, S.³ Erhan, D.⁴

51
- 84938908409
- CNN: Single-label to multi-label
- 4
- Y. Wei, W. Xia, J. Huang, B. Ni, J. Dong, Y. Zhao, and S. Yan. CNN: Single-label to multi-label. arXiv:1406.5726, 2014.
- (2014) CNN: Single-label to Multi-label
- Wei, Y.¹ Xia, W.² Huang, J.³ Ni, B.⁴ Dong, J.⁵ Zhao, Y.⁶ Yan, S.⁷

52
- 85146676791
- Verb semantics and lexical selection
- 7
- Z. Wu and M. Palmer. Verb semantics and lexical selection. In Proc. Conf. Association for Computational Linguistics, 1994.
- (1994) Proc. Conf. Association for Computational Linguistics
- Wu, Z.¹ Palmer, M.²

53
- 84970002232
- Show, attend and tell: Neural image caption generation with visual attention
- 2, 5, 6
- K. Xu, J. Ba, R. Kiros, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. In Proc. Int. Conf. Mach. Learn., 2015.
- (2015) Proc. Int. Conf. Mach. Learn.
- Xu, K.¹ Ba, J.² Kiros, R.³ Courville, A.⁴ Salakhutdinov, R.⁵ Zemel, R.⁶ Bengio, Y.⁷

54
- 80053258778
- Corpus-guided sentence generation of natural images
- 3
- Y. Yang, C. L. Teo, H. Daumé III, and Y. Aloimonos. Corpus-guided sentence generation of natural images. In Proc. Conf. Empirical Methods in Natural Language Processing, 2011.
- (2011) Proc. Conf. Empirical Methods in Natural Language Processing
- Yang, Y.¹ Teo, C.L.² Daumé, H.³ Aloimonos, Y.⁴

55
- 84973884896
- Describing videos by exploiting temporal structure
- 1
- L. Yao, A. Torabi, K. Cho, N. Ballas, C. Pal, H. Larochelle, and A. Courville. Describing videos by exploiting temporal structure. In Proc. IEEE Int. Conf. Comp. Vis., 2015.
- (2015) Proc. IEEE Int. Conf. Comp. Vis.
- Yao, L.¹ Torabi, A.² Cho, K.³ Ballas, N.⁴ Pal, C.⁵ Larochelle, H.⁶ Courville, A.⁷

56
- 84986317307
- Image captioning with semantic attention
- Q. You, H. Jin, Z. Wang, C. Fang, and J. Luo. Image captioning with semantic attention. In Proc. IEEE Conf. Comp. Vis. Patt. Recogn., June 2016.
- (2016) Proc. IEEE Conf. Comp. Vis. Patt. Recogn.
- You, Q.¹ Jin, H.² Wang, Z.³ Fang, C.⁴ Luo, J.⁵

57
- 84906494296
- From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions
- 5
- P. Young, A. Lai, M. Hodosh, and J. Hockenmaier. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. In Proc. Conf. Association for Computational Linguistics, vol. 2, 2014.
- (2014) Proc. Conf. Association for Computational Linguistics , vol.2
- Young, P.¹ Lai, A.² Hodosh, M.³ Hockenmaier, J.⁴

58
- 84864049528
- Multiple instance boosting for object detection
- 3
- C. Zhang, J. C. Platt, and P. A. Viola. Multiple instance boosting for object detection. In Proc. Advances in Neural Inf. Process. Syst., 2005.
- (2005) Proc. Advances in Neural Inf. Process. Syst.
- Zhang, C.¹ Platt, J.C.² Viola, P.A.³

59
- 84986248327
- Building a large-scale multimodal knowledge base for visual question answering
- 3, 8
- Y. Zhu, C. Zhang, C. Ré, and L. Fei-Fei. Building a Large-scale Multimodal Knowledge Base for Visual Question Answering. arXiv:1507.05670, 2015.
- (2015) Building a Large-scale Multimodal Knowledge Base for Visual Question Answering
- Zhu, Y.¹ Zhang, C.² Ré, C.³ Fei-Fei, L.⁴

* This information was analyzed and extracted by KISTI from Elsevier's SCOPUS database.